Features
Define templated LLM interactions as microservices
You can define named prompts in Chalk to keep your LLM workflows consistent and to make prompts maintainable and reusable across your application. You can parameterize prompts using Jinja templating to inject feature values.
Chalk supports two primary methods for defining and managing prompts: direct prompt definitions and named prompts. Each approach offers different benefits depending on your use case and deployment requirements.
Direct prompt definitions provide immediate, inline configuration of LLM completions.
This approach is useful for rapid prototyping, one-off implementations, or when prompt logic is tightly coupled to specific features.
To define a direct prompt that performs sentiment analysis on a comment, you could define the following Chalk features using the chalk.prompts library.
import chalk.functions as F
import chalk.prompts as P
from chalk.features import _, features
from pydantic import BaseModel, Field

system_message = """You are a sentiment analysis expert. Analyze the user's comment and determine if it's positive, negative, or neutral.
Rules:
- Be objective and focus on the text's emotional tone
- Consider context and nuance
- Provide a confidence score based on clarity of sentiment"""

user_message = "Analyze this comment: {{User.comment}}"

class UserSentiment(BaseModel):
    sentiment: str = Field(description="positive, negative, or neutral")
    confidence: float = Field(description="on a 1-100 scale")

@features
class User:
    id: str
    comment: str
    actual_sentiment: str
    response: P.PromptResponse = P.completion(
        model="gpt-4o",
        messages=[
            P.message(role="system", content=system_message),
            P.message(role="user", content=F.jinja(user_message)),
        ],
        output_structure=UserSentiment,
    )
    predicted_sentiment: str = F.json_value(_.response.response, "sentiment")
In the example above, we see how direct prompts integrate with Chalk’s feature system. The prompt uses Jinja templating to dynamically inject the user’s comment into the prompt through {{User.comment}}, allowing the LLM to analyze the specific content at runtime. The UserSentiment Pydantic model defines a structured output format, ensuring consistent response parsing across all completions.
While this approach provides fine-grained control over the prompt configuration, it tightly couples the prompt template, model settings, and output structure to your deployment code. Any changes to the prompt template or model parameters require a new deployment.
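Once deployed, the completion runs like any other Chalk feature: querying predicted_sentiment injects the comment into the template, calls the model, and parses the structured response. Here is a minimal sketch of such a query, assuming a ChalkClient with credentials configured in your environment; the id and comment values are hypothetical:

from chalk.client import ChalkClient

client = ChalkClient()  # assumes API credentials are configured in the environment

# Requesting predicted_sentiment computes the completion at query time.
result = client.query(
    input={
        "user.id": "u_123",  # hypothetical user id
        "user.comment": "This product exceeded my expectations!",
    },
    output=["user.predicted_sentiment"],
)
print(result.get_feature_value("user.predicted_sentiment"))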
Chalk’s chat completion API extends beyond text-only interactions by supporting multimodal inputs, enabling you to process images alongside text in your LLM workflows. This capability opens up new possibilities for visual content analysis, allowing you to pass images directly to vision-capable models like GPT-4o, combine text and image inputs in a single prompt, and extract structured information from visual content. Here’s how you can process a receipt image to extract structured data:
@features
class Receipt:
    id: str
    image_url: str
    image_response: P.MultimodalPromptResponse = P.completion(
        model="gpt-4o",
        messages=[
            P.message(
                "user",
                [
                    {"type": "input_text", "text": "Extract all items, prices, and total from this receipt"},
                    {"type": "input_image", "image_url": _.image_url},
                ],
            ),
        ],
    )
This multimodal approach seamlessly integrates with Chalk’s feature system, treating visual inputs as first-class citizens alongside text. The same structured output capabilities available for text-only prompts apply here, allowing you to define Pydantic models for consistent parsing of visual information extraction tasks.
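For instance, the receipt example above could declare its own output structure. The ReceiptItem and ReceiptData models below are hypothetical illustrations of such a schema, following the same output_structure pattern as the sentiment example:

import chalk.functions as F
import chalk.prompts as P
from chalk.features import _, features
from pydantic import BaseModel, Field

# Hypothetical schema for the receipt extraction above.
class ReceiptItem(BaseModel):
    name: str = Field(description="item name as printed on the receipt")
    price: float = Field(description="item price")

class ReceiptData(BaseModel):
    items: list[ReceiptItem] = Field(description="line items on the receipt")
    total: float = Field(description="receipt total")

@features
class Receipt:
    id: str
    image_url: str
    image_response: P.MultimodalPromptResponse = P.completion(
        model="gpt-4o",
        messages=[
            P.message(
                "user",
                [
                    {"type": "input_text", "text": "Extract all items, prices, and total from this receipt"},
                    {"type": "input_image", "image_url": _.image_url},
                ],
            ),
        ],
        output_structure=ReceiptData,
    )
    # Pull the parsed total out of the structured response.
    total: float = F.json_value(_.image_response.response, "total")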
Named prompts abstract prompt definitions into standalone, reusable components that can be managed independently of your feature code. This separation of concerns enables centralized prompt management across your application and provides robust version control of your prompt templates. Named prompts allow you to conduct A/B testing between different prompt variants and make runtime modifications through the web interface without code changes. You can define a named prompt in the Web UI and implement the same Chalk feature with more concise code.
import chalk.functions as F
import chalk.prompts as P
from chalk.features import _, features

@features
class User:
    id: str
    comment: str
    actual_sentiment: str
    response: P.PromptResponse = P.run_prompt("analyze_sentiment")
    predicted_sentiment: str = F.json_value(_.response.response, "sentiment")
Named prompts function as microservices within your ML pipeline, allowing you to modify prompt behavior without requiring code deployments. This architecture is particularly valuable for teams that need to iterate rapidly on prompt engineering or maintain multiple prompt variants.
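As the evaluation example below suggests, prompt variants can be referenced by versioned names such as analyze_sentiment-v1 and analyze_sentiment-v2. Here is a sketch of pinning a feature to one variant, assuming an analyze_sentiment-v2 prompt has been created in the Web UI:

import chalk.functions as F
import chalk.prompts as P
from chalk.features import _, features

@features
class User:
    id: str
    comment: str
    actual_sentiment: str
    # Pin this feature to a specific variant instead of the latest
    # version; "analyze_sentiment-v2" is a hypothetical variant name.
    response: P.PromptResponse = P.run_prompt("analyze_sentiment-v2")
    predicted_sentiment: str = F.json_value(_.response.response, "sentiment")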
The Chalk evaluation platform provides comprehensive tooling for testing and comparing prompt performance. This framework enables data-driven prompt engineering through systematic testing of prompt variants.
Evaluation begins with creating a labeled dataset that serves as your ground truth. This dataset should contain input examples and their expected outputs, enabling automated assessment of prompt performance:
import pandas as pd

from chalk.client import ChalkClient

chalk_client = ChalkClient()

# Load labeled examples and upload them as a named Chalk dataset.
sentiment_df = pd.read_csv("sentiment.csv")

sentiment_ds = chalk_client.create_dataset(
    input=sentiment_df,
    dataset_name="sentiment_gt",
)
sentiment_ds.to_pandas()
The evaluation framework allows you to test variations in prompt templates, assess the impact of different models and parameters, and validate changes to output structures. Built-in evaluators like “exact_match” offer immediate insights into prompt performance, while custom evaluation expressions enable more nuanced analysis specific to your use case.
eval_run = chalk_client.prompt_evaluation(
    dataset_name="sentiment_gt",
    evaluators=["exact_match"],
    reference_output="user.actual_sentiment",
    prompts=[
        "analyze_sentiment-v1",
        "analyze_sentiment-v2",
        P.completion(
            model="gpt-4o",
            messages=[
                P.message(role="system", content=system_message),
                P.message(role="user", content=user_message),
            ],
        ),
    ],
)
eval_run.to_pandas()
The results of every evaluation are stored and available for viewing in the Web UI.