Dynamic Chalk Queries

Overview

The usual way to give an LLM access to your data is to gather it up front: run a fixed query, serialize the rows into the prompt, and hope you fetched the right columns. That is a guess made before the model has reasoned about anything.

A better pattern, and the one Chalk Compute is built for, is to hand the agent a tool that issues Chalk queries and let it decide what to pull. Because the agent runs inside a Chalk Function — co-located with the feature engine, inside your own cloud — each tool call is a real online query against the feature store. The agent reads the fraud score, decides it needs the account age, fetches that too, and rules. Nothing is pre-staged, and no data leaves your VPC.

This is what we mean by data-native agents: the feature store is the agent’s context window, queried on demand.

# Inside the agent loop, a tool call becomes a live Chalk query:
ctx = chalk_client.query(
    input={"user.id": user_id},
    output=requested_features,   # ← chosen by the model at inference time
)

The pattern

Three pieces turn an LLM into a data-native agent:

A feature-fetch tool exposed to the model, whose parameters name the features it is allowed to request.
A handler that turns each tool call into a ChalkClient.query(...).
A @chalkcompute.function wrapping the whole loop so it runs next to the feature engine with credentials injected.

Here is a refund-abuse investigator built exactly this way. The agent is given a get_chalk_features tool, and each time it calls that tool we run an online query for the features it asked for and feed the results back:

import json
import chalkcompute


@chalkcompute.function(secrets=[...], image=..., tracing=True)
def investigate_refund(user_id: int, reason: str) -> str:
    from chalk.client import ChalkClient

    chalk_client = ChalkClient()
    messages = [...]  # system prompt + the refund claim

    while True:
        response = openai_client.chat.completions.create(
            model="gpt-5.5",
            messages=messages,
            tools=[{
                "type": "function",
                "function": {
                    "name": "get_chalk_features",
                    "parameters": {  # the model names whichever features it wants
                        "type": "object",
                        "properties": {
                            "user_id": {"type": "integer"},
                            "features": {
                                "type": "array",
                                "items": {"type": "string"},
                            },
                        },
                    },
                },
            }],
        )

        msg = response.choices[0].message
        if not msg.tool_calls:
            return msg.content or ""          # final verdict
        messages.append(msg.model_dump(exclude_none=True))

        for tc in msg.tool_calls:
            args = json.loads(tc.function.arguments)
            # Each tool call becomes a live online query against the feature store.
            with chalkcompute.span("tool.get_chalk_features"):
                result = chalk_client.query(
                    input={"user.id": args["user_id"]},
                    output=args["features"],   # ← the query the agent composed
                )
            rendered = "\n".join(f"{r.field}: {r.value}" for r in result.data)
            messages.append({"role": "tool", "tool_call_id": tc.id, "content": rendered})

The model never sees a Chalk SDK or a feature definition — it names the features it wants as plain strings and emits a JSON array. The query is composed by the agent at inference time: output=args["features"] is whatever subset the model decided it needed on this turn. That is a Chalk query the agent wrote.

The `output` argument accepts **feature name strings** like `"user.is_fraud"`, not just `User.is_fraud` attributes. That is exactly what makes this work — the model emits strings, and they map straight onto a query.

See Online Queries for the full client API.

Best Practices

Scope the agent's data surface with token permissions

Notice the tool lets the model ask for any feature name — there is no hardcoded allowlist in the schema. That is deliberate. Pinning the permitted features into a tool schema is brittle: the list drifts the moment someone adds a feature, lives somewhere other than the feature definitions, and is only as trustworthy as the code that assembles the prompt. At best it is a hint to the model, never a security boundary.

The boundary belongs in Chalk instead. The agent queries under a service token, and tokens carry tag-based feature permissions — an allowlist or blocklist of feature tags that Chalk enforces server-side at query time. Tag your features once, at definition:

from chalk.features import features, feature

@features(tags=["agent-readable"])
class User:
    id: int
    total_spend: float
    is_fraud: bool
    # SSN can feed other features but never reach the agent:
    ssn: str = feature(tags=["pii"])

Issue the agent a token that allowlists agent-readable and blocklists pii. Every query the agent composes is checked against that scope, and a request for an out-of-scope feature comes back Unauthorized — even if the model hallucinated the name. Chalk also keeps separate permissions for computed-only versus output features, so a feature can participate in deriving others without ever being returned to the agent.

Don't keep sensitive features out of an agent's reach with a list in a prompt — the model can ignore it and the list rots. Give the agent the widest data surface its job needs and let tag permissions on its token decide what is off-limits. That boundary is enforced by the engine, not by the model's good behavior.

Fetch incrementally, in small batches

Prompt the agent to pull a few features, reason, then pull more — rather than requesting everything at once. Cheaper queries, smaller prompts, and a tool-call trace that reads like an investigation: checked fraud score → looked up account age → ruled. It also lets the agent stop early when it has seen enough.

Run the loop inside a Chalk Function

Wrap the agent in @chalkcompute.function and inject Chalk and model credentials with Secret. The function gives you autoscaling, retries, and concurrency limits for free, and keeps the ChalkClient co-located with the engine. Call it from anywhere with RemoteFunction.from_name(...):

from chalkcompute import RemoteFunction

investigate = RemoteFunction.from_name("investigate_refund")
verdict = investigate.remote(user_id=1, reason="item not received")

Trace every query

Wrap each tool call in chalkcompute.span(...) and set tracing=True on the function. You get a span per query showing which features the agent fetched and when — the audit trail for why an agent decided what it did. Indispensable for debugging agent behavior and reviewing high-stakes decisions.

Control freshness with staleness

If an agent can tolerate slightly stale values, pass staleness to accept cached features and skip recomputation — faster and cheaper. Reserve fresh computation for the signals where it matters.

result = chalk_client.query(
    input={"user.id": user_id},
    output=["user.total_spend"],
    staleness={"user.total_spend": "1h"},  # cached value up to 1h old is fine
)

Gate side-effecting tools

Read-only feature fetches are safe to let the agent fire at will. Tools that act — banning a user, issuing a refund, writing to a table — deserve more care: require human approval, restrict them to specific verdicts, or log them prominently. Keep the agent’s ability to read data wide and its ability to change things narrow.

Replay decisions with point-in-time correctness

To backtest an agent or reproduce a past decision, run the same feature reads as an offline query with input_times. Chalk computes each feature as of that timestamp, so the agent sees the data it would have seen then — no leakage from the future. The agent logic is unchanged; only the query mode differs.

How it fits together

┌─────────────┐   tool call: get_chalk_features(["user.is_fraud"])
│     LLM     │ ───────────────────────────────────────────────┐
│  (the agent)│                                                 ▼
└─────────────┘                                    ┌──────────────────────────┐
       ▲                                           │  @chalkcompute.function  │
       │  features as messages                     │  ChalkClient().query(...)│
       └───────────────────────────────────────── │   → Chalk feature engine  │
                                                   └──────────────────────────┘
                                                      (same cluster, your cloud)

The model decides what to fetch; the function executes it as a real Chalk query against the engine next door. The agent’s reasoning and your feature store operate as one system — which is the whole point of running agents on Chalk Compute.

​Overview

​The pattern

​Best Practices

​Scope the agent's data surface with token permissions

​Fetch incrementally, in small batches

​Run the loop inside a Chalk Function

​Trace every query

​Control freshness with staleness

​Gate side-effecting tools

​Replay decisions with point-in-time correctness

​How it fits together

On this page