Named Queries

With Chalk NamedQuery objects, you can define and version your common query patterns in code.

This provides several advantages:

  • queries are preplanned on engine boot, reducing first-query latency,
  • query outputs and parameters don’t need to be hardcoded, reducing boilerplate code and ensuring consistency between your queries,
  • queries are grouped together on the web, making them easier to track, monitor, and debug.

Relationship to models

Named queries typically map to specific models that you’re running in production. While feature classes model your domain objects (users, transactions, accounts) and may contain hundreds of features for reuse across different models, a named query selects only the specific subset of features needed for a particular model.

For example, you might have a User feature class with 100+ features capturing everything about a user: their profile, behavior metrics, transaction history aggregations, and risk signals. However, your fraud detection model might only need 15 specific features, while your recommendation model needs a different set of 30 features. Named queries let you define these model-specific feature sets, ensuring each model gets exactly what it needs without unnecessary computation.

This separation allows different models to access the same domain objects through feature classes while only requesting the features they need, improving performance and making it easier to track which features each model depends on.


Defining Named Queries

To define a named query, add a NamedQuery object to your Chalk deployment:

from chalk import NamedQuery
from src.models import User

NamedQuery(
    name="fraud",
    input=[User.id],
    output=[
        User.email_age_days,
        User.denylisted,
        User.credit_report.flags,
    ],
    tags=["team:fraud"],
    owner="jodie@chalk.ai",
    description="Primary fraud model for signup"
)

Running chalk apply makes the named query available in your deployment.


Using Named Queries

Once deployed, a named query can be run from any Chalk client by specifying the query_name parameter.

With the Chalk CLI, this looks like:

chalk query --in user.id=1 --query-name fraud

Because a named query has been specified, you don’t need to explicitly pass in the tags and outputs for your query. The above command is equivalent to the more verbose:

chalk query \
  --in user.id=1 \
  --out user.email_age_days \
  --out user.denylisted \
  --out user.credit_report.flags \
  --tag team:fraud
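The equivalence above can be sketched in plain Python. This is only an illustrative model of the expansion, not Chalk's implementation: `NamedQuerySpec` and `expand_to_cli_args` are hypothetical names introduced here, and outputs are written as the lowercase feature strings used in the CLI examples.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NamedQuerySpec:
    """Hypothetical stand-in for a stored named query (not a Chalk class)."""
    name: str
    outputs: List[str]
    tags: List[str] = field(default_factory=list)


def expand_to_cli_args(spec: NamedQuerySpec, inputs: dict) -> List[str]:
    """Build the explicit `chalk query` arguments that a named query replaces."""
    args = ["chalk", "query"]
    for feature, value in inputs.items():
        args += ["--in", f"{feature}={value}"]
    for feature in spec.outputs:
        args += ["--out", feature]
    for tag in spec.tags:
        args += ["--tag", tag]
    return args


fraud = NamedQuerySpec(
    name="fraud",
    outputs=["user.email_age_days", "user.denylisted", "user.credit_report.flags"],
    tags=["team:fraud"],
)
command = expand_to_cli_args(fraud, {"user.id": 1})
print(" ".join(command))
```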

Named queries are also available in all of our API clients through the query_name parameter. For instance, in Python, you can run:

from chalk.client import ChalkClient

ChalkClient().query(
    input={"user.id": 1},
    query_name="fraud",
)

You can also run a named query offline, provided that all outputs have offline resolvers:

from chalk.client import ChalkClient

# Keep the returned dataset so its results can be read afterwards.
dataset = ChalkClient().offline_query(
    input={"user.id": 1},
    query_name="fraud",
    recompute_features=True,
)
df = dataset.get_data_as_pandas()

To see all the named queries you’ve defined in your current active deployment, you can run:

chalk named-query list
<example output>

Versioning Named Queries

If you want to create multiple versions of a similar query, you can use the version parameter of the NamedQuery object and the query_name_version parameter of our various clients.

Note that when executing a named query, both the query name and the query version must match. This means that if you’ve defined two named queries in your codebase:

from chalk import NamedQuery
from src.models import User

NamedQuery(
    name="fraud",
    input=[User.id],
    output=[User.denylisted],
)

NamedQuery(
    name="fraud",
    version="1.1.0",
    input=[User.id],
    output=[
        User.email_age_days,
        User.denylisted,
        User.credit_report.flags,
    ],
)

And you run the following query:

chalk query --in user.id=1 --query-name fraud

We will return User.denylisted, since the first named query has no version and no version was passed through --query-name-version. To access a versioned named query, the version must be passed explicitly. For example:

chalk query --in user.id=1 --query-name fraud --query-name-version 1.1.0
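The matching rule can be modeled as an exact lookup on the pair (name, version). The sketch below is a simplified illustration, not Chalk's internals: `REGISTRY` and `resolve_outputs` are hypothetical, and raising `KeyError` on a mismatch is an assumption, since how Chalk reports an unmatched version is not described here.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical registry keyed by (name, version); None means "no version".
REGISTRY: Dict[Tuple[str, Optional[str]], List[str]] = {
    ("fraud", None): ["user.denylisted"],
    ("fraud", "1.1.0"): [
        "user.email_age_days",
        "user.denylisted",
        "user.credit_report.flags",
    ],
}


def resolve_outputs(name: str, version: Optional[str] = None) -> List[str]:
    """Return outputs only when both the name and the version match exactly."""
    key = (name, version)
    if key not in REGISTRY:
        # Assumption: a mismatch is an error rather than a fallback to another version.
        raise KeyError(f"no named query {name!r} with version {version!r}")
    return REGISTRY[key]


unversioned = resolve_outputs("fraud")
versioned = resolve_outputs("fraud", "1.1.0")
print(unversioned)
print(versioned)
```

Omitting the version selects only the unversioned definition; it never falls through to 1.1.0.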

Caching ad-hoc query plans

Sometimes defining NamedQuery objects is not ergonomic or possible. For example, if you are a platform team serving multiple teams, you may not want to define a NamedQuery object for every query that your users run.

In this case, you can use these environment variables:

CHALK_STORE_ADHOC_QUERIES=true
CHALK_PLAN_ADHOC_QUERIES=3

The first environment variable caches ad-hoc query requests in the database. The second sets how many of the most recent ad-hoc queries are planned (up to 3 in this example). These ad-hoc queries are re-planned at boot so that code or platform changes are reflected in the query plan.
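As a rough mental model of how the two settings interact, the selection step might look like the following. This is a sketch under stated assumptions only: `queries_to_plan` and the `StoredQuery` shape are hypothetical, and how Chalk actually stores, deduplicates, and orders ad-hoc queries is not described here.

```python
from typing import List, Mapping, Tuple

# Hypothetical stored record: (last_seen_unix_seconds, output features).
StoredQuery = Tuple[int, Tuple[str, ...]]


def queries_to_plan(stored: List[StoredQuery], env: Mapping[str, str]) -> List[StoredQuery]:
    """Pick the most recent stored ad-hoc queries, capped by CHALK_PLAN_ADHOC_QUERIES."""
    if env.get("CHALK_STORE_ADHOC_QUERIES", "").lower() != "true":
        # Assumption: without storage there is nothing to plan at boot.
        return []
    limit = int(env.get("CHALK_PLAN_ADHOC_QUERIES", "0"))
    newest_first = sorted(stored, key=lambda q: q[0], reverse=True)
    return newest_first[:limit]


stored = [
    (100, ("user.denylisted",)),
    (400, ("user.email_age_days",)),
    (200, ("user.credit_report.flags",)),
    (300, ("user.denylisted", "user.email_age_days")),
]
env = {"CHALK_STORE_ADHOC_QUERIES": "true", "CHALK_PLAN_ADHOC_QUERIES": "3"}
planned = queries_to_plan(stored, env)
print([timestamp for timestamp, _ in planned])
```

With four stored queries and a limit of 3, the oldest one is left unplanned and would be planned on first use instead.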