With Chalk NamedQuery objects, you can define and version your common query patterns in code.

This provides several advantages:

  • queries are preplanned on engine boot, reducing first-query latency,
  • query outputs and parameters don’t need to be hardcoded, reducing boilerplate code and ensuring consistency between your queries,
  • queries are grouped together on the web, making them easier to track, monitor, and debug.

Relationship to models

Named queries typically map to specific models that you’re running in production. While feature classes model your domain objects (users, transactions, accounts) and may contain hundreds of features for reuse across different models, a named query selects only the specific subset of features needed for a particular model.

For example, you might have a User feature class with 100+ features capturing everything about a user: their profile, behavior metrics, transaction history aggregations, and risk signals. However, your fraud detection model might only need 15 specific features, while your recommendation model needs a different set of 30 features. Named queries let you define these model-specific feature sets, ensuring each model gets exactly what it needs without unnecessary computation.

This separation allows different models to access the same domain objects through feature classes while only requesting the features they need, improving performance and making it easier to track which features each model depends on.


Defining Named Queries

To define a named query, add a NamedQuery object to your Chalk deployment:

from chalk import NamedQuery
from src.models import User

NamedQuery(
    name="fraud",
    input=[User.id],
    output=[
        User.email_age_days,
        User.denylisted,
        User.credit_report.flags,
    ],
    tags=["team:fraud"],
    owner="jodie@chalk.ai",
    description="Primary fraud model for signup"
)

Running chalk apply makes the named query available in your deployment.


Using Named Queries

Once applied, a named query can be used from any of our clients by specifying the query_name parameter.

Using the Chalk CLI tool, this looks something like:

chalk query --in user.id=1 --query-name fraud

Because a named query has been specified, you don’t need to explicitly pass in the tags and outputs for your query. The above command is equivalent to running the more complicated:

chalk query \
  --in user.id=1 \
  --out user.email_age_days \
  --out user.denylisted \
  --out user.credit_report.flags \
  --tag team:fraud
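
To make the equivalence concrete, here is a minimal sketch that expands a named query definition into the explicit flags shown above. It is plain Python and does not call Chalk. The NamedQuerySpec dataclass and the expand_to_cli_args helper are hypothetical names introduced only for illustration, not part of the Chalk API.

```python
from dataclasses import dataclass, field


@dataclass
class NamedQuerySpec:
    """Hypothetical stand-in for the NamedQuery fields that matter here."""
    name: str
    outputs: list[str]
    tags: list[str] = field(default_factory=list)


def expand_to_cli_args(spec: NamedQuerySpec, inputs: dict[str, object]) -> list[str]:
    """Build the explicit `chalk query` arguments that the named query replaces."""
    args = ["chalk", "query"]
    for feature, value in inputs.items():
        args += ["--in", f"{feature}={value}"]
    for output in spec.outputs:
        args += ["--out", output]
    for tag in spec.tags:
        args += ["--tag", tag]
    return args


# Mirrors the "fraud" named query defined earlier on this page.
fraud = NamedQuerySpec(
    name="fraud",
    outputs=["user.email_age_days", "user.denylisted", "user.credit_report.flags"],
    tags=["team:fraud"],
)
print(" ".join(expand_to_cli_args(fraud, {"user.id": 1})))
```

The caller only supplies the inputs; outputs and tags come from the definition, which is what keeps every invocation of the same named query consistent.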

This feature is also accessible in all of our API clients through the query_name parameter. For instance, in Python, you can run:

from chalk.client import ChalkClient

ChalkClient().query(
    input={"user.id": 1},
    query_name="fraud",
)

You can also run a named query offline, provided that all outputs have offline resolvers.

from chalk.client import ChalkClient

# Keep a reference to the returned dataset so the results can be read back.
dataset = ChalkClient().offline_query(
    input={"user.id": 1},
    query_name="fraud",
    recompute_features=True,
)
df = dataset.get_data_as_pandas()

To see all the named queries you’ve defined in your current active deployment, you can run:

$ chalk named-query list
<example output>

Versioning Named Queries

If you want to create multiple versions of a similar query, you can use the version parameter of the NamedQuery object and the query_name_version parameter of our various clients.

Note that when executing a named query, both the query name and the query version must match. This means that if you’ve defined two named queries in your codebase:

from chalk import NamedQuery
from src.models import User

NamedQuery(
    name="fraud",
    input=[User.id],
    output=[User.denylisted],
)

NamedQuery(
    name="fraud",
    version="1.1.0",
    input=[User.id],
    output=[
        User.email_age_days,
        User.denylisted,
        User.credit_report.flags,
    ],
)

And you run the following query:

chalk query --in user.id=1 --query-name fraud

Chalk returns only User.denylisted, since the first named query has no version and no version was passed through --query-name-version. To run a versioned named query, you must pass the version explicitly. For example:

chalk query --in user.id=1 --query-name fraud --query-name-version 1.1.0
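
The matching rule can be pictured as an exact lookup on the pair (name, version). The sketch below is plain Python, not Chalk's implementation. REGISTRY and resolve_outputs are hypothetical names, and raising KeyError for an unknown pair is an assumption made for illustration, since this page does not describe how a mismatch is reported.

```python
# Hypothetical registry keyed by (name, version); None means "no version".
REGISTRY = {
    ("fraud", None): ["user.denylisted"],
    ("fraud", "1.1.0"): [
        "user.email_age_days",
        "user.denylisted",
        "user.credit_report.flags",
    ],
}


def resolve_outputs(query_name, query_name_version=None):
    """Return the outputs only when both the name and the version match exactly."""
    key = (query_name, query_name_version)
    if key not in REGISTRY:
        # Assumed behavior for this sketch: an unknown pair is an error.
        raise KeyError(
            f"no named query with name={query_name!r}, version={query_name_version!r}"
        )
    return REGISTRY[key]


print(resolve_outputs("fraud"))           # unversioned definition
print(resolve_outputs("fraud", "1.1.0"))  # versioned definition
```

Omitting the version never falls through to a versioned definition; it selects the unversioned one.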

Caching Ad-hoc Query Plans

Defining NamedQuery objects is the recommended way to ensure that your queries are pre-planned on start-up, so that planning time does not add to your query latency. Pre-planning is controlled by the environment variable CHALK_PRE_PLAN_NAMED_QUERIES=1, which should be set by default. However, defining NamedQuery objects is sometimes not ergonomic or even possible. For example, if you are a platform team serving multiple teams, you may not want to define a NamedQuery object for every query that your users run.

In this case, you can cache ad-hoc query plans by setting the following environment variables:

CHALK_STORE_ADHOC_QUERIES=true
CHALK_PLAN_ADHOC_QUERIES=3

The first environment variable enables writing query requests to the database. Setting the second environment variable to 3 makes the engine pod plan up to 3 of the most recently saved ad-hoc queries. These ad-hoc queries are re-planned at boot, so code or platform changes are reflected in the query plan. With ad-hoc query caching enabled, you can cache the plans of your most recent ad-hoc queries without defining the queries in code.
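
As a rough illustration of the "most recently saved" selection, the sketch below picks up to a fixed number of saved requests, newest first. It is plain Python under stated assumptions: select_recent_queries, the query ids, and the timestamps are all hypothetical, and the engine's actual storage and ordering are not described on this page.

```python
from datetime import datetime, timezone


def select_recent_queries(saved, limit):
    """Return the ids of up to `limit` saved queries, most recently saved first.

    `saved` is a list of (saved_at, query_id) pairs.
    """
    newest_first = sorted(saved, key=lambda item: item[0], reverse=True)
    return [query_id for _, query_id in newest_first[:limit]]


# Hypothetical saved ad-hoc query requests.
saved_queries = [
    (datetime(2024, 1, 1, tzinfo=timezone.utc), "query-a"),
    (datetime(2024, 1, 4, tzinfo=timezone.utc), "query-d"),
    (datetime(2024, 1, 2, tzinfo=timezone.utc), "query-b"),
    (datetime(2024, 1, 3, tzinfo=timezone.utc), "query-c"),
]

# With a limit of 3, the oldest request ("query-a") is not planned at boot.
to_plan = select_recent_queries(saved_queries, limit=3)
print(to_plan)
```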


Durable Plan Cache

The downside of caching and pre-planning ad-hoc query plans is that pre-planning a large number of query plans during boot can take a long time. The Durable Plan Cache helps alleviate this. Query plans (as opposed to query requests) can be serialized and written to the Durable Plan Cache. These plans persist across pods and can be pre-loaded (as opposed to planned) into the in-memory plan cache on the next pod startup. Since pre-loading is faster than planning, pod startups take less time.

You can configure the Durable Plan Cache with the following environment variables:

CHALK_PERSIST_DURABLE_PLAN_CACHE=true
CHALK_PREPOPULATE_DURABLE_PLAN_CACHE=10
CHALK_PREPOPULATE_DURABLE_PLAN_CACHE_DURATION=259200

CHALK_PERSIST_DURABLE_PLAN_CACHE enables writing query plans to the Durable Plan Cache. You can then choose how query plans are loaded on new pods using either CHALK_PREPOPULATE_DURABLE_PLAN_CACHE=k, which loads the k most recently written query plans, or CHALK_PREPOPULATE_DURABLE_PLAN_CACHE_DURATION={DURATION_IN_SECONDS}, which loads all query plans written within the specified number of seconds before now. In the example above, 259200 seconds corresponds to 3 days.
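
Because the duration is given in seconds, it can help to derive the value rather than type it by hand. The sketch below computes the number of seconds for a window in days and shows which plans a window of that length would cover. It is plain Python: duration_in_seconds, plans_within_window, and the plan ids are hypothetical, and treating the cutoff as inclusive is an assumption for illustration.

```python
from datetime import datetime, timedelta, timezone


def duration_in_seconds(days):
    """Convert a window expressed in days into whole seconds."""
    return int(timedelta(days=days).total_seconds())


def plans_within_window(plans, duration_seconds, now):
    """Return ids of plans written at or after `now - duration_seconds`.

    `plans` is a list of (written_at, plan_id) pairs.
    """
    cutoff = now - timedelta(seconds=duration_seconds)
    return [plan_id for written_at, plan_id in plans if written_at >= cutoff]


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
plans = [
    (now - timedelta(days=5), "plan-old"),
    (now - timedelta(days=2), "plan-recent"),
    (now - timedelta(hours=1), "plan-newest"),
]

window = duration_in_seconds(days=3)
print(window)  # 259200
print(plans_within_window(plans, window, now))
```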

Query plans written by a resource group in a deployment will only be valid for that resource group in the same deployment.