Queries
Fetch feature values via queries.
To request or compute data with Chalk, you’ll use queries. In general, when you run a Chalk query, you are either requesting the most up-to-date values for your feature classes, requesting a set of historical data for your feature classes, or running a backfill or batch job.
The first use case is accomplished through online queries, which return values for a single instance of a feature class as quickly as possible, taking advantage of caching and distributed execution.
The latter two use cases are accomplished through offline queries, which use the offline store and can return multiple instances of a feature class across multiple primary keys or points in time.
At a high level, a query specifies input features and output features. Inputs differ slightly for online queries and offline queries, but in both cases the input must contain the primary key of your requested feature class.
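For the examples below, assume a minimal feature class like the following sketch, where the class and field names are illustrative and id serves as the primary key referenced by user.id:

from chalk.features import features

@features
class User:
    id: int    # primary key; referenced in queries as 'user.id'
    name: str  # referenced in queries as 'user.name'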
Online queries can be run with the chalk query CLI command:
chalk query --in user.id=1 --out user.name
or one of our API Clients:
from chalk.client import ChalkClient
client = ChalkClient()
client.query(
    input={'user.id': 1},
    output=['user.name'],
    # branch='test',  # run against a branch
)
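The call returns a response object from which you can read the computed values. As a rough sketch (the accessor name below is an assumption about the client API):

result = client.query(input={'user.id': 1}, output=['user.name'])
result.get_feature_value('user.name')  # assumed accessor; returns the computed value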
Offline queries can be run with one of our API Clients:
from chalk.client import ChalkClient
from datetime import datetime
client = ChalkClient()
client.offline_query(
    input={'user.id': [1, 2, 3, 4]},
    output=['user.name'],
    # branch='test',  # run against a branch
    # recompute_features=True,  # recompute features instead of sampling stored values
    # run_asynchronously=True,  # run in a separate pod from the active deployment
    # max_samples=10,  # return at most 10 samples
    # lower_bound=datetime(2024, 10, 12),  # only sample rows computed after Oct 12, 2024
    # upper_bound=datetime(2024, 10, 20),  # only sample rows computed before Oct 20, 2024
)
Specific resolvers can also be scheduled or triggered (for instance, as part of pipelines like Airflow), and specific queries can be scheduled with ScheduledQuery. Triggers and schedules are useful for pulling data from “slow” data sources into your offline and online stores.
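As a sketch of a scheduled query (the import path, cron schedule, and store flags here are assumptions that may differ by Chalk version):

from chalk.queries import ScheduledQuery

# Recompute user.name nightly and persist the results (assumed parameters).
ScheduledQuery(
    name='nightly-user-names',
    schedule='0 2 * * *',       # cron: every day at 02:00
    output=['user.name'],
    recompute_features=True,    # run resolvers rather than sampling stored values
    store_online=True,          # write results to the online store
    store_offline=True,         # write results to the offline store
)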
Chalk queries can also write data. This is an essential part of Chalk: every time you compute a feature through an online query, the output is written to the offline store, making those historical values available to later offline queries.
Though not the default, offline queries can write data to both the offline store and the online store using etl_offline_to_online. This can be useful when backfilling data from slow data sources or when performing expensive feature computations that would otherwise significantly impact the latency of your online queries.
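As a sketch of the feature-level opt-in (the max_staleness value is arbitrary, and the parameter names assume the feature() helper from Chalk's feature definitions):

from chalk.features import feature, features

@features
class User:
    id: int
    # Values landing in the offline store (e.g. from offline queries or batch jobs)
    # are ETL'd into the online store and served from cache for up to one day.
    name: str = feature(max_staleness='1d', etl_offline_to_online=True)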
| Online query | Offline query |
|---|---|
| Runs only @online resolvers | Runs both @online and @offline resolvers |
| Returns one row of data about one entity | Returns a DataFrame of point-in-time historical data for many rows and multiple entities |
| Designed to return data in milliseconds | Blocks until computation is complete; not designed for millisecond-level latency |
| Queries the online store and calls @online resolvers for quick retrieval | Queries the offline store, which stores all data from online queries, unless recompute_features=True, in which case @offline and @online resolvers are used to compute the outputs |
| Writes output data to the online store and the offline store | Writes results as a parquet file to cloud storage; writes to the online or offline store only if specified |