Queries
Run feature pipelines on a schedule
When features are coming from a single SQL file, or a single resolver, you can use resolver crons to keep your online and offline stores up to date.
However, when features are chained together, or when you need to run a feature pipeline on a schedule, you can use scheduled queries.
Scheduled queries let you run an offline query on a schedule, and persist the results in the online and/or offline feature stores.
To create a scheduled query, make a ScheduledQuery object somewhere in your code. Crontabs can be used to specify the schedule in the UTC timezone.
from chalk import ScheduledQuery
ScheduledQuery(
name="enrich-transactions",
schedule="0 0 * * *",
output=[Transaction.clean_name, Transaction.category],
store_online=True,
store_offline=True,
)
At the time of chalk apply
, the scheduled query will be
created.
In the web, you can see the list of scheduled queries in Scheduled Runs
tab.
Each run will show the status of the last execution, and you can click into
each run to see the logs.
By default, scheduled queries use incrementalization to only ingest data which
has been updated since the last run. You can also set a resolver to use as the
source of the incrementalization. For example, if you were enriching financial
transaction data, you might use the transactions.chalk.sql
resolver as the
source of incrementalization.
from chalk import ScheduledQuery
ScheduledQuery(
name="enrich-transactions",
schedule="0 0 * * *",
output=[Transaction.clean_name, Transaction.category],
online=True,
offline=True,
incremental_resolvers="transactions",
)
You can monitor the status of scheduled queries in the Scheduled Runs
tab.
From there, you can set up alerts to notify you when a scheduled query fails.
To set up an alert, click on the Add Alert
button in the top right corner of the
page.
All alerts integrate with your other Chalk alerting integrations, including email, PagerDuty, Slack, and incident.io.
Additionally, you can export metrics about scheduled query executions to your observability tools.
The cron_run_request
metric (see metrics export documentation) tracks the number of times a scheduled
query was executed, with tags for success/failure status. This allows you to monitor scheduled query reliability
and alert when executions fall below expected thresholds in your preferred monitoring system.