Scheduled Query

When features come from a single SQL file or a single resolver, you can use resolver crons to keep your online and offline stores up to date.

When features are chained together, or when you need to run a whole feature pipeline on a schedule, use scheduled queries instead.

Scheduled queries let you run an offline query on a schedule, and persist the results in the online and/or offline feature stores.

Create a Scheduled Query

To create a scheduled query, define a ScheduledQuery object anywhere in your code. You can specify the schedule as a crontab expression, which is interpreted in the UTC timezone.

from chalk import ScheduledQuery

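# Transaction is a feature class defined elsewhere in your project.
ScheduledQuery(
    name="enrich-transactions",
    schedule="0 0 * * *",  # crontab expression in UTC: daily at 00:00
    output=[Transaction.clean_name, Transaction.category],
    store_online=True,  # persist results to the online store
    store_offline=True,  # persist results to the offline store
)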

The scheduled query is created when you run chalk apply.

In the web dashboard, the Scheduled Runs tab lists your scheduled queries. Each entry shows the status of its last execution, and you can click into a run to see its logs.
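
Because schedules are interpreted in UTC, a crontab such as "0 0 * * *" fires at midnight UTC rather than at local midnight. The following is a minimal sketch of that arithmetic using only the Python standard library. It does not call Chalk, the helper next_daily_run is hypothetical, and it only handles crontabs of the form "minute hour * * *".

```python
from datetime import datetime, timedelta, timezone


def next_daily_run(now: datetime, hour: int = 0, minute: int = 0) -> datetime:
    """Return the next UTC time strictly after `now` matching `minute hour * * *`."""
    now_utc = now.astimezone(timezone.utc)
    candidate = now_utc.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now_utc:
        # Today's slot has already passed, so the next run is tomorrow.
        candidate += timedelta(days=1)
    return candidate


# 18:30 at a fixed UTC-5 offset is 23:30 UTC, so the next "0 0 * * *" run
# is 30 minutes away, not five and a half hours away.
utc_minus_5 = timezone(timedelta(hours=-5))
local_now = datetime(2024, 1, 15, 18, 30, tzinfo=utc_minus_5)
next_run = next_daily_run(local_now)
print(next_run.isoformat())
```

If you want a run to land at a particular local time, convert that time to UTC before writing the crontab.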


Incrementalization

By default, scheduled queries use incrementalization to ingest only data that has been updated since the last run. You can also name a resolver to use as the source of incrementalization. For example, if you are enriching financial transaction data, you might use the transactions.chalk.sql resolver as that source.

from chalk import ScheduledQuery

ScheduledQuery(
    name="enrich-transactions",
    schedule="0 0 * * *",
    output=[Transaction.clean_name, Transaction.category],
    store_online=True,
    store_offline=True,
    incremental_resolvers="transactions",  # resolver used as the source of incrementalization
)
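
To illustrate the idea behind incrementalization, here is a conceptual sketch of selecting only the rows updated since the previous run. It is not Chalk's implementation: the select_incremental helper, the row layout, and the updated_at column name are all assumptions made for the example.

```python
from datetime import datetime, timezone
from typing import Optional


def select_incremental(rows: list[dict], last_run: Optional[datetime]) -> list[dict]:
    """Return only rows updated after the previous run (all rows on the first run)."""
    if last_run is None:
        return list(rows)
    return [row for row in rows if row["updated_at"] > last_run]


def utc(day: int) -> datetime:
    """Build a UTC timestamp for a day in January 2024 (illustrative data only)."""
    return datetime(2024, 1, day, tzinfo=timezone.utc)


# Hypothetical transaction rows; `updated_at` is an assumed column name.
transactions = [
    {"id": 1, "updated_at": utc(1)},
    {"id": 2, "updated_at": utc(2)},
    {"id": 3, "updated_at": utc(3)},
]

first_run = select_incremental(transactions, last_run=None)
second_run = select_incremental(transactions, last_run=utc(2))
print([row["id"] for row in first_run])   # every row is new on the first run
print([row["id"] for row in second_run])  # only rows updated after the last run
```

The first run processes everything, and each later run touches only rows whose timestamp is newer than the previous run, which is what keeps repeated runs cheap.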

Monitoring

You can monitor the status of scheduled queries in the Scheduled Runs tab. From there, you can set up alerts to notify you when a scheduled query fails. To set up an alert, click on the Add Alert button in the top right corner of the page.


All alerts integrate with your other Chalk alerting integrations, including email, PagerDuty, Slack, and incident.io.

Additionally, you can export metrics about scheduled query executions to your observability tools. The cron_run_request metric (see metrics export documentation) tracks the number of times a scheduled query was executed, with tags for success/failure status. This allows you to monitor scheduled query reliability and alert when executions fall below expected thresholds in your preferred monitoring system.
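
As a rough sketch of the kind of threshold check you might build on top of the exported cron_run_request metric, the snippet below counts successful runs in a window and flags a shortfall. It operates on a plain list of status strings rather than a real metrics backend, and the "success" and "failure" tag values and the should_alert helper are assumptions, not names confirmed by the metrics export documentation.

```python
from collections import Counter


def should_alert(run_statuses: list[str], expected_successes: int) -> bool:
    """Return True when fewer successful runs were observed than expected."""
    counts = Counter(run_statuses)
    # Counter returns 0 for a missing key, so an empty window also alerts.
    return counts["success"] < expected_successes


# A daily schedule should produce 7 successful runs per week.
# The status values below are assumed tag values for illustration.
last_week = ["success", "success", "failure", "success", "success", "success", "success"]
healthy_week = ["success"] * 7

print(should_alert(last_week, expected_successes=7))
print(should_alert(healthy_week, expected_successes=7))
```

Comparing against an expected count, rather than only looking for failures, also catches the case where a scheduled query silently stops running.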

Resources

By default, scheduled query runs are picked up by the Job Queue Server. To see your Job Queue Server resource configurations, check the Job Queue Server service under Resources in the dashboard. To customize the resources allocated to runs for a specific scheduled query, you can set the resource configuration under the Scheduled Query Configuration tab in the dashboard.
