Queries
Automated offline query sharding
Metaplanning is a feature that automates shard assignments for select offline queries.
This feature is enabled by default for scheduled queries that have num_shards set.
Contact our support team to enable metaplanning for all scheduled queries in your environments, including ones without an explicit num_shards set.
Any async offline query can be metaplanned by setting use_metaplanner=True in the API invocation.
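The selection rules above can be summarized as a small predicate. This is only an illustrative sketch: `QueryInfo`, `is_metaplanned`, and the `metaplan_all_scheduled` flag are hypothetical names that stand in for the query's properties and the support-enabled environment setting. They are not part of the Chalk API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueryInfo:
    """Hypothetical description of an offline query (not a Chalk type)."""
    scheduled: bool
    num_shards: Optional[int] = None
    run_asynchronously: bool = False
    use_metaplanner: bool = False


def is_metaplanned(query: QueryInfo, metaplan_all_scheduled: bool = False) -> bool:
    """Return True if the query would be selected for metaplanning.

    metaplan_all_scheduled models the environment-level setting that
    support can enable for scheduled queries without num_shards.
    """
    if query.scheduled:
        # Scheduled queries: engaged when num_shards is set,
        # or when the environment-wide setting is on.
        return query.num_shards is not None or metaplan_all_scheduled
    # Async offline queries: opt in per call with use_metaplanner=True.
    return query.run_asynchronously and query.use_metaplanner
```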
Offline queries selected for metaplanning go through a metaplanning workflow:
Example: A query with 100,000 rows and the default target of 10,000 rows per shard creates 10 parallel shard jobs.
The shard size can be controlled via the CHALK_AUTOSHARDER_TARGET_ROWS_PER_SHARD environment variable (default: 10,000 rows per shard).
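The arithmetic in the example can be sketched as follows. The helper names are hypothetical, and rounding up for a partial final shard is an assumption; the source only states the evenly divisible case (100,000 rows at 10,000 rows per shard gives 10 shards).

```python
import os

# Default stated in the documentation above.
DEFAULT_TARGET_ROWS_PER_SHARD = 10_000


def target_rows_per_shard() -> int:
    """Read the target shard size from the environment, falling back to the default."""
    return int(
        os.environ.get(
            "CHALK_AUTOSHARDER_TARGET_ROWS_PER_SHARD",
            DEFAULT_TARGET_ROWS_PER_SHARD,
        )
    )


def estimate_shard_count(total_rows: int, rows_per_shard: int) -> int:
    """Estimate the number of parallel shard jobs for a query.

    Assumption: a partial final shard counts as one more shard (ceiling division).
    """
    if rows_per_shard <= 0:
        raise ValueError("rows_per_shard must be positive")
    return max(1, (total_rows + rows_per_shard - 1) // rows_per_shard)


# The documented example: 100,000 rows at 10,000 rows per shard.
print(estimate_shard_count(100_000, 10_000))  # 10
```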
For scheduled queries:
If num_shards is set, metaplanning is automatically engaged.
If num_shards is not set, metaplanning can still be engaged as an environment setting.
For async offline queries:
If use_metaplanner=True, the query will be metaplanned.
ScheduledQuery(
    name="daily_user_scores",
    outputs=[User.id, User.score],
    schedule="0 0 * * *",
)
With metaplanning enabled, this query will:
input is specified)
client.offline_query(
    input_sql='SELECT "user.id" FROM "chalk.resolvers.list_users"',
    outputs=[User.id, User.score],
    recompute_features=True,
    run_asynchronously=True,
    use_metaplanner=True,
)
This query will:
input_sql.
client.offline_query(
    max_samples=1000,
    outputs=[User.id, User.score],
    recompute_features=True,
    run_asynchronously=True,
    use_metaplanner=True,
)
This query will: