Understanding Chalk's job queue and resource groups
The job queue in Chalk, together with resource groups, functions similarly to warehouses in analytical data platforms: both provide dedicated, configurable compute resources for processing workloads.
A job queue server is a persistent worker process that consumes jobs from a queue and executes them one at a time. By configuring multiple resource groups with different job queue servers, you can create isolated compute environments optimized for different workload types.
The job queue handles two primary types of workloads: scheduled queries, and offline queries run with `run_asynchronously=True`.

```python
# This runs on the job queue
client.offline_query(
    input={'user.id': range(1_000_000)},
    output=['user.name'],
    run_asynchronously=True,  # Runs as a task on the job queue
)

# This runs on the query server (NOT the job queue)
client.offline_query(
    input={'user.id': [1, 2, 3]},
    output=['user.name'],
    # run_asynchronously=False by default - runs as a synchronous RPC
)
```

Jobs are processed in first-in, first-out (FIFO) order; each job queue server executes one job at a time.
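The FIFO, one-at-a-time behavior can be sketched with a plain Python worker loop (a simplified model for illustration, not Chalk's implementation):

```python
from collections import deque

def run_job_queue(jobs):
    """Process jobs strictly in FIFO order, one at a time."""
    queue = deque(jobs)          # arrival order is preserved
    completed = []
    while queue:
        job = queue.popleft()    # oldest job first
        completed.append(job())  # next job starts only after this one returns
    return completed

order = run_job_queue([lambda: "daily-features", lambda: "weekly-aggregations"])
# order == ["daily-features", "weekly-aggregations"]
```

Because each server drains its queue sequentially, a long-running job delays everything behind it on the same queue, which is one motivation for splitting workloads across resource groups.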
Each job queue server has a single, pre-configured resource allocation (CPU and memory).
If a job requests resources larger than the job queue server can handle, Chalk automatically skips the queue and runs the job as a standalone Kubernetes pod with the requested resources.
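The skip-the-queue behavior amounts to comparing the requested resources against the server's fixed allocation. A minimal sketch of that decision (names here are illustrative, not Chalk's internals):

```python
from dataclasses import dataclass

@dataclass
class Resources:
    cpu: float          # cores
    memory_gib: float   # GiB

def placement(requested: Resources, server: Resources) -> str:
    """Queue the job if it fits the server; otherwise run it standalone."""
    if requested.cpu <= server.cpu and requested.memory_gib <= server.memory_gib:
        return "job-queue"
    return "standalone-pod"  # a dedicated Kubernetes pod with the requested resources

server = Resources(cpu=8, memory_gib=16)
placement(Resources(cpu=4, memory_gib=8), server)      # fits: "job-queue"
placement(Resources(cpu=32, memory_gib=450), server)   # too big: "standalone-pod"
```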
Resource groups allow you to create multiple job queue servers with different resource configurations. This is useful for:

- Isolating large batch jobs from routine scheduled queries
- Matching compute sizing to jobs with significantly different resource requirements
In the Chalk dashboard under Settings > Resources, you can configure the “Job Queue Server” for each resource group.
All Chalk environments start with a Default resource group.
To run a scheduled query on a specific resource group, pass `resource_group` to `ScheduledQuery`:

```python
from chalk import ScheduledQuery

ScheduledQuery(
    name="large-batch-job",
    schedule="0 0 * * *",  # Daily at midnight
    output=[User.features],
    resource_group="large-jobs",  # Runs on the "large-jobs" resource group
)
```

To route an asynchronous offline query to a resource group, pass `ResourceRequests`:

```python
from chalk.client import ChalkClient, ResourceRequests

client = ChalkClient()
client.offline_query(
    input={'user.id': range(1_000_000)},
    output=['user.name'],
    run_asynchronously=True,
    resources=ResourceRequests(
        resource_group="large-jobs"  # Runs on the "large-jobs" resource group
    ),
)
```

| Aspect | Job Queue Server | Query Server |
|---|---|---|
| Processes | Scheduled queries, async offline queries | Synchronous offline queries, online queries |
| Execution | One job at a time (FIFO) | Multiple concurrent requests |
| Resources | Fixed per resource group | Requested per query |
| Scaling | Horizontal (more instances) | Vertical (larger pods) |
| Workload Isolation | Jobs run sequentially without resource contention | Multiple concurrent queries may compete for resources on the same server |
| Timeout Behavior | Can run indefinitely beyond load balancer timeout | Will report an error if execution exceeds load balancer timeout |
- Create separate resource groups for jobs with significantly different resource requirements
- Right-size your default job queue server to handle typical workloads
- Use resource groups to isolate heavy batch jobs from routine ones
- Monitor queue depth and raise the maximum instance count if jobs wait too long
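The last point, scaling on queue depth, can be expressed as a simple rule of thumb (illustrative only; Chalk's instance limits are configured in the dashboard, and `jobs_per_instance` here is an assumed tuning parameter, not a Chalk setting):

```python
def desired_instances(queue_depth, current, max_instances, jobs_per_instance=5):
    """Scale out when the backlog per instance grows; never exceed the cap."""
    needed = -(-queue_depth // jobs_per_instance)  # ceiling division
    return max(current, min(needed, max_instances))

desired_instances(queue_depth=42, current=2, max_instances=8)  # -> 8
```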
Here’s a common setup with two resource groups:
```python
from chalk import ScheduledQuery

# Default resource group: moderate sizing for typical scheduled queries
# Configured in dashboard: 8 CPU, 16 GB memory
ScheduledQuery(
    name="daily-features",
    schedule="0 1 * * *",  # Daily at 1 AM
    output=[User.daily_features],
    # Uses the default resource group
)

# Large jobs resource group: high-memory machines for big batch processing
# Configured in dashboard: 32 CPU, 450 GB memory
ScheduledQuery(
    name="weekly-aggregations",
    schedule="0 0 * * 0",  # Sundays at midnight
    output=[User.historical_aggregates],
    resource_group="large-jobs",  # Uses the dedicated high-memory queue
)
```