Collect CPU profiles from engine nodes and upload them to cloud storage for performance analysis.
Chalk can collect CPU profiles from your engine nodes using Linux perf and upload the results
to a cloud storage bucket. This helps the Chalk engineering team diagnose performance bottlenecks
within Chalk itself.
Profiling is configured as an observability daemon in your background persistence configuration. Once enabled, a daemon runs on each Kubernetes node, samples CPU activity for Chalk-related processes, and periodically uploads the results to your bucket.
GCP: The service account used by background persistence
(configured via service_account_name in common_persistence_specs) needs write
access to the target GCS bucket. You also need google_cloud_project set
in common_persistence_specs.
AWS: Background persistence pods obtain AWS credentials through IAM Roles for Service Accounts (IRSA). The default IAM role is provisioned with broad S3 access within your account, so any same-account bucket will work without additional configuration. To write to a bucket in a different AWS account, add a bucket policy on the destination bucket granting access to the background persistence IAM role ARN.
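For the cross-account case, the destination bucket policy might look like the sketch below, which builds the policy document in Python. It assumes the daemon only needs s3:PutObject on objects in the bucket. The bucket name and role ARN are placeholders, and the Sid is a hypothetical label. If uploads are still denied, the role may need additional actions.

```python
import json


def cross_account_bucket_policy(bucket: str, role_arn: str) -> dict:
    """Build an S3 bucket policy that lets a role in another account write objects.

    Assumption: the profiling daemon only needs s3:PutObject on keys in the bucket.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowChalkPerfUploads",  # hypothetical statement label
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": ["s3:PutObject"],
                # Object-level resource: every key inside the bucket.
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


if __name__ == "__main__":
    # Placeholders: substitute your bucket and the background persistence role ARN.
    policy = cross_account_bucket_policy(
        "your-bucket-name",
        "arn:aws:iam::111122223333:role/your-background-persistence-role",
    )
    print(json.dumps(policy, indent=2))
```

Save the printed JSON as policy.json and attach it with `aws s3api put-bucket-policy --bucket your-bucket-name --policy file://policy.json`. That command replaces any existing bucket policy, so if the bucket already has one, merge this statement into it instead of overwriting it.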
There are two ways to enable perf profiling in your environment: through the Chalk dashboard, if you manage your background persistence deployments there, or with the Chalk CLI.
In the dashboard, navigate to Settings > Shared Resources > Background Persistence, click Edit JSON, and add an observability_daemons array at the top level of the JSON, alongside the existing common_persistence_specs and writers:

```json
"observability_daemons": [
  {
    "keep_running_when_suspended": true,
    "perf_collector": {
      "perf_polling_frequency_hz": 99,
      "call_graph": true,
      "max_dumps_retained": 3,
      "dump_duration_seconds": 120,
      "bucket_subdirectory": "perf-data",
      "export_to": "s3://your-bucket-name"
    }
  }
]
```

Replace s3://your-bucket-name with your bucket URI. Use gs:// for GCS. The subdirectory name can be anything you like. Then save your changes.

With the Chalk CLI, you can export your current configuration, edit it, and re-apply it to your background persistence deployment.
```shell
chalk infra describe persistence --json > persistence.json
```

Open persistence.json and add an observability_daemons array at the top level, alongside the existing common_persistence_specs and writers:
```json
"observability_daemons": [
  {
    "keep_running_when_suspended": true,
    "perf_collector": {
      "perf_polling_frequency_hz": 99,
      "call_graph": true,
      "max_dumps_retained": 3,
      "dump_duration_seconds": 120,
      "bucket_subdirectory": "perf-data",
      "export_to": "s3://your-bucket-name"
    }
  }
]
```

Replace s3://your-bucket-name with your bucket URI. Use gs:// for GCS.
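If you would rather script the edit than make it by hand, here is a minimal Python sketch that adds the block to the exported file. It assumes persistence.json is a JSON object with top-level common_persistence_specs and writers keys, as described above. add_perf_daemon and update_persistence_file are hypothetical helper names, and export_to is a placeholder you must replace. The google_cloud_project check mirrors the GCP prerequisite.

```python
import copy
import json
from pathlib import Path

# Same values as the example block; replace export_to with your bucket URI.
PERF_DAEMON = {
    "keep_running_when_suspended": True,
    "perf_collector": {
        "perf_polling_frequency_hz": 99,
        "call_graph": True,
        "max_dumps_retained": 3,
        "dump_duration_seconds": 120,
        "bucket_subdirectory": "perf-data",
        "export_to": "s3://your-bucket-name",
    },
}


def add_perf_daemon(config: dict, daemon: dict) -> dict:
    """Return a copy of config with the perf daemon added to observability_daemons."""
    export_to = daemon["perf_collector"]["export_to"]
    if not export_to.startswith(("s3://", "gs://")):
        raise ValueError(f"export_to must start with s3:// or gs://, got {export_to!r}")
    specs = config.get("common_persistence_specs") or {}
    if export_to.startswith("gs://") and not specs.get("google_cloud_project"):
        raise ValueError("GCS export needs google_cloud_project in common_persistence_specs")

    updated = copy.deepcopy(config)
    # Keep unrelated daemons, but drop any earlier perf_collector entry so that
    # re-running this script does not create duplicates.
    daemons = [d for d in updated.get("observability_daemons", []) if "perf_collector" not in d]
    daemons.append(copy.deepcopy(daemon))
    updated["observability_daemons"] = daemons
    return updated


def update_persistence_file(path: str, daemon: dict = PERF_DAEMON) -> None:
    """Rewrite the exported persistence JSON in place with the perf daemon added."""
    file = Path(path)
    config = json.loads(file.read_text())
    file.write_text(json.dumps(add_perf_daemon(config, daemon), indent=2) + "\n")
```

Call `update_persistence_file("persistence.json")` after the describe command and before the apply command. Any existing perf_collector entry is replaced rather than duplicated, so running it twice is harmless, and the apply step still shows a diff for review before anything changes.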
Then apply:
```shell
chalk infra apply persistence -f persistence.json
```

The CLI will show a diff of the changes and prompt for confirmation.
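As a rough guide to how much data these settings generate, the arithmetic below multiplies sampling frequency, dump duration, and CPU count. It is a back-of-the-envelope sketch that assumes perf samples every CPU at the configured rate, so it is an upper bound: idle CPUs and the filtering to Chalk-related processes lower the real number. The 8-CPU node is hypothetical.

```python
def perf_volume(frequency_hz: int, dump_duration_seconds: int, cpus: int) -> dict:
    """Upper-bound estimate of samples per dump and dumps per hour on one node."""
    return {
        # Each CPU is sampled frequency_hz times per second for the whole dump window.
        "samples_per_dump": frequency_hz * dump_duration_seconds * cpus,
        # perf record rotates its output once every dump_duration_seconds.
        "dumps_per_hour": 3600 / dump_duration_seconds,
    }


# Example settings from above on a hypothetical 8-CPU node.
print(perf_volume(99, 120, 8))  # {'samples_per_dump': 95040, 'dumps_per_hour': 30.0}
# Halving the frequency roughly halves the sample volume.
print(perf_volume(49, 120, 8))  # {'samples_per_dump': 47040, 'dumps_per_hour': 30.0}
# Doubling the dump duration gives half as many files, each twice as large.
print(perf_volume(99, 240, 8))  # {'samples_per_dump': 190080, 'dumps_per_hour': 15.0}
```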
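The perf_collector object accepts the following fields:

- perf_polling_frequency_hz (integer): Sampling frequency in Hz. 99 is a standard default that avoids aliasing with timer interrupts.
- call_graph (boolean): Whether to record call graphs with each sample. The example above sets this to true.
- max_dumps_retained (integer): Maximum number of dump files to retain.
- dump_duration_seconds (integer): How often, in seconds, perf record rotates its output file and the cleanup loop runs. Defaults to 60.
- export_to (string): Destination bucket URI: s3://bucket-name for AWS, or gs://bucket-name for GCS.
- bucket_subdirectory (string): Path prefix within the bucket. Profiles are written under bucket_subdirectory/node-name/.

These fields are set on the outer daemon object, alongside perf_collector:

- keep_running_when_suspended (boolean): When true, the daemon keeps running while the deployment is suspended.
- request (object): Kubernetes resource requests (cpu, memory). Defaults to 25m CPU and 64Mi memory.
- limit (object): Kubernetes resource limits (cpu, memory).
- image_override (string): Overrides the daemon's container image.

Profiles appear in your bucket organized by node name: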
BUCKET_SUBDIRECTORY/NODE_NAME/TIMESTAMP-perf-data.gz
For example, with bucket_subdirectory set to "perf-data" and a node named
ip-10-0-1-42.ec2.internal:
s3://your-bucket-name/perf-data/ip-10-0-1-42.ec2.internal/20260220T153012-perf-data.gz
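When checking what has been collected, it can help to split object keys back into their parts. The sketch below assumes keys follow the layout above, with the timestamp format seen in the example (YYYYMMDDTHHMMSS). The timezone is not documented, so the timestamp is parsed as a naive datetime. parse_profile_key and latest_per_node are hypothetical helpers; they expect the object key without the s3:// or gs:// bucket prefix, and a relative path under a local sync directory works too.

```python
from datetime import datetime
from typing import Dict, Iterable, NamedTuple

SUFFIX = "-perf-data.gz"


class ProfileKey(NamedTuple):
    subdirectory: str
    node: str
    collected_at: datetime


def parse_profile_key(key: str) -> ProfileKey:
    """Split BUCKET_SUBDIRECTORY/NODE_NAME/TIMESTAMP-perf-data.gz into its parts."""
    # rsplit keeps any nested prefix in the subdirectory part.
    subdirectory, node, filename = key.rsplit("/", 2)
    if not filename.endswith(SUFFIX):
        raise ValueError(f"not a perf profile object: {key!r}")
    stamp = filename[: -len(SUFFIX)]
    # Format inferred from the example key; parsed as a naive datetime.
    return ProfileKey(subdirectory, node, datetime.strptime(stamp, "%Y%m%dT%H%M%S"))


def latest_per_node(keys: Iterable[str]) -> Dict[str, ProfileKey]:
    """Map each node name to its most recent profile."""
    latest: Dict[str, ProfileKey] = {}
    for key in keys:
        parsed = parse_profile_key(key)
        current = latest.get(parsed.node)
        if current is None or parsed.collected_at > current.collected_at:
            latest[parsed.node] = parsed
    return latest
```

For the example key above, `parse_profile_key("perf-data/ip-10-0-1-42.ec2.internal/20260220T153012-perf-data.gz")` yields node `ip-10-0-1-42.ec2.internal` and a timestamp of 2026-02-20 15:30:12.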
Each file contains gzip-compressed perf script output, filtered to Chalk-related processes.
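To sanity-check a dump before sending it, you can count samples per command. This heuristic assumes perf script's default text layout, in which each sample starts with a header line whose first field is the command name and call-graph frames are tab-indented. The exact format Chalk's daemon writes is not documented here, and command names containing spaces are truncated to their first word.

```python
import gzip
from collections import Counter


def count_samples_by_command(path: str) -> Counter:
    """Count samples per command name in one gzip-compressed perf script dump.

    Heuristic: a line that is neither blank nor tab-indented is treated as a
    sample header whose first field is the command name. Tab-indented lines are
    call-graph frames, and blank lines separate samples.
    """
    counts: Counter = Counter()
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            if not line.strip() or line.startswith("\t"):
                continue  # sample separator or call-graph frame
            counts[line.split()[0]] += 1
    return counts
```

After syncing the bucket locally, you could loop over `Path("perf-data").rglob("*-perf-data.gz")` and print `count_samples_by_command(path).most_common(5)` for each file to confirm the dumps are non-empty.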
Once profiles have been collected, download them from your bucket, compress them into an archive, and send it to your Chalk support contact:
For AWS:

```shell
aws s3 sync s3://your-bucket-name/perf-data/ ./perf-data/
tar czf perf-profiles.tar.gz perf-data/
```

For GCS:
```shell
gcloud storage rsync -r gs://your-bucket-name/perf-data/ ./perf-data/
tar czf perf-profiles.tar.gz perf-data/
```

Perf profiling adds CPU and I/O overhead to your nodes. You can adjust the following settings to find the right balance:

- Lower perf_polling_frequency_hz (e.g., from 99 to 49) to take fewer samples.
- Increase dump_duration_seconds to produce fewer, larger files and reduce I/O frequency.

When profiling is no longer needed, remove the observability_daemons entry from your background persistence configuration and re-apply. The profiling daemonset will be removed automatically.
In the dashboard, navigate to Settings > Shared Resources > Background Persistence,
click Edit JSON, delete the observability_daemons block, and save.
With the CLI:
```shell
chalk infra describe persistence --json > persistence.json
# Remove the observability_daemons array from persistence.json
chalk infra apply persistence -f persistence.json
```
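The manual edit step between those two commands can also be scripted. This Python sketch, using hypothetical helper names, drops the whole top-level observability_daemons key from the exported file, matching the instruction above, and reports whether anything was removed.

```python
import json
from pathlib import Path


def remove_observability_daemons(config: dict) -> dict:
    """Return a new dict without the top-level observability_daemons key."""
    return {key: value for key, value in config.items() if key != "observability_daemons"}


def strip_daemons_from_file(path: str) -> bool:
    """Rewrite the exported JSON in place; return True if anything was removed."""
    file = Path(path)
    config = json.loads(file.read_text())
    if "observability_daemons" not in config:
        return False  # nothing to do, leave the file untouched
    file.write_text(json.dumps(remove_observability_daemons(config), indent=2) + "\n")
    return True
```

Run `strip_daemons_from_file("persistence.json")` after the describe command, then apply as usual and review the diff the CLI shows.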