Chalk can collect CPU profiles from your engine nodes using Linux perf and upload the results to a cloud storage bucket. This helps the Chalk engineering team diagnose performance bottlenecks within Chalk itself.

Profiling is configured as an observability daemon in your background persistence configuration. Once enabled, a daemon runs on each Kubernetes node, samples CPU activity for Chalk-related processes, and periodically uploads the results to your bucket.


Prerequisites

  • A running Chalk deployment with background persistence configured
  • An S3 or GCS bucket for storing profiles
  • Write access from the background persistence service account to that bucket

Cloud storage permissions

GCP: The service account used by background persistence (configured via service_account_name in common_persistence_specs) needs write access to the target GCS bucket. The google_cloud_project field must also be set in common_persistence_specs.

AWS: Background persistence pods obtain AWS credentials through IAM Roles for Service Accounts (IRSA). The default IAM role is provisioned with broad S3 access within your account, so any same-account bucket will work without additional configuration. To write to a bucket in a different AWS account, add a bucket policy on the destination bucket granting access to the background persistence IAM role ARN.
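
For the cross-account case, a bucket policy could be generated along the lines of the sketch below. This is an illustration rather than Chalk's documented policy: the role ARN, the helper name, and the choice of s3:PutObject as the only granted action are assumptions, so confirm the required actions with Chalk and substitute the real ARN of your background persistence role.

```python
import json


def cross_account_bucket_policy(role_arn: str, bucket: str, prefix: str) -> dict:
    """Build an S3 bucket policy that lets role_arn write objects under prefix.

    Assumption: uploading profiles only needs s3:PutObject. Extend the
    Action list if your setup turns out to require more.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowPerfProfileUploads",
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": ["s3:PutObject"],
                # Limit the grant to the profile prefix rather than the whole bucket.
                "Resource": f"arn:aws:s3:::{bucket}/{prefix.strip('/')}/*",
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical account ID and role name, for illustration only.
    policy = cross_account_bucket_policy(
        role_arn="arn:aws:iam::111122223333:role/example-background-persistence",
        bucket="your-bucket-name",
        prefix="perf-data",
    )
    print(json.dumps(policy, indent=2))
```

The printed JSON is what would be attached to the destination bucket by its owner. Scoping Resource to the bucket_subdirectory prefix keeps the grant narrower than bucket-wide write access.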


Enabling perf profiling

There are two ways to enable perf profiling in your environment: through the Chalk dashboard, if you manage your background persistence deployments there, or with the Chalk CLI.

Using the Chalk dashboard

  1. Navigate to Settings > Shared Resources > Background Persistence.
  2. Click Edit JSON.
  3. Add an observability_daemons array at the top level of the JSON, alongside the existing common_persistence_specs and writers:
"observability_daemons": [
  {
    "keep_running_when_suspended": true,
    "perf_collector": {
      "perf_polling_frequency_hz": 99,
      "call_graph": true,
      "max_dumps_retained": 3,
      "dump_duration_seconds": 120,
      "bucket_subdirectory": "perf-data",
      "export_to": "s3://your-bucket-name"
    }
  }
]
  4. Replace s3://your-bucket-name with your bucket URI. Use gs:// for GCS. The subdirectory name can be anything you like.
  5. Save and apply.

Using the Chalk CLI

With the Chalk CLI, you can export your current configuration, edit it, and re-apply to your background persistence deployment.

chalk infra describe persistence --json > persistence.json

Open persistence.json and add an observability_daemons array at the top level, alongside the existing common_persistence_specs and writers:

"observability_daemons": [
  {
    "keep_running_when_suspended": true,
    "perf_collector": {
      "perf_polling_frequency_hz": 99,
      "call_graph": true,
      "max_dumps_retained": 3,
      "dump_duration_seconds": 120,
      "bucket_subdirectory": "perf-data",
      "export_to": "s3://your-bucket-name"
    }
  }
]

Replace s3://your-bucket-name with your bucket URI. Use gs:// for GCS. Then apply:

chalk infra apply persistence -f persistence.json

The CLI will show a diff of the changes and prompt for confirmation.
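
If you would rather script the edit step than change persistence.json by hand, the following is a minimal sketch. It assumes the exported file is a single JSON object with observability_daemons at the top level, as described above; the helper names add_perf_daemon and update_file are hypothetical and not part of the Chalk CLI.

```python
import json
from pathlib import Path


def add_perf_daemon(config: dict, export_to: str, subdirectory: str = "perf-data") -> dict:
    """Return a copy of config with one perf_collector daemon configured."""
    if not export_to.startswith(("s3://", "gs://")):
        raise ValueError("export_to must be an s3:// or gs:// bucket URI")
    daemon = {
        "keep_running_when_suspended": True,
        "perf_collector": {
            "perf_polling_frequency_hz": 99,
            "call_graph": True,
            "max_dumps_retained": 3,
            "dump_duration_seconds": 120,
            "bucket_subdirectory": subdirectory,
            "export_to": export_to,
        },
    }
    # Keep daemons that are not perf collectors and replace an existing
    # perf collector, so running the helper twice does not add a duplicate.
    others = [
        d for d in config.get("observability_daemons", [])
        if "perf_collector" not in d
    ]
    return {**config, "observability_daemons": others + [daemon]}


def update_file(path: str, export_to: str) -> None:
    """Rewrite the exported configuration file in place."""
    file = Path(path)
    config = json.loads(file.read_text())
    file.write_text(json.dumps(add_perf_daemon(config, export_to), indent=2) + "\n")


if __name__ == "__main__":
    update_file("persistence.json", "s3://your-bucket-name")
```

After running it, apply the file with chalk infra apply persistence -f persistence.json as usual and review the diff before confirming.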


Configuration reference

Perf collector fields

Attributes
perf_polling_frequency_hz (integer)
Sampling frequency in Hz. 99 is a standard default that avoids aliasing with timer interrupts.
call_graph (boolean)
Capture full call stacks with each sample. Required for flame graph generation. Defaults to true.
max_dumps_retained (integer)
Maximum number of profile files to keep on disk per node. Older files are uploaded and deleted.
dump_duration_seconds (integer)
How often, in seconds, perf record rotates its output file and the cleanup loop runs. Defaults to 60.
export_to (string)
Bucket URI for uploads. Use s3://bucket-name for AWS, or gs://bucket-name for GCS.
bucket_subdirectory (string)
Path prefix within the bucket. Files are organized as bucket_subdirectory/node-name/.
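
Before applying a configuration, it can help to sanity-check the perf_collector object against the field types listed above. The sketch below is not Chalk's own validation: the function name is hypothetical, and the rules that integers must be positive and that export_to must be present are assumptions inferred from the descriptions.

```python
def validate_perf_collector(collector: dict) -> list:
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    integer_fields = (
        "perf_polling_frequency_hz",
        "max_dumps_retained",
        "dump_duration_seconds",
    )
    for field in integer_fields:
        if field not in collector:
            continue  # Omitted fields are left to the server-side defaults.
        value = collector[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            problems.append(f"{field} must be a positive integer")
    if "call_graph" in collector and not isinstance(collector["call_graph"], bool):
        problems.append("call_graph must be a boolean")
    export_to = collector.get("export_to", "")
    if not isinstance(export_to, str) or not export_to.startswith(("s3://", "gs://")):
        problems.append("export_to must start with s3:// or gs://")
    subdirectory = collector.get("bucket_subdirectory")
    if subdirectory is not None and not isinstance(subdirectory, str):
        problems.append("bucket_subdirectory must be a string")
    return problems


if __name__ == "__main__":
    example = {
        "perf_polling_frequency_hz": 99,
        "call_graph": True,
        "max_dumps_retained": 3,
        "dump_duration_seconds": 120,
        "bucket_subdirectory": "perf-data",
        "export_to": "s3://your-bucket-name",
    }
    print(validate_perf_collector(example) or "no problems found")
```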

Observability daemon fields

These fields are set on the outer daemon object, alongside perf_collector.

Attributes
keep_running_when_suspended (boolean)
Keep the daemon running when background persistence is suspended.
request (object)
Kubernetes resource requests (cpu, memory). Defaults to 25m CPU and 64Mi memory.
limit (object)
Kubernetes resource limits (cpu, memory).
image_override (string)
Custom container image. Omit to use the default.
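
As a rough illustration of how request and limit sit next to perf_collector on the daemon object, consider the sketch below. It assumes both objects take cpu and memory keys holding Kubernetes quantity strings such as "25m" and "64Mi". The request values are the documented defaults; the limit values and the helper name are hypothetical.

```python
import json

ALLOWED_RESOURCE_KEYS = {"cpu", "memory"}


def with_resources(daemon: dict, request: dict, limit: dict) -> dict:
    """Return a copy of a daemon object with request and limit set."""
    for name, resources in (("request", request), ("limit", limit)):
        unknown = set(resources) - ALLOWED_RESOURCE_KEYS
        if unknown:
            raise ValueError(f"{name} has unsupported keys: {sorted(unknown)}")
    return {**daemon, "request": dict(request), "limit": dict(limit)}


if __name__ == "__main__":
    daemon = {
        "keep_running_when_suspended": True,
        "perf_collector": {"export_to": "s3://your-bucket-name"},
    }
    sized = with_resources(
        daemon,
        request={"cpu": "25m", "memory": "64Mi"},    # documented defaults
        limit={"cpu": "100m", "memory": "128Mi"},    # hypothetical values
    )
    print(json.dumps(sized, indent=2))
```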

Where to find the profiles

Profiles appear in your bucket organized by node name:

BUCKET_SUBDIRECTORY/NODE_NAME/TIMESTAMP-perf-data.gz

For example, with bucket_subdirectory set to "perf-data" and a node named ip-10-0-1-42.ec2.internal:

s3://your-bucket-name/perf-data/ip-10-0-1-42.ec2.internal/20260220T153012-perf-data.gz

Each file contains gzip-compressed perf script output, filtered to Chalk-related processes.
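
When scripting against the bucket, an object key can be split back into its parts. The sketch below assumes the timestamp always follows the YYYYMMDDTHHMMSS shape seen in the example key; the time zone is not stated here, so the result is left as a naive datetime. ProfileKey and parse_profile_key are hypothetical names.

```python
from datetime import datetime
from typing import NamedTuple

SUFFIX = "-perf-data.gz"


class ProfileKey(NamedTuple):
    subdirectory: str
    node_name: str
    timestamp: datetime


def parse_profile_key(key: str) -> ProfileKey:
    """Split BUCKET_SUBDIRECTORY/NODE_NAME/TIMESTAMP-perf-data.gz into parts."""
    # Split from the right so a subdirectory containing slashes stays intact.
    subdirectory, node_name, filename = key.rsplit("/", 2)
    if not filename.endswith(SUFFIX):
        raise ValueError(f"not a perf profile key: {key!r}")
    stamp = filename[: -len(SUFFIX)]
    # Timestamp format inferred from the example key (assumption).
    return ProfileKey(subdirectory, node_name, datetime.strptime(stamp, "%Y%m%dT%H%M%S"))


if __name__ == "__main__":
    key = "perf-data/ip-10-0-1-42.ec2.internal/20260220T153012-perf-data.gz"
    print(parse_profile_key(key))
```

Parsing keys this way makes it straightforward to select only the profiles from one node or from the time window around an incident.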


Sending profiles to Chalk

Once profiles have been collected, download them from your bucket, bundle them into a single archive, and send the archive to your Chalk support contact:

For AWS:

aws s3 sync s3://your-bucket-name/perf-data/ ./perf-data/
tar czf perf-profiles.tar.gz perf-data/

For GCS:

gcloud storage rsync -r gs://your-bucket-name/perf-data/ ./perf-data/
tar czf perf-profiles.tar.gz perf-data/
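
Before sending the archive, you may want to confirm that every node you expected actually produced profiles. The following sketch counts downloaded files per node directory; it assumes the local layout produced by the sync commands above (perf-data/NODE_NAME/TIMESTAMP-perf-data.gz), and profiles_per_node is a hypothetical helper name.

```python
from collections import Counter
from pathlib import Path


def profiles_per_node(paths) -> Counter:
    """Count profile files per node, given an iterable of file paths."""
    counts = Counter()
    for path in paths:
        path = Path(path)
        # Only count profile files; the parent directory is the node name.
        if path.name.endswith("-perf-data.gz"):
            counts[path.parent.name] += 1
    return counts


if __name__ == "__main__":
    downloaded = Path("perf-data").rglob("*-perf-data.gz")
    for node, count in sorted(profiles_per_node(downloaded).items()):
        print(f"{node}: {count} profile(s)")
```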

Tuning overhead

Perf profiling adds CPU and I/O overhead to your nodes. Two settings let you trade profile detail for lower overhead:

  • Lower the sampling frequency: Reduce perf_polling_frequency_hz (e.g., from 99 to 49). Fewer samples per second means less sampling work, at the cost of coarser profiles.
  • Increase dump duration: A larger dump_duration_seconds produces fewer, larger files and reduces I/O frequency.
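
To get a feel for how these two settings interact, the arithmetic can be sketched as follows. The figures are upper bounds under stated assumptions: perf samples every CPU at the configured frequency, every CPU is busy for the whole interval, and the node has 16 CPUs (a hypothetical value). Real dumps should be smaller, since the collector filters to Chalk-related processes.

```python
def max_samples_per_dump(frequency_hz: int, dump_duration_seconds: int, cpu_count: int) -> int:
    """Upper bound on samples in one dump: one sample per tick per busy CPU."""
    return frequency_hz * dump_duration_seconds * cpu_count


def dumps_per_hour(dump_duration_seconds: int) -> float:
    """How many output files each node rotates through per hour."""
    return 3600 / dump_duration_seconds


if __name__ == "__main__":
    cpus = 16  # hypothetical node size
    print(max_samples_per_dump(99, 120, cpus))   # 190080
    print(max_samples_per_dump(49, 120, cpus))   # 94080
    print(dumps_per_hour(60), dumps_per_hour(120))  # 60.0 30.0
```

Roughly halving the frequency halves the sample volume, while doubling dump_duration_seconds halves the number of files written and uploaded per hour without changing the total number of samples.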

Disabling perf profiling

When profiling is no longer needed, remove the observability_daemons entry from your background persistence configuration and re-apply. The profiling DaemonSet is then removed automatically.

In the dashboard, navigate to Settings > Shared Resources > Background Persistence, click Edit JSON, delete the observability_daemons block, and save.

With the CLI:

chalk infra describe persistence --json > persistence.json
# Remove the observability_daemons array from persistence.json
chalk infra apply persistence -f persistence.json