Chalk supports two profiling tools to help diagnose performance bottlenecks in your deployment:

  • Perf profiling — samples CPU activity using Linux perf and uploads gzip-compressed perf script output to your bucket.
  • Perfetto tracing — captures system-wide traces using Perfetto and uploads them in Perfetto’s native proto format.

Both are configured as observability daemons in your background persistence configuration. Once enabled, a daemon runs on each Kubernetes node and periodically uploads results to your bucket.


Choosing which tool to use

Perfetto offers a superset of the features that perf offers, at the cost of some additional overhead. With Perfetto, you can collect not only CPU profile information from the nodes that Chalk runs on, but also correlate it with traces emitted by the Chalk Engine.

We recommend starting with perf profiling alone, and moving to Perfetto if perf does not provide enough information.

For Perfetto to work, in addition to configuring profiling in background persistence (as outlined in this guide), you also need to enable trace output from your engine by setting the environment variable LIBCHALK_ENABLE_PERFETTO_TRACING=true.
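This is a plain environment variable on the engine; how you set it depends on your deployment tooling (deployment spec, Helm values, etc.). As a shell illustration:

```shell
# Enable Perfetto trace output from the Chalk Engine.
# Set this in the engine container's environment; shown here as a plain
# shell export for illustration only.
export LIBCHALK_ENABLE_PERFETTO_TRACING=true
```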


Prerequisites

  • A running Chalk deployment with background persistence configured. To install it, follow this guide
  • An S3 or GCS bucket for storing profiles and traces
  • Write access from the background persistence service account to that bucket

Cloud storage permissions

GCP: The service account used by background persistence (configured via service_account_name in common_persistence_specs) needs write access to the target GCS bucket. You also need google_cloud_project set in common_persistence_specs.

AWS: Background persistence pods obtain AWS credentials through IAM Roles for Service Accounts (IRSA). The default IAM role is provisioned with broad S3 access within your account, so any same-account bucket will work without additional configuration. To write to a bucket in a different AWS account, add a bucket policy on the destination bucket granting access to the background persistence IAM role ARN.
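A cross-account bucket policy along these lines grants that access. This is a sketch: the account ID, role name, and bucket name are placeholders you must replace with your own values, and the action list may need tightening to match your security posture.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowChalkBackgroundPersistenceWrites",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111111111111:role/your-background-persistence-role"
      },
      "Action": ["s3:PutObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::your-bucket-name",
        "arn:aws:s3:::your-bucket-name/*"
      ]
    }
  ]
}
```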


Perf profiling

The perf collector samples CPU activity for Chalk-related processes on each node and periodically uploads gzip-compressed perf script output to your bucket.

Enabling perf profiling

There are two ways to enable perf profiling—through the Chalk dashboard if you manage your background persistence deployments there, or using the Chalk CLI.

Using the Chalk dashboard

  1. Navigate to Settings > Shared Resources > Background Persistence.
  2. Click Edit JSON.
  3. Add an observability_daemons array at the top level of the JSON, alongside the existing common_persistence_specs and writers:
"observability_daemons": [
  {
    "keep_running_when_suspended": true,
    "perf_collector": {
      "perf_polling_frequency_hz": 99,
      "call_graph": true,
      "max_dumps_retained": 3,
      "dump_duration_seconds": 120,
      "bucket_subdirectory": "perf-data",
      "export_to": "s3://your-bucket-name"
    }
  }
]
  4. Replace s3://your-bucket-name with your bucket URI. Use gs:// for GCS. The subdirectory name can be anything you like.
  5. Save and apply.

Using the Chalk CLI

With the Chalk CLI, you can export your current configuration, edit it, and re-apply to your background persistence deployment.

chalk infra describe persistence --json > persistence.json

Open persistence.json and add an observability_daemons array at the top level, alongside the existing common_persistence_specs and writers:

"observability_daemons": [
  {
    "keep_running_when_suspended": true,
    "perf_collector": {
      "perf_polling_frequency_hz": 99,
      "call_graph": true,
      "max_dumps_retained": 3,
      "dump_duration_seconds": 120,
      "bucket_subdirectory": "perf-data",
      "export_to": "s3://your-bucket-name"
    }
  }
]

Replace s3://your-bucket-name with your bucket URI. Use gs:// for GCS. Then apply:

chalk infra apply persistence -f persistence.json

The CLI will show a diff of the changes and prompt for confirmation.

Configuration reference

Attributes
perf_polling_frequency_hz (integer)
Sampling frequency in Hz. 99 is a standard default that avoids aliasing with timer interrupts.
call_graph (boolean)
Capture full call stacks with each sample. Required for flame graph generation. Defaults to true.
max_dumps_retained (integer)
Maximum number of profile files to keep on disk per node. Older files are uploaded and deleted.
dump_duration_seconds (integer)
How often, in seconds, perf record rotates its output file and the cleanup loop runs. Defaults to 60.
export_to (string)
Bucket URI for uploads. Use s3://bucket-name for AWS, or gs://bucket-name for GCS.
bucket_subdirectory (string)
Path prefix within the bucket. Files are organized as bucket_subdirectory/node-name/.

Where to find profiles

Profiles appear in your bucket organized by node name:

BUCKET_SUBDIRECTORY/NODE_NAME/TIMESTAMP-perf-data.gz

For example, with bucket_subdirectory set to "perf-data" and a node named ip-10-0-1-42.ec2.internal:

s3://your-bucket-name/perf-data/ip-10-0-1-42.ec2.internal/20260220T153012-perf-data.gz

Each file contains gzip-compressed perf script output, filtered to Chalk-related processes.
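Because each file is ordinary gzip, you can inspect a downloaded profile directly. For example (the filename here is illustrative):

```shell
# Decompress a downloaded profile and look at the first few stack samples.
# The filename is illustrative; use a file synced from your bucket.
gunzip -c 20260220T153012-perf-data.gz | head -n 20
```

The decompressed stream is standard perf script output, so it can be fed to common downstream tooling, such as the stackcollapse-perf.pl and flamegraph.pl scripts from the FlameGraph project, to generate flame graphs.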

Tuning overhead

Perf profiling adds CPU and I/O overhead to your nodes. You can adjust the following settings to find the right balance:

  • Lower the sampling frequency: Reduce perf_polling_frequency_hz (e.g., from 99 to 49).
  • Increase dump duration: A larger dump_duration_seconds produces fewer, larger files and reduces I/O frequency.
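For example, a lower-overhead variant of the perf_collector block above might look like this (the values are illustrative starting points, not tuned recommendations):

```json
"observability_daemons": [
  {
    "keep_running_when_suspended": true,
    "perf_collector": {
      "perf_polling_frequency_hz": 49,
      "call_graph": true,
      "max_dumps_retained": 3,
      "dump_duration_seconds": 300,
      "bucket_subdirectory": "perf-data",
      "export_to": "s3://your-bucket-name"
    }
  }
]
```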

Perfetto tracing

The Perfetto daemon captures system-wide traces on each node. Traces can be triggered on a fixed time interval or on-demand via an HTTP endpoint that the Chalk CLI can call. Trace files are in Perfetto’s native proto format and can be opened directly in the Perfetto UI.

Enabling Perfetto tracing

There are two ways to enable Perfetto tracing—through the Chalk dashboard if you manage your background persistence deployments there, or using the Chalk CLI.

Using the Chalk dashboard

  1. Navigate to Settings > Shared Resources > Background Persistence.
  2. Click Edit JSON.
  3. Add an observability_daemons array at the top level of the JSON, alongside the existing common_persistence_specs and writers:
"observability_daemons": [
  {
    "keep_running_when_suspended": true,
    "perfetto_daemon": {
      "trigger": "PERFETTO_TRIGGER_TIME_INTERVAL",
      "interval": 60000,
      "max_retained_runs": 3,
      "bucket_subdirectory": "perfetto-traces",
      "export_to": "s3://your-bucket-name",
      "trigger_name": "chalk_traces",
      "config_text": "buffers: {\n  size_kb: 102400\n  fill_policy: RING_BUFFER\n}\n\ndata_sources: {\n  config {\n    name: \"linux.perf\"\n    perf_event_config {\n      all_cpus: true\n      sampling_frequency: 100\n    }\n  }\n}\n\ntrigger_config {\n  trigger_mode: CLONE_SNAPSHOT\n  triggers {\n    name: \"chalk_traces\"\n    stop_delay_ms: 1000\n  }\n}\n"
    }
  }
]
  4. Replace s3://your-bucket-name with your bucket URI. Use gs:// for GCS. Replace the config_text value with a valid Perfetto text proto config.
  5. Save and apply.

Using the Chalk CLI

With the Chalk CLI, you can export your current configuration, edit it, and re-apply to your background persistence deployment.

chalk infra describe persistence --json > persistence.json

Open persistence.json and add an observability_daemons array at the top level, alongside the existing common_persistence_specs and writers:

"observability_daemons": [
  {
    "keep_running_when_suspended": true,
    "perfetto_daemon": {
      "trigger": "PERFETTO_TRIGGER_TIME_INTERVAL",
      "interval": 60000,
      "max_retained_runs": 3,
      "bucket_subdirectory": "perfetto-traces",
      "export_to": "s3://your-bucket-name",
      "trigger_name": "chalk_traces",
      "config_text": "buffers: {\n  size_kb: 102400\n  fill_policy: RING_BUFFER\n}\n\ndata_sources: {\n  config {\n    name: \"linux.perf\"\n    perf_event_config {\n      all_cpus: true\n      sampling_frequency: 100\n    }\n  }\n}\n\ntrigger_config {\n  trigger_mode: CLONE_SNAPSHOT\n  triggers {\n    name: \"chalk_traces\"\n    stop_delay_ms: 1000\n  }\n}\n"
    }
  }
]

Replace s3://your-bucket-name with your bucket URI. Use gs:// for GCS. Then apply:

chalk infra apply persistence -f persistence.json

The CLI will show a diff of the changes and prompt for confirmation.

Generating Perfetto text proto config

The config_text field must contain a valid Perfetto text proto config.

Regardless of which trigger mode you use, the config must include a trigger_config block with trigger_mode: CLONE_SNAPSHOT and a trigger whose name exactly matches the trigger_name field in your daemon config. This is how Perfetto knows when to snapshot the ring buffer and emit a trace.

Write your config in a .pbtxt file. For example, a config that samples CPU at 99 Hz and snapshots on the trigger "chalk_traces" would look like:

buffers: {
  size_kb: 102400
  fill_policy: RING_BUFFER
}

data_sources: {
  config {
    name: "linux.perf"
    perf_event_config {
      all_cpus: true
      sampling_frequency: 99
    }
  }
}

trigger_config {
  trigger_mode: CLONE_SNAPSHOT
  triggers {
    name: "chalk_traces"
    stop_delay_ms: 1000
  }
}

Because config_text is embedded as a JSON string, newlines and quotes must be escaped. Use jq to produce the correctly escaped value from your .pbtxt file:

jq -Rs '.' < perfetto.pbtxt

This prints the file contents as a quoted, escaped JSON string. Copy the output (including the surrounding quotes) and use it as the config_text value in your persistence config.
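If jq is not available, Python's json module produces the same escaping. The one-liner below is equivalent to `jq -Rs '.'` for ASCII configs (json.dumps escapes non-ASCII characters by default, where jq emits them raw):

```shell
# Print perfetto.pbtxt as a quoted, escaped JSON string, suitable for
# pasting into the config_text field. Equivalent to `jq -Rs '.'`.
python3 -c 'import json, sys; print(json.dumps(sys.stdin.read()))' < perfetto.pbtxt
```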

Trigger modes

The Perfetto daemon supports two ways to initiate a trace capture:

  • PERFETTO_TRIGGER_TIME_INTERVAL — Traces are collected automatically on a fixed interval (controlled by the interval field). Use this for continuous background profiling.
  • PERFETTO_TRIGGER_HTTP — An HTTP endpoint is exposed on port 3565. The cluster manager can call this endpoint to trigger a trace on demand. Use this when you want to capture a trace at a specific moment, such as during a known slow request. At most one HTTP-triggered Perfetto daemon may be configured per environment.

When PERFETTO_TRIGGER_HTTP is used, the cluster manager is automatically configured with the CHALK_PERFETTO_DAEMON_PORT and CHALK_PERFETTO_DAEMON_NAMESPACE environment variables. You can then trigger a snapshot with:

chalk profiling perfetto-snapshot

Regardless of the trigger mode you use for the daemon, the underlying Perfetto config needs to use trigger_mode: CLONE_SNAPSHOT for the system to work properly.

On-demand tracing (HTTP trigger)

To enable on-demand tracing via chalk profiling perfetto-snapshot, use PERFETTO_TRIGGER_HTTP as the trigger mode and set a trigger_name:

"observability_daemons": [
  {
    "keep_running_when_suspended": true,
    "perfetto_daemon": {
      "trigger": "PERFETTO_TRIGGER_HTTP",
      "trigger_name": "chalk_snapshot",
      "max_retained_runs": 5,
      "bucket_subdirectory": "perfetto-traces",
      "export_to": "gs://your-bucket-name",
      "config_text": "buffers: {\n  size_kb: 102400\n  fill_policy: RING_BUFFER\n}\n\ndata_sources: {\n  config {\n    name: \"linux.perf\"\n    perf_event_config {\n      all_cpus: true\n      sampling_frequency: 100\n    }\n  }\n}\n\ntrigger_config {\n  trigger_mode: CLONE_SNAPSHOT\n  triggers {\n    name: \"chalk_snapshot\"\n    stop_delay_ms: 1000\n  }\n}\n"
    }
  }
]

Once deployed, trigger a trace capture with:

chalk profiling perfetto-snapshot

Configuration reference

Attributes
config_text (string)
Perfetto tracing configuration in text proto format (.pbtxt). Required. See the Perfetto config documentation for available data sources and options.
trigger (string)
How traces are initiated. Use PERFETTO_TRIGGER_TIME_INTERVAL for automatic periodic collection, or PERFETTO_TRIGGER_HTTP for on-demand collection via chalk profiling perfetto-snapshot.
interval (integer)
Interval between traces in milliseconds. This also controls how frequently the node is scanned for new trace files to upload.
trigger_name (string)
Perfetto trigger name. Required. Must exactly match the trigger name in the trigger_config block of config_text.
max_retained_runs (integer)
Maximum number of trace files to keep on disk per node. Older files are uploaded and deleted. Set this to 0 to upload files immediately.
export_to (string)
Bucket URI for uploads. Use s3://bucket-name for AWS, or gs://bucket-name for GCS.
bucket_subdirectory (string)
Path prefix within the bucket. Files are organized as bucket_subdirectory/node-name/.

Where to find traces

Traces appear in your bucket organized by node name:

BUCKET_SUBDIRECTORY/NODE_NAME/TIMESTAMP-perfetto-trace.pb

For example, with bucket_subdirectory set to "perfetto-traces" and a node named ip-10-0-1-42.ec2.internal:

s3://your-bucket-name/perfetto-traces/ip-10-0-1-42.ec2.internal/20260220T153012-perfetto-trace.pb

Common configuration

The following fields apply to both the perf_collector and perfetto_daemon objects.

Attributes
keep_running_when_suspended (boolean)
Keep the daemon running when background persistence is suspended.
request (object)
Kubernetes resource requests (cpu, memory). Defaults to 25m CPU and 64Mi memory.
limit (object)
Kubernetes resource limits (cpu, memory).
image_override (string)
Custom container image. Omit to use the default.
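For example, a daemon entry that overrides the default resources might look like the following sketch. The resource values are illustrative, and placing request and limit on the daemon object itself (alongside keep_running_when_suspended) is an assumption based on where that field appears in the earlier examples:

```json
"observability_daemons": [
  {
    "keep_running_when_suspended": true,
    "request": { "cpu": "50m", "memory": "128Mi" },
    "limit": { "cpu": "200m", "memory": "256Mi" },
    "perf_collector": {
      "perf_polling_frequency_hz": 99,
      "call_graph": true,
      "max_dumps_retained": 3,
      "dump_duration_seconds": 120,
      "bucket_subdirectory": "perf-data",
      "export_to": "s3://your-bucket-name"
    }
  }
]
```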

Sending data to Chalk

Once data has been collected, download it from your bucket, compress it into an archive, and send it to your Chalk support contact.

Perf profiles

For AWS:

aws s3 sync s3://your-bucket-name/perf-data/ ./perf-data/
tar czf perf-profiles.tar.gz perf-data/

For GCS:

gcloud storage rsync -r gs://your-bucket-name/perf-data/ ./perf-data/
tar czf perf-profiles.tar.gz perf-data/

Perfetto traces

For AWS:

aws s3 sync s3://your-bucket-name/perfetto-traces/ ./perfetto-traces/
tar czf perfetto-traces.tar.gz perfetto-traces/

For GCS:

gcloud storage rsync -r gs://your-bucket-name/perfetto-traces/ ./perfetto-traces/
tar czf perfetto-traces.tar.gz perfetto-traces/

Disabling profiling

When profiling is no longer needed, remove the observability_daemons entry from your background persistence configuration and re-apply. The profiling daemonset will be removed automatically.

In the dashboard, navigate to Settings > Shared Resources > Background Persistence, click Edit JSON, delete the observability_daemons block, and save.

With the CLI:

chalk infra describe persistence --json > persistence.json
# Remove the observability_daemons array from persistence.json
chalk infra apply persistence -f persistence.json