Overview

Two Velox features benefit from fast local NVMe SSDs (LSSDs) attached to your async offline-query workers:

  • Spilling writes per-query intermediate state to disk when a query exceeds its memory limit, letting large offline queries complete instead of crashing with an out-of-memory error.
  • Table-scan SSD cache keeps a process-wide on-disk cache of reusable scan ranges from external table sources. The cache survives query completion and engine restarts, so repeated reads of the same partitions skip the round trip to object storage.

Both features can share a single LSSD-backed mount on the node. This page walks through the end-to-end setup.

**Scope:** this guide covers **AWS EKS clusters using Karpenter**. The infrastructure steps (EC2NodeClass, NodePool, instance families) are AWS-specific. The Chalk-side configuration (resource group, Job Queue Consumer, environment variables, client routing) applies regardless of cloud, but the Karpenter-specific UI fields and shell commands on this page will not apply verbatim to GCP GKE or Azure AKS deployments. For GKE local-SSD guidance, see the short note in [Kubernetes Resources Overview](/docs/kube-resources-drilldown#local-ssds-for-temporary-storage) or contact Chalk support.

This setup applies to **async offline queries** (`run_asynchronously=True`), which run on the [job queue](/docs/job-queue). Synchronous offline queries and online queries don't go through the job queue and aren't affected by the configuration on this page.


When this is useful

You’ll see the biggest impact from LSSD-backed workers when:

  • Async offline queries OOM or take many minutes longer than expected because they’re spilling to slow remote EBS.
  • Your offline store is backed by Iceberg and queries repeatedly read the same partitions or backfill date ranges — the scan cache turns repeat reads into local disk hits.
  • You want spill-heavy and cache-heavy workloads isolated from latency-sensitive online queries on a dedicated nodepool.

If your offline store is BigQuery, Snowflake, Redshift, or Databricks, the scan cache won’t help, because those backends execute SQL on the warehouse and results come back through warehouse drivers, not through Velox’s scan path. Spilling still helps if those queries exceed their memory limit, but the scan-cache section below applies only to the Iceberg path.


Setup overview

  1. Verify the LSSD EC2NodeClass exists in your cluster.
  2. Create a dedicated NodePool from the Chalk dashboard.
  3. Add a resource group with a Job Queue Consumer that targets the new NodePool.
  4. Configure resource requests and environment variables.
  5. Route async offline queries to the new resource group from your client code.
  6. Verify the setup after the first job.

Step 1: Verify the LSSD EC2NodeClass exists

Karpenter’s EC2NodeClass is an AWS-only resource — these steps don’t apply to GKE or AKS clusters. Chalk’s standard AWS Terraform provisions an EC2NodeClass named al2023-offline-lssd with spec.instanceStorePolicy: RAID0 automatically. Check whether yours is present:

kubectl get ec2nodeclass al2023-offline-lssd
  • If the resource exists, continue to Step 2.
  • If it returns NotFound, your cluster is either on an older infrastructure setup or you’re managing the EKS cluster yourself outside Chalk’s Terraform module. Contact Chalk support to have the EC2NodeClass provisioned — it requires cluster-specific IAM and networking values that vary across deployments, so it’s not safe to apply a generic manifest. Once support confirms it’s been created, re-run the kubectl get above and continue to Step 2.

instanceStorePolicy: RAID0 is the critical field on the EC2NodeClass — it makes Karpenter mount the instance’s local NVMe array as the node’s ephemeral storage, so the container overlay and any writes to non-volume paths land on local SSD.
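If you want to script this check rather than read the manifest by eye, a minimal sketch might look like the following. It assumes `kubectl` is on your PATH with access to the cluster and relies only on the standard `-o json` output format; the helper names are hypothetical, not part of any Chalk tooling.

```python
import json
import subprocess

NODECLASS = "al2023-offline-lssd"  # name used throughout this guide


def fetch_nodeclass_json(name: str = NODECLASS) -> str:
    """Return the EC2NodeClass manifest as JSON text (needs kubectl access)."""
    result = subprocess.run(
        ["kubectl", "get", "ec2nodeclass", name, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def uses_raid0(manifest_json: str) -> bool:
    """True when spec.instanceStorePolicy is set to RAID0."""
    manifest = json.loads(manifest_json)
    return manifest.get("spec", {}).get("instanceStorePolicy") == "RAID0"
```

Calling `uses_raid0(fetch_nodeclass_json())` against a live cluster should return `True` when the node class is configured as described above.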


Step 2: Create the NodePool

In the Chalk dashboard, go to Infrastructure → Nodepools and click + Add New Nodepool. Use these settings:

| Field | Value |
| --- | --- |
| Nodepool Name | offline-lssd (or similar) |
| EC2NodeClass | al2023-offline-lssd |
| Kubernetes Cluster | your cluster |
| CPU Limit | 512 (cap total CPU the pool can provision) |
| Capacity type | on-demand |
| Instance categories | m, c, r |
| Instance generations | > 5 |
| Instance sizes | not in [nano, micro, small, medium, large] |
| Architecture | amd64 |
| Zones | your cluster’s availability zones |
| Isolate this nodepool | ✓ checked |
| Restrict to Chalk workloads only | ✓ checked |
| Nodepool Workload Type | Default (leave alone) |

Because the al2023-offline-lssd EC2NodeClass sets instanceStorePolicy: RAID0, Karpenter will only provision instance types that have local NVMe storage — no extra constraint is required to filter out non-LSSD families. If no LSSD instance is available in the requested categories or zones, pods will stay Pending rather than fall back to EBS.
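For readers who think in Karpenter terms, the dashboard fields above correspond roughly to NodePool requirements. The sketch below builds that list using Karpenter's well-known labels. What the Chalk dashboard actually generates is an assumption here, and the function name and zone values are hypothetical.

```python
def nodepool_requirements(zones):
    """Build Karpenter-style requirements mirroring the settings above."""
    return [
        {"key": "karpenter.sh/capacity-type", "operator": "In",
         "values": ["on-demand"]},
        {"key": "karpenter.k8s.aws/instance-category", "operator": "In",
         "values": ["m", "c", "r"]},
        # Gt compares numerically: generation must be greater than 5.
        {"key": "karpenter.k8s.aws/instance-generation", "operator": "Gt",
         "values": ["5"]},
        {"key": "karpenter.k8s.aws/instance-size", "operator": "NotIn",
         "values": ["nano", "micro", "small", "medium", "large"]},
        {"key": "kubernetes.io/arch", "operator": "In", "values": ["amd64"]},
        {"key": "topology.kubernetes.io/zone", "operator": "In",
         "values": list(zones)},
    ]


# Example zones are placeholders; substitute your cluster's zones.
requirements = nodepool_requirements(["us-east-1a", "us-east-1b"])
```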

Do **not** set Nodepool Workload Type to `Offline`. The dropdown option adds a `chalk.ai/workload-type=offline:NoSchedule` taint that no Chalk pod currently tolerates, which would make the pool repel every workload. Leave it as `Default`.

The two isolation checkboxes generate the taints that exclude unrelated workloads:

  • chalk.ai/nodepool=offline-lssd:NoSchedule (from “Isolate this nodepool”)
  • chalk.ai/managed-by=chalk:NoSchedule (from “Restrict to Chalk workloads only”)

Chalk auto-adds matching tolerations to pods that target this pool via the Resource Configuration form in Step 3.
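To make the taint and toleration pairing concrete, here is a small sketch that converts the two taint strings above into the standard Kubernetes toleration shape (`key`, `operator`, `value`, `effect`). The exact tolerations Chalk writes onto pods are an assumption; this only illustrates what a matching toleration looks like.

```python
def toleration_for(taint: str) -> dict:
    """Turn a 'key=value:Effect' taint string into a matching toleration."""
    key_value, effect = taint.rsplit(":", 1)
    key, value = key_value.split("=", 1)
    return {"key": key, "operator": "Equal", "value": value, "effect": effect}


TAINTS = [
    "chalk.ai/nodepool=offline-lssd:NoSchedule",
    "chalk.ai/managed-by=chalk:NoSchedule",
]
tolerations = [toleration_for(t) for t in TAINTS]
```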


Step 3: Add a resource group with a Job Queue Consumer

Go to Infrastructure → Resource Configuration. At the bottom of the resource groups tree, click + Add Resource Group. Give it a name like offline-lssd.

Under the new resource group, add a Job Queue Consumer service. You do not need to add a separate Job Queue Manager — there is one environment-wide Manager that polls jobs across all resource groups and spawns the per-group Consumer Deployments on demand.

On the Job Queue Consumer page:

  • Nodepool: select offline-lssd.
  • Instance Type: leave as None so Karpenter picks from the pool’s allowed instance types.

Step 4: Configure resources and environment variables

Resource requests

Set requests on the Requests panel. Two starting profiles follow; pick one based on the size of your typical async offline query:

Standard (lands on a 2xlarge LSSD instance)

| Setting | Value |
| --- | --- |
| CPU | 7 |
| Memory | 50Gi |
| Ephemeral Storage | 350Gi |

These requests force Karpenter to pick a 2xlarge LSSD instance (e.g. r6id.2xlarge: 8 vCPU, 64 GiB RAM, ~474 GB NVMe).

Heavier (for very large async offline queries)

| Setting | Value |
| --- | --- |
| CPU | 15 |
| Memory | 100Gi |
| Ephemeral Storage | 600Gi |

These requests force a 4xlarge LSSD instance (e.g. r6id.4xlarge: 16 vCPU, 128 GiB RAM, ~950 GB NVMe).
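The way each profile maps to an instance size can be sketched as a simple fit check. This is a simplified model, not Karpenter's scheduler: it ignores kubelet-reserved resources and filesystem overhead, and the small catalog is illustrative (the 2xlarge and 4xlarge figures come from this page; the xlarge row is an assumption to verify against the AWS instance tables).

```python
from dataclasses import dataclass

GIB = 1024 ** 3  # Kubernetes "Gi" quantities are binary
GB = 1000 ** 3   # NVMe capacities on this page are decimal GB


@dataclass(frozen=True)
class Instance:
    name: str
    vcpu: int
    memory_gib: int
    nvme_gb: int


# Ordered smallest to largest; illustrative subset only.
CATALOG = [
    Instance("r6id.xlarge", 4, 32, 237),
    Instance("r6id.2xlarge", 8, 64, 474),
    Instance("r6id.4xlarge", 16, 128, 950),
]


def smallest_fit(cpu: int, memory_gi: int, ephemeral_gi: int):
    """Return the first catalog entry that covers all three requests."""
    for inst in CATALOG:
        if (cpu <= inst.vcpu
                and memory_gi <= inst.memory_gib
                and ephemeral_gi * GIB <= inst.nvme_gb * GB):
            return inst.name
    return None
```

Under these assumptions, `smallest_fit(7, 50, 350)` selects the 2xlarge entry and `smallest_fit(15, 100, 600)` selects the 4xlarge entry, matching the two profiles.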

Leave the Limits panel blank so spill writes can use whatever the LSSD provides without an artificial cap. Set Min Instances to 0 to scale to zero when idle, and Max Instances to 2 or 3 to cap concurrent LSSD nodes.

The scan cache is per-pod — each Consumer pod has its own cache on its own node’s LSSD, and libchalk uses an exclusive lock so caches are never shared across pods. Setting Min Instances to 1 keeps one warm cache alive, not a pool-wide warm cache. If the workload bursts above one pod, the additional pods start cold and warm their own caches independently. Only raise Min Instances above 0 if the workload re-reads the same data consistently enough that paying for one always-on LSSD instance (~$15/day for an r6id.2xlarge) is worth it.

Environment variables — required for all backends

Add these under Environment Variable Overrides:

| Variable | Value | Purpose |
| --- | --- | --- |
| CHALK_VELOX_SPILL_DIRECTORY | /chalk-lssd-spill | Per-query spill scratch space on local NVMe |
| CHALK_VELOX_QUERY_DEFAULT_MEMORY_LIMIT_PERCENT | 75 | Raise spill threshold; LSSD-dedicated nodes have headroom |

/chalk-lssd-spill does not need to be mounted explicitly. With instanceStorePolicy: RAID0 in effect, the container’s writable overlay sits on the local NVMe array, so the engine creates the directory at this path and all writes go to LSSD automatically.

CHALK_VELOX_QUERY_DEFAULT_MEMORY_LIMIT_PERCENT=75 sets the in-memory working set Velox keeps before spilling to 75% of the container’s cgroup memory limit. With Memory=50Gi, that’s ~37.5 GiB of in-memory work before spill kicks in; with 100Gi, it’s ~75 GiB.
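The threshold arithmetic is a single multiplication, sketched here so you can try other memory sizes or percentages. The function name is hypothetical; it simply restates the percentage rule described above.

```python
def spill_threshold_gib(memory_gi: float, percent: int = 75) -> float:
    """In-memory working set (GiB) allowed before Velox starts spilling."""
    return memory_gi * percent / 100
```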

Environment variables — Iceberg only (optional)

If your offline store is backed by Iceberg (or you read Parquet/Delta tables directly through Velox via static resolvers), also add:

| Variable | Value | Purpose |
| --- | --- | --- |
| LIBCHALK_VELOX_TABLE_SCAN_SSD_CACHE_BYTES | 214748364800 | 200 GiB persistent on-disk scan cache, shares the spill mount |

The cache directory defaults to CHALK_VELOX_SPILL_DIRECTORY/table_scan_cache when not otherwise configured, so no extra path setup is needed.

Skip this variable if your offline store is **BigQuery, Snowflake, Redshift, or Databricks**. Those backends execute SQL on the warehouse and never go through Velox's table-scan operators, so the cache would be initialized but never see any reads — wasting LSSD capacity that could otherwise hold spill files.
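The backend rule can be summarized as a small helper that assembles the overrides to enter in the dashboard. This is a sketch for reasoning about the configuration, not a Chalk API: the function name and the `backend` argument are hypothetical, and only the variable names and values come from this page.

```python
GIB = 1024 ** 3

# Backends whose reads go through Velox's table-scan path, per this page.
SCAN_CACHE_BACKENDS = {"iceberg"}


def env_overrides(backend: str, cache_gib: int = 200) -> dict:
    """Return the environment variable overrides for a given offline store."""
    env = {
        "CHALK_VELOX_SPILL_DIRECTORY": "/chalk-lssd-spill",
        "CHALK_VELOX_QUERY_DEFAULT_MEMORY_LIMIT_PERCENT": "75",
    }
    # Only add the scan cache where it can actually serve reads.
    if backend.lower() in SCAN_CACHE_BACKENDS:
        env["LIBCHALK_VELOX_TABLE_SCAN_SSD_CACHE_BYTES"] = str(cache_gib * GIB)
    return env
```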

Sizing the scan cache

214748364800 (200 GiB) is a stock starting value, not a universal default. The right size is roughly the working set of distinct external-table partitions your async offline queries repeatedly touch:

  • Small / focused workloads (e.g. daily backfills over the same date range): 10-50 GiB is usually enough.
  • Broad ad-hoc analytics that read many partitions: 100-300 GiB or more.

The cache size also has to fit on the LSSD alongside spill scratch. On a 2xlarge LSSD instance (~474 GB usable), 200 GiB (about 215 GB) for the cache leaves roughly 260 GB for spill files and the container overlay, which is comfortable. On smaller shapes, scale down. If startup logs warn that the cache directory’s available space is below the configured cache size, either shrink this value or increase the ephemeral-storage request so Karpenter picks a larger LSSD instance.
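When budgeting the disk, keep in mind that the cache size is configured in binary GiB while instance NVMe capacity is quoted in decimal GB, a difference of about 7%. A minimal sketch of the conversion, with a hypothetical helper name:

```python
GIB = 1024 ** 3  # binary unit used for the cache size
GB = 1000 ** 3   # decimal unit used for instance NVMe capacity


def remaining_after_cache_gb(disk_gb: float, cache_gib: float) -> float:
    """Decimal GB left for spill files and overlay after reserving the cache."""
    return (disk_gb * GB - cache_gib * GIB) / GB


def cache_bytes(cache_gib: int) -> int:
    """Value to use for LIBCHALK_VELOX_TABLE_SCAN_SSD_CACHE_BYTES."""
    return cache_gib * GIB
```

For example, `cache_bytes(200)` yields the 214748364800 value used above, and `remaining_after_cache_gb(474, 200)` gives the space left on a 2xlarge instance.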


Step 5: Route queries from your client

The default resource group for offline queries is "default". To send a specific async offline query to the new LSSD-backed resource group, pass ResourceRequests(resource_group=...):

from chalk.client import ChalkClient, ResourceRequests

client = ChalkClient()
client.offline_query(
    input={'user.id': range(1_000_000)},
    output=['user.name'],
    run_asynchronously=True,
    resources=ResourceRequests(
        resource_group="offline-lssd",
    ),
)

Only queries that explicitly opt in via resource_group= will land on the new pool. Existing queries continue to use the default resource group and its existing nodepool, so you can roll out LSSD gradually for the queries that benefit most.
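One way to manage a gradual rollout is to keep an explicit allowlist of queries that should use the LSSD pool and derive the resource group from it. The sketch below is a plain-Python convention, not a Chalk feature; the query names are hypothetical.

```python
LSSD_GROUP = "offline-lssd"
DEFAULT_GROUP = "default"

# Hypothetical names for the queries you have decided to move first.
OPTED_IN = {"weekly-aggregations", "churn-backfill"}


def resource_group_for(query_name: str, opted_in=OPTED_IN) -> str:
    """Pick the LSSD group only for queries on the allowlist."""
    return LSSD_GROUP if query_name in opted_in else DEFAULT_GROUP


# The result would be passed as
# ResourceRequests(resource_group=resource_group_for("churn-backfill")).
```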

For scheduled queries, the equivalent kwarg lives directly on ScheduledQuery:

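from chalk import ScheduledQuery

# User is your own feature class; import it from the module where it is defined.
ScheduledQuery(
    name="weekly-aggregations",
    schedule="0 0 * * 0",  # cron: every Sunday at 00:00
    output=[User.historical_aggregates],
    resource_group="offline-lssd",
)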

Step 6: Verify the setup

After running an async offline query against the new resource group, confirm spilling actually happened by checking the query’s performance summary in the Chalk dashboard for spill_enabled=true and a nonzero spilled_bytes value. If neither field appears, the query didn’t exceed its memory limit and didn’t need to spill — small queries that fit in memory won’t trigger it.
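If you copy the summary fields into a script, the interpretation above can be sketched as follows. The dictionary shape is an assumption made for illustration; only the field names `spill_enabled` and `spilled_bytes` come from this page, and the function name is hypothetical.

```python
def spill_status(summary: dict) -> str:
    """Classify a performance summary by its spill fields."""
    if "spill_enabled" not in summary and "spilled_bytes" not in summary:
        # Matches the case where the query fit in memory and never spilled.
        return "no-spill-fields"
    enabled = str(summary.get("spill_enabled", "")).lower() == "true"
    spilled = int(summary.get("spilled_bytes", 0))
    if enabled and spilled > 0:
        return "spilled"
    return "enabled-but-unused" if enabled else "spill-disabled"
```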


See also

  • Offline queries — disk spilling and planner options for offline queries.
  • Job queue — how async offline queries flow through the job queue and Resource Groups.
  • Kubernetes resources — overview of Karpenter NodePools and EC2NodeClass.