Infrastructure
AWS EKS guide for using local NVMe storage to speed up offline-query spilling and Iceberg scan caching
Two Velox features benefit from fast local NVMe SSDs (LSSDs) attached to your async offline-query workers: query spilling and the table-scan cache used for Iceberg reads.
Both features can share a single LSSD-backed mount on the node. This page walks through the end-to-end setup.
**Scope:** this guide covers **AWS EKS clusters using Karpenter**. The infrastructure steps (EC2NodeClass, NodePool, instance families) are AWS-specific. The Chalk-side configuration (resource group, Job Queue Consumer, environment variables, client routing) applies regardless of cloud, but the Karpenter-specific UI fields and shell commands on this page will not apply verbatim to GCP GKE or Azure AKS deployments. For GKE local-SSD guidance, see the short note in [Kubernetes Resources Overview](/docs/kube-resources-drilldown#local-ssds-for-temporary-storage) or contact Chalk support.
This setup applies to **async offline queries** (`run_asynchronously=True`), which run on the [job queue](/docs/job-queue). Synchronous offline queries and online queries don't go through the job queue and aren't affected by the configuration on this page.
You’ll see the biggest impact from LSSD-backed workers when:
If your offline store is BigQuery, Snowflake, Redshift, or Databricks, the scan cache won’t help: those backends execute SQL on the warehouse, and results come back through warehouse drivers rather than through Velox’s scan path. Spilling still helps if those queries exceed their memory limit, but the scan-cache section below applies only to the Iceberg path.
EC2NodeClass exists in your cluster.

Karpenter’s EC2NodeClass is an AWS-only resource, so these steps don’t apply to
GKE or AKS clusters. Chalk’s standard AWS Terraform provisions an EC2NodeClass
named `al2023-offline-lssd` with `spec.instanceStorePolicy: RAID0`
automatically. Check whether yours is present:

    kubectl get ec2nodeclass al2023-offline-lssd

If the command returns `NotFound`, your cluster is either on an older infrastructure
setup or you’re managing the EKS cluster yourself outside Chalk’s Terraform
module. Contact Chalk support to have the EC2NodeClass provisioned. It
requires cluster-specific IAM and networking values that vary across
deployments, so it’s not safe to apply a generic manifest. Once support
confirms it’s been created, re-run the `kubectl get` above and continue to
Step 2.

`instanceStorePolicy: RAID0` is the critical field on the EC2NodeClass: it
makes Karpenter mount the instance’s local NVMe array as the node’s ephemeral
storage, so the container overlay and any writes to non-volume paths land on
local SSD.
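The same check can be scripted. The following is a minimal sketch, assuming `kubectl` is on your PATH and pointed at the right cluster. The helper names (`build_check_command`, `interpret_result`, `check_nodeclass`) are hypothetical and not part of any Chalk tooling. It reads `spec.instanceStorePolicy` with kubectl's `-o jsonpath` output and classifies the result.

```python
import subprocess

# Name used by Chalk's standard AWS Terraform, per the text above.
NODECLASS = "al2023-offline-lssd"


def build_check_command(name: str = NODECLASS) -> list:
    """Build a kubectl command that prints the EC2NodeClass's instanceStorePolicy."""
    return [
        "kubectl", "get", "ec2nodeclass", name,
        "-o", "jsonpath={.spec.instanceStorePolicy}",
    ]


def interpret_result(returncode: int, stdout: str, stderr: str) -> str:
    """Classify a kubectl result as 'ready', 'missing', 'wrong-policy', or 'error'."""
    if returncode != 0:
        # kubectl reports absent objects with "(NotFound)" on stderr.
        return "missing" if "NotFound" in stderr else "error"
    return "ready" if stdout.strip() == "RAID0" else "wrong-policy"


def check_nodeclass(name: str = NODECLASS) -> str:
    """Run kubectl (needs cluster access) and interpret its output."""
    proc = subprocess.run(
        build_check_command(name), capture_output=True, text=True, check=False
    )
    return interpret_result(proc.returncode, proc.stdout, proc.stderr)
```

A result of `missing` corresponds to the `NotFound` case described above and is the point at which to contact Chalk support.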
In the Chalk dashboard, go to Infrastructure → Nodepools and click + Add New Nodepool. Use these settings:
| Field | Value |
|---|---|
| Nodepool Name | offline-lssd (or similar) |
| EC2NodeClass | al2023-offline-lssd |
| Kubernetes Cluster | your cluster |
| CPU Limit | 512 (cap total CPU the pool can provision) |
| Capacity type | on-demand |
| Instance categories | m, c, r |
| Instance generations | > 5 |
| Instance sizes | not in [nano, micro, small, medium, large] |
| Architecture | amd64 |
| Zones | your cluster’s availability zones |
| Isolate this nodepool | ✓ checked |
| Restrict to Chalk workloads only | ✓ checked |
| Nodepool Workload Type | Default (leave alone) |
Because the al2023-offline-lssd EC2NodeClass sets instanceStorePolicy: RAID0,
Karpenter will only provision instance types that have local NVMe storage —
no extra constraint is required to filter out non-LSSD families. If no LSSD
instance is available in the requested categories or zones, pods will stay
Pending rather than fall back to EBS.
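For readers who think in Karpenter terms, the dashboard settings above correspond roughly to NodePool scheduling requirements. The sketch below expresses them as plain Python data using Karpenter's well-known node labels. How the Chalk dashboard actually renders these fields into a NodePool manifest is an assumption here, and `nodepool_requirements` and `satisfies` are hypothetical helpers for illustration only.

```python
def nodepool_requirements(zones):
    """Assumed mapping of the dashboard fields to Karpenter requirements."""
    return [
        {"key": "karpenter.sh/capacity-type", "operator": "In", "values": ["on-demand"]},
        {"key": "karpenter.k8s.aws/instance-category", "operator": "In", "values": ["m", "c", "r"]},
        {"key": "karpenter.k8s.aws/instance-generation", "operator": "Gt", "values": ["5"]},
        {"key": "karpenter.k8s.aws/instance-size", "operator": "NotIn",
         "values": ["nano", "micro", "small", "medium", "large"]},
        {"key": "kubernetes.io/arch", "operator": "In", "values": ["amd64"]},
        {"key": "topology.kubernetes.io/zone", "operator": "In", "values": list(zones)},
    ]


def satisfies(labels, requirements):
    """Return True if a node's labels meet every requirement (In, NotIn, Gt only)."""
    for req in requirements:
        value = labels.get(req["key"])
        op, values = req["operator"], req["values"]
        if op == "In" and value not in values:
            return False
        if op == "NotIn" and value in values:
            return False
        if op == "Gt" and (value is None or int(value) <= int(values[0])):
            return False
    return True


# Example labels for an r6id.2xlarge in a placeholder zone.
r6id_2xlarge = {
    "karpenter.sh/capacity-type": "on-demand",
    "karpenter.k8s.aws/instance-category": "r",
    "karpenter.k8s.aws/instance-generation": "6",
    "karpenter.k8s.aws/instance-size": "2xlarge",
    "kubernetes.io/arch": "amd64",
    "topology.kubernetes.io/zone": "us-east-1a",
}
```

Note that `> 5` excludes fifth-generation families such as r5d, so only generation 6 and newer LSSD families qualify.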
Do **not** set Nodepool Workload Type to `Offline`. The dropdown option adds a `chalk.ai/workload-type=offline:NoSchedule` taint that no Chalk pod currently tolerates, which would make the pool repel every workload. Leave it as `Default`.
The two isolation checkboxes generate the taints that exclude unrelated workloads:
- `chalk.ai/nodepool=offline-lssd:NoSchedule` (from “Isolate this nodepool”)
- `chalk.ai/managed-by=chalk:NoSchedule` (from “Restrict to Chalk workloads only”)

Chalk auto-adds matching tolerations to pods that target this pool via the Resource Configuration form in Step 3.
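To make the taint and toleration pairing concrete, here is a small sketch that converts the two taint strings into the standard Kubernetes toleration shape. The helper `toleration_for` is hypothetical. Chalk adds the real tolerations for you, and the exact form it uses (for example `Equal` versus `Exists`) is an assumption.

```python
def toleration_for(taint: str) -> dict:
    """Parse a 'key=value:Effect' taint string into a matching toleration."""
    key_value, effect = taint.rsplit(":", 1)
    key, value = key_value.split("=", 1)
    return {"key": key, "operator": "Equal", "value": value, "effect": effect}


# The two taints generated by the isolation checkboxes.
POOL_TAINTS = [
    "chalk.ai/nodepool=offline-lssd:NoSchedule",
    "chalk.ai/managed-by=chalk:NoSchedule",
]

tolerations = [toleration_for(t) for t in POOL_TAINTS]
```

A pod needs a toleration for every `NoSchedule` taint on the node, which is why both entries are required before a Consumer pod can land on this pool.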
Go to Infrastructure → Resource Configuration. At the bottom of the resource
groups tree, click + Add Resource Group. Give it a name like
offline-lssd.
Under the new resource group, add a Job Queue Consumer service. You do not need to add a separate Job Queue Manager — there is one environment-wide Manager that polls jobs across all resource groups and spawns the per-group Consumer Deployments on demand.
On the Job Queue Consumer page:
- Target the `offline-lssd` nodepool.
- Leave the instance type as `None` so Karpenter picks from the pool’s
  allowed instance types.

Set requests on the Requests panel. There are two starting profiles; pick one based on the size of your typical async offline query:
| Setting | Value |
|---|---|
| CPU | 7 |
| Memory | 50Gi |
| Ephemeral Storage | 350Gi |
Forces Karpenter to pick a 2xlarge LSSD instance (e.g. r6id.2xlarge —
8 vCPU, 64 GiB RAM, ~474 GB NVMe).
| Setting | Value |
|---|---|
| CPU | 15 |
| Memory | 100Gi |
| Ephemeral Storage | 600Gi |
Forces a 4xlarge LSSD instance (e.g. r6id.4xlarge — 16 vCPU, 128 GiB RAM,
~950 GB NVMe).
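The reason these requests pin a particular instance size can be checked with simple arithmetic. The sketch below uses published r6id capacities and ignores kubelet and system reservations (an assumption: real allocatable capacity is somewhat lower, which is why the requests sit below the full instance size). The function `smallest_fit` is hypothetical, and r6id is only one of the families the pool allows.

```python
GIB = 1024 ** 3  # Kubernetes "Gi" quantities are binary
GB = 1000 ** 3   # instance-store sizes are quoted in decimal GB

# (vCPU, memory in bytes, local NVMe in bytes), ordered smallest to largest.
R6ID = {
    "xlarge": (4, 32 * GIB, 237 * GB),
    "2xlarge": (8, 64 * GIB, 474 * GB),
    "4xlarge": (16, 128 * GIB, 950 * GB),
}


def smallest_fit(cpu, memory_gib, storage_gib):
    """Return the smallest listed r6id size whose raw capacity covers the requests."""
    for size, (vcpu, mem_bytes, disk_bytes) in R6ID.items():
        if (cpu <= vcpu
                and memory_gib * GIB <= mem_bytes
                and storage_gib * GIB <= disk_bytes):
            return size
    return None


first_profile = smallest_fit(cpu=7, memory_gib=50, storage_gib=350)
second_profile = smallest_fit(cpu=15, memory_gib=100, storage_gib=600)
```

Under these assumptions the first profile resolves to `2xlarge` and the second to `4xlarge`, matching the instance sizes named above.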
Leave the Limits panel blank so spill writes can use whatever the LSSD
provides without an artificial cap. Set Min Instances to 0 to scale
to zero when idle, and Max Instances to 2 or 3 to cap concurrent
LSSD nodes.
The scan cache is per-pod — each Consumer pod has its own cache on its own
node’s LSSD, and libchalk uses an exclusive lock so caches are never shared
across pods. Setting Min Instances to 1 keeps one warm cache alive,
not a pool-wide warm cache. If the workload bursts above one pod, the
additional pods start cold and warm their own caches independently. Only
raise Min Instances above 0 if the workload re-reads the same data
consistently enough that paying for one always-on LSSD instance (~$15/day
for an r6id.2xlarge) is worth it.
Add these under Environment Variable Overrides:
| Variable | Value | Purpose |
|---|---|---|
| `CHALK_VELOX_SPILL_DIRECTORY` | `/chalk-lssd-spill` | Per-query spill scratch space on local NVMe |
| `CHALK_VELOX_QUERY_DEFAULT_MEMORY_LIMIT_PERCENT` | `75` | Raise the spill threshold; LSSD-dedicated nodes have headroom |
/chalk-lssd-spill does not need to be mounted explicitly. With
instanceStorePolicy: RAID0 in effect, the container’s writable overlay sits
on the local NVMe array, so the engine creates the directory at this path and
all writes go to LSSD automatically.
CHALK_VELOX_QUERY_DEFAULT_MEMORY_LIMIT_PERCENT=75 sets the in-memory working
set Velox keeps before spilling to 75% of the container’s cgroup memory limit.
With Memory=50Gi, that’s ~37 GiB of in-memory work before spill kicks in;
with 100Gi, it’s ~75 GiB.
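The threshold arithmetic is easy to reproduce for other memory sizes. This is a minimal sketch of the calculation described above, not the engine's own logic; `spill_threshold_gib` is a hypothetical helper.

```python
def spill_threshold_gib(memory_limit_gib: float, limit_percent: int = 75) -> float:
    """In-memory working set (GiB) allowed before spilling starts."""
    return memory_limit_gib * limit_percent / 100


small_profile = spill_threshold_gib(50)   # 37.5
large_profile = spill_threshold_gib(100)  # 75.0
```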
If your offline store is backed by Iceberg (or you read Parquet/Delta tables directly through Velox via static resolvers), also add:
| Variable | Value | Purpose |
|---|---|---|
| `LIBCHALK_VELOX_TABLE_SCAN_SSD_CACHE_BYTES` | `214748364800` | 200 GiB persistent on-disk scan cache, shares the spill mount |
The cache directory defaults to CHALK_VELOX_SPILL_DIRECTORY/table_scan_cache
when not otherwise configured, so no extra path setup is needed.
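As an illustration of that default, the sketch below derives the cache path from the spill directory. It mirrors the behavior described in the text and is not libchalk's actual implementation; `default_scan_cache_dir` is a hypothetical helper.

```python
from pathlib import PurePosixPath
from typing import Mapping, Optional


def default_scan_cache_dir(env: Mapping[str, str]) -> Optional[str]:
    """Return <spill dir>/table_scan_cache, or None if no spill dir is set."""
    spill_dir = env.get("CHALK_VELOX_SPILL_DIRECTORY")
    if not spill_dir:
        return None
    return str(PurePosixPath(spill_dir) / "table_scan_cache")


overrides = {
    "CHALK_VELOX_SPILL_DIRECTORY": "/chalk-lssd-spill",
    "LIBCHALK_VELOX_TABLE_SCAN_SSD_CACHE_BYTES": "214748364800",
}
cache_dir = default_scan_cache_dir(overrides)
```

With the overrides from this page, the cache lands under `/chalk-lssd-spill/table_scan_cache`, on the same LSSD-backed storage as the spill files.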
Skip this variable if your offline store is **BigQuery, Snowflake, Redshift, or Databricks**. Those backends execute SQL on the warehouse and never go through Velox's table-scan operators, so the cache would be initialized but never see any reads — wasting LSSD capacity that could otherwise hold spill files.
214748364800 (200 GiB) is a stock starting value, not a universal default.
The right size is roughly the working set of distinct external-table partitions
your async offline queries repeatedly touch:
- For a smaller working set, 10-50 GiB is usually enough.
- For a larger working set, budget 100-300 GiB or more.

The cache size also has to fit on the LSSD alongside spill scratch. On a
2xlarge LSSD instance (~474 GB usable), 200 GiB (~215 GB) for the cache leaves
~260 GB for spill files and the container overlay, which is comfortable. On smaller
shapes, scale down. If startup logs warn that the cache directory’s available
space is below the configured cache size, either shrink this value or
increase the ephemeral-storage request so Karpenter picks a larger LSSD
instance.
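Because the cache size is set in bytes from a GiB figure while instance-store capacity is quoted in decimal GB, it helps to do the sizing arithmetic in bytes. The sketch below is a minimal calculator under that assumption; the helper names are hypothetical, and it ignores filesystem overhead.

```python
GIB = 1024 ** 3  # binary unit used for the cache size
GB = 1000 ** 3   # decimal unit used for instance-store capacity


def cache_bytes(cache_gib: int) -> int:
    """Value to put in LIBCHALK_VELOX_TABLE_SCAN_SSD_CACHE_BYTES."""
    return cache_gib * GIB


def remaining_gb(disk_gb: float, cache_gib: int) -> float:
    """Decimal GB left for spill files and the overlay after the cache."""
    return (disk_gb * GB - cache_bytes(cache_gib)) / GB


stock_value = cache_bytes(200)
left_on_2xlarge = round(remaining_gb(474, 200), 1)
left_on_4xlarge = round(remaining_gb(950, 200), 1)
```

For the stock 200 GiB cache this leaves roughly 259 GB on a 474 GB device and roughly 735 GB on a 950 GB device. A negative result from `remaining_gb` means the cache alone would not fit.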
The default resource group for offline queries is "default". To send a
specific async offline query to the new LSSD-backed resource group, pass
ResourceRequests(resource_group=...):

    from chalk.client import ChalkClient, ResourceRequests

    client = ChalkClient()

    # Route this async offline query to the LSSD-backed resource group.
    client.offline_query(
        input={'user.id': range(1_000_000)},
        output=['user.name'],
        run_asynchronously=True,
        resources=ResourceRequests(
            resource_group="offline-lssd",
        ),
    )

Only queries that explicitly opt in via `resource_group=` will land on the new
pool. Existing queries continue to use the default resource group and its
existing nodepool, so you can roll out LSSD gradually for the queries that
benefit most.
For scheduled queries, the equivalent kwarg lives
directly on ScheduledQuery:

    from chalk import ScheduledQuery

    # `User` is the feature class defined in your own Chalk project.
    ScheduledQuery(
        name="weekly-aggregations",
        schedule="0 0 * * 0",
        output=[User.historical_aggregates],
        resource_group="offline-lssd",
    )

After running an async offline query against the new resource group, confirm
spilling actually happened by checking the query’s performance summary in the
Chalk dashboard for spill_enabled=true and a nonzero spilled_bytes value.
If neither field appears, the query didn’t exceed its memory limit and didn’t
need to spill — small queries that fit in memory won’t trigger it.