Infrastructure
DynamoDB Online Store Deployment

Overview

Chalk supports DynamoDB as an online store. Online query results and cached feature values are written to DynamoDB by the background persistence writers and read directly by the query servers. This page covers how to size DynamoDB capacity, how to choose between single-region and multi-region deployments, and how to provision everything via Terraform.

Chalk’s DynamoDB online store uses a single table per environment with feature keys encoded to minimize both storage and capacity consumption: values are stored using native DynamoDB data types (not JSON-encoded strings), and feature names are compressed to short stable identifiers. This means that DynamoDB capacity sizing in practice consumes noticeably less WCU/RCU than a naive estimate based on the raw JSON size of a feature set would suggest.


DynamoDB vs. Valkey/Redis

Chalk supports both DynamoDB and Valkey (or Redis) as online stores. The right choice depends on your workload:

  • DynamoDB is the better fit when you have a large working set of feature values with modest per-query storage requirements. Because DynamoDB is a managed disk-backed store, it can cost-effectively hold billions of entities without the memory pressure that would dominate a Valkey deployment. It also requires no capacity planning for replication or failover beyond the WCU/RCU dimensions.
  • Valkey/Redis is the better fit for ultra-low-latency workloads and for workloads where the working set is small enough to fit in memory. In-memory reads are meaningfully faster than DynamoDB’s single-digit-millisecond reads, and small Valkey deployments are typically cheaper than an equivalent DynamoDB configuration.

A common pattern is DynamoDB fronted by an in-process LRU cache and/or a Bloom filter to minimize reads against the store.


Sizing WCU and RCU

Chalk’s DynamoDB encoding (native dtypes + short feature identifiers) keeps per-item payloads small, so RCU/WCU calculations are typically driven by query volume and the number of features read per query rather than by raw payload size.

A useful starting point:

  • RCU — each online query consumes roughly one RCU per entity read with strongly consistent reads, or half that with eventually consistent reads (entities typically fit within the 4 KiB read unit, even with dozens of features). Multiply expected QPS by the average number of entities loaded per query. Use eventually consistent reads unless you have a specific reason to pay 2x for strongly consistent reads; Chalk does not require strong consistency.
  • WCU — each persisted query result consumes roughly one WCU per entity written (again, entities typically fit within the 1 KiB write unit). Multiply expected QPS by the fraction of queries whose results are persisted to the online store and by the average number of entities written per query.
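The two rules of thumb above reduce to simple multiplication. A back-of-envelope sketch, using purely illustrative traffic numbers (not a recommendation):

```python
# Back-of-envelope RCU/WCU sizing from the rules of thumb above.
# All input values are hypothetical examples.
qps = 2000                  # expected online queries per second
entities_per_query = 3      # average entities loaded per query
persist_fraction = 0.25     # fraction of queries whose results are persisted
entities_written = 2        # average entities written per persisted query

# ~1 read unit per entity read (entities fit within the 4 KiB read unit)
rcu = qps * entities_per_query

# ~1 write unit per entity written, scaled by the persisted fraction
wcu = qps * persist_fraction * entities_written

print(f"RCU: {rcu}, WCU: {wcu}")  # RCU: 6000, WCU: 1000.0
```

Treat the result as a starting point for provisioning, then tune against observed consumed-capacity metrics.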

Chalk will assist with initial sizing based on your query mix, but the customer is ultimately responsible for choosing and tuning DynamoDB capacity: DynamoDB capacity is a direct cost driver, and the tradeoffs between provisioned, on-demand, and autoscaled capacity are workload-specific and owned by the customer.

Provisioned vs. on-demand vs. autoscaled

DynamoDB offers three capacity modes, each with different cost and operational characteristics:

  • Provisioned (static) — you pay for a fixed WCU/RCU amount regardless of utilization. Cheapest per unit for steady-state workloads where utilization is consistently high, but throttles immediately when traffic exceeds the provisioned level. Appropriate when you have a well-understood, relatively flat traffic pattern.
  • Provisioned with autoscaling — capacity tracks a target utilization (typically 70%). AWS Application Auto Scaling adjusts WCU/RCU in response to CloudWatch metrics. Scale-up is reactive (there is a lag, typically minutes) and scale-down has a cooldown, so autoscaling accommodates gradual traffic shifts well but can still throttle on sharp spikes.
  • On-demand — no capacity planning; you pay per request. Roughly 7x the per-unit cost of steady-state provisioned capacity, but absorbs arbitrary traffic spikes instantly. Appropriate for bursty, unpredictable traffic where throttling is not acceptable.
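The ~7x figure implies a simple break-even: continuously billed provisioned capacity is cheaper than on-demand once average utilization exceeds roughly 1/7 (~14%). A sketch of that arithmetic (the 7x multiplier is the rough ratio cited above, not a quoted AWS rate):

```python
# Break-even between provisioned and on-demand capacity, assuming
# on-demand costs ~7x the per-unit price of provisioned capacity.
ON_DEMAND_MULTIPLIER = 7.0

def cheaper_mode(avg_utilization: float) -> str:
    """Provisioned bills one unit-hour regardless of use; on-demand bills
    only the consumed fraction, at ~7x the unit price."""
    provisioned_cost = 1.0
    on_demand_cost = avg_utilization * ON_DEMAND_MULTIPLIER
    return "provisioned" if provisioned_cost < on_demand_cost else "on-demand"

print(cheaper_mode(0.10))  # on-demand: utilization below the ~14% break-even
print(cheaper_mode(0.50))  # provisioned: steady utilization well above it
```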

For most production Chalk deployments, provisioned-with-autoscaling is the right default: it amortizes the steady-state cost advantage of provisioned capacity while still absorbing diurnal traffic variation. Reserve on-demand for environments with highly unpredictable traffic or very low steady-state utilization.


Single-region vs. multi-region

Chalk supports DynamoDB online stores in either a single region or replicated across multiple regions using DynamoDB Global Tables.

Single-region

A single-region deployment is the simplest configuration: one table in one region, accessed by Chalk query servers running in the same region. If the region becomes unavailable, the online store is unavailable and online queries will fail until the region recovers. Single-region is appropriate when your application’s availability requirements do not extend beyond a single AWS region.

Multi-region

Global Tables replicate items asynchronously between regions, typically with sub-second propagation under normal conditions. Chalk recommends asynchronous replication (the default for Global Tables) rather than attempting to build strongly consistent cross-region writes: synchronous cross-region replication would require every online write to commit in at least two regions before returning, which is prohibitively expensive in both latency (adding one inter-region round trip per write) and cost.

Because replication is asynchronous, a regional failover can lose the last few seconds of writes that had not yet replicated from the lost primary region. In practice, this achieves RPO < 1 minute: the write lag for Global Tables is typically under 1 second during normal operation, and even under regional stress has historically stayed well below a minute. For Chalk online queries, the practical effect of this RPO is that a small number of the most recently persisted query results may be missing after failover, forcing re-computation on the next query; feature values themselves are not corrupted.

The tradeoff: accept a small RPO in exchange for (a) much lower write latency, (b) lower cost, and (c) a simpler operational model. Applications that cannot tolerate any lost writes must use a different persistence model than an online feature store.

See Multi-Region Failover for the Chalk-level configuration that steers query traffic to a healthy region.


Shared responsibility

Chalk will assist with DynamoDB sizing, capacity-mode selection, and replication topology, but the customer is ultimately responsible for provisioning and operating the DynamoDB table. This is intentional: DynamoDB capacity is a direct cost driver that the customer controls, and capacity decisions must be made against the customer’s own cost model and availability targets.

Chalk’s responsibilities are:

  • Advising on initial sizing and recommending capacity mode for a given workload
  • Operating the Chalk components that read from and write to DynamoDB
  • Reporting online store error rates, throttle rates, and latency in the Chalk UI

Customer responsibilities are:

  • Provisioning the DynamoDB table and any Global Table replicas
  • Choosing provisioned / on-demand / autoscaled capacity
  • Tuning autoscaling targets and floor/ceiling values
  • Configuring IAM access for the Chalk service account

Example Terraform: single-region

A single-region DynamoDB table with provisioned capacity:

resource "aws_dynamodb_table" "chalk_online_store" {
  name         = "chalk-online-store"
  billing_mode = "PROVISIONED"

  read_capacity  = 1000
  write_capacity = 500

  hash_key  = "pk"
  range_key = "sk"

  attribute {
    name = "pk"
    type = "S"
  }

  attribute {
    name = "sk"
    type = "S"
  }

  point_in_time_recovery {
    enabled = true
  }

  server_side_encryption {
    enabled = true
  }

  tags = {
    chalk_environment = "production"
  }
}

Switch billing_mode to PAY_PER_REQUEST and remove the read_capacity / write_capacity fields for on-demand.


Example Terraform: multi-region (Global Tables)

A multi-region deployment uses a single aws_dynamodb_table resource with replica blocks. Global Tables require stream_enabled = true and stream_view_type = "NEW_AND_OLD_IMAGES":

resource "aws_dynamodb_table" "chalk_online_store" {
  name             = "chalk-online-store"
  billing_mode     = "PROVISIONED"
  read_capacity    = 1000
  write_capacity   = 500
  hash_key         = "pk"
  range_key        = "sk"
  stream_enabled   = true
  stream_view_type = "NEW_AND_OLD_IMAGES"

  attribute {
    name = "pk"
    type = "S"
  }

  attribute {
    name = "sk"
    type = "S"
  }

  replica {
    region_name = "us-east-1"
  }

  replica {
    region_name = "us-west-2"
  }

  point_in_time_recovery {
    enabled = true
  }

  server_side_encryption {
    enabled = true
  }

  tags = {
    chalk_environment = "production"
  }
}

Each replica is a full copy of the table in the specified region; replication is asynchronous with typical lag well under a second. Provisioned capacity applies per-region and must be sized for each region’s local traffic.


Example Terraform: autoscaling policy

An autoscaling policy tracks target utilization on read and write capacity. Attach one pair of scalable targets and policies per capacity dimension:

resource "aws_appautoscaling_target" "read_target" {
  max_capacity       = 5000
  min_capacity       = 500
  resource_id        = "table/${aws_dynamodb_table.chalk_online_store.name}"
  scalable_dimension = "dynamodb:table:ReadCapacityUnits"
  service_namespace  = "dynamodb"
}

resource "aws_appautoscaling_policy" "read_policy" {
  name               = "chalk-online-store-read-autoscaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.read_target.resource_id
  scalable_dimension = aws_appautoscaling_target.read_target.scalable_dimension
  service_namespace  = aws_appautoscaling_target.read_target.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "DynamoDBReadCapacityUtilization"
    }
    target_value       = 70.0
    scale_in_cooldown  = 60
    scale_out_cooldown = 60
  }
}

resource "aws_appautoscaling_target" "write_target" {
  max_capacity       = 2500
  min_capacity       = 250
  resource_id        = "table/${aws_dynamodb_table.chalk_online_store.name}"
  scalable_dimension = "dynamodb:table:WriteCapacityUnits"
  service_namespace  = "dynamodb"
}

resource "aws_appautoscaling_policy" "write_policy" {
  name               = "chalk-online-store-write-autoscaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.write_target.resource_id
  scalable_dimension = aws_appautoscaling_target.write_target.scalable_dimension
  service_namespace  = aws_appautoscaling_target.write_target.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "DynamoDBWriteCapacityUtilization"
    }
    target_value       = 70.0
    scale_in_cooldown  = 60
    scale_out_cooldown = 60
  }
}

A 70% target utilization is a conservative starting point that leaves headroom for the reactive scale-up delay. For workloads with sharper traffic spikes, lower the target to 50-60% or raise min_capacity so that the floor already covers expected peak-to-trough variation. For Global Tables, configure autoscaling independently in each region.
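One way to quantify the headroom a utilization target leaves (ignoring DynamoDB's transient burst capacity, which this simple model omits): steady-state consumption sits at the target fraction of provisioned capacity, so traffic can grow by 1/target − 1 before throttling begins.

```python
# Instantaneous burst a target-tracking table can absorb before autoscaling
# reacts: consumed capacity sits at target_utilization of the provisioned
# level, so throttling begins once traffic grows by 1/target - 1.
def burst_headroom(target_utilization: float) -> float:
    return 1.0 / target_utilization - 1.0

print(f"{burst_headroom(0.70):.0%}")  # ~43% above steady state at a 70% target
print(f"{burst_headroom(0.50):.0%}")  # 100% (2x) at a 50% target
```

This is why lowering the target to 50-60% helps spiky workloads: it directly widens the window of traffic growth that can be absorbed while autoscaling catches up.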


Configuration

The Chalk-side DynamoDB client exposes a number of tuning knobs as environment variables. These are set on the Chalk engine and persistence-writer deployments and control client concurrency, batching, caching, retries, and consistency behavior. Most defaults are tuned for typical online-serving workloads; the settings below are documented primarily so that operators can diagnose throughput problems and adjust where the defaults do not match a specific workload.

Client threads and connections

DynamoDB requests are issued by a pool of client threads against a fixed pool of HTTP connections. Serialization and deserialization of items happens on a separate pool of serde threads so that CPU-bound encoding work does not block the I/O threads.

  • DYNAMODB_NUM_CLIENT_THREADS (default: 2 * desired_cpu_parallelism) — Number of threads in the DynamoDB client pool. These threads issue and await BatchGetItem / BatchWriteItem / TransactWriteItems calls. Increase for read-heavy workloads that bottleneck on I/O wait.
  • DYNAMODB_NUM_CLIENT_CONNECTIONS (default: 2 * desired_cpu_parallelism) — Maximum number of concurrent HTTP connections to DynamoDB. Should generally be set equal to or slightly above DYNAMODB_NUM_CLIENT_THREADS. Each connection corresponds to a TCP/TLS session.
  • DYNAMODB_NUM_SERDE_THREADS (default: desired_cpu_parallelism) — Number of threads used to encode/decode DynamoDB items. CPU-bound; increase if profiling shows serde saturation while client threads are idle.

Read batching

BatchGetItem requests are split into multiple parallel sub-batches. The configuration below controls how those sub-batches are sized.

  • DYNAMODB_GETITEM_MIN_BATCH_SIZE (default: 10) — Minimum number of keys per BatchGetItem sub-batch. The DynamoDB protocol allows up to 100 keys per batch; empirically, batches of fewer than ~10 keys are no faster than a 10-key batch, so smaller splits only add request overhead.
  • DYNAMODB_GETITEM_MIN_BATCH_CONCURRENCY (default: DYNAMODB_NUM_CLIENT_THREADS) — Maximum number of parallel sub-batches per BatchGetItem request. Defaults to the size of the client thread pool. Lowering this is useful when an environment receives many concurrent queries: each individual query is then satisfied with fewer (larger) batches, leaving more client threads available to other queries.
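The interaction between minimum batch size, the concurrency cap, and the protocol's 100-key limit can be sketched as a splitting function. This is an illustrative model, not Chalk's actual implementation:

```python
# Illustrative model of splitting a BatchGetItem key set into parallel
# sub-batches given a minimum useful batch size and a concurrency cap.
# Not Chalk's actual implementation.
def split_batches(keys, min_batch_size=10, max_concurrency=8, protocol_max=100):
    n = len(keys)
    if n == 0:
        return []
    # Use as many sub-batches as the concurrency cap allows, but never so
    # many that a sub-batch drops below the minimum useful size.
    num_batches = min(max_concurrency, max(1, n // min_batch_size))
    size = -(-n // num_batches)       # ceiling division
    size = min(size, protocol_max)    # BatchGetItem allows at most 100 keys
    return [keys[i:i + size] for i in range(0, n, size)]

batches = split_batches(list(range(45)), min_batch_size=10, max_concurrency=8)
print([len(b) for b in batches])  # [12, 12, 12, 9]
```

With 45 keys and a 10-key minimum, only four sub-batches are issued even though eight client threads are available, which is the behavior the minimum-batch-size setting exists to enforce.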

In-memory caches and Bloom filter

Chalk supports per-namespace LRU caching of feature values in front of DynamoDB, and a Bloom filter to short-circuit reads for keys known to be absent. These reduce DynamoDB RCU consumption and tail latency at the cost of memory.

  • DYNAMODB_CACHED_NAMESPACES (default: none) — JSON list of namespace cache configurations of the form {"namespace": "...", "ttl_seconds": 86400, "max_lru_size": 10000}. max_lru_size is optional; if omitted, the cache grows without bound. Use for hot namespaces where stale-by-up-to-ttl_seconds reads are acceptable.
  • DYNAMODB_LRU_CACHE_CACHE_MISSES (default: true) — When true, the namespace LRU cache also caches negative results (rows that did not exist in DynamoDB). Set to false to re-query on every miss; useful when missing rows are expected to be created by an out-of-band writer that the engine should observe quickly.
  • DYNAMODB_BLOOM_FILTER_DEBUG_MODE (default: false) — When true, the Bloom filter still issues the underlying DynamoDB read on a Bloom hit/miss and verifies that the Bloom filter’s prediction was consistent with the actual store. Use only for debugging false-positive/negative rates; this disables the latency benefit of the filter.
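For example, a DYNAMODB_CACHED_NAMESPACES value can be constructed and validated as JSON before being set as an environment variable. The namespace names, TTLs, and LRU sizes below are illustrative:

```python
import json

# Build the JSON value for DYNAMODB_CACHED_NAMESPACES. The namespace
# names, TTLs, and LRU sizes here are illustrative examples.
cached_namespaces = [
    {"namespace": "user", "ttl_seconds": 86400, "max_lru_size": 10000},
    # max_lru_size is optional; omitting it lets the cache grow without bound.
    {"namespace": "merchant", "ttl_seconds": 3600},
]
value = json.dumps(cached_namespaces)
print(value)
```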

Request racing

When request racing is enabled, slow BatchGetItem calls are duplicated after a configured wait. The first response wins. This trades extra RCU consumption for better p99 read latency when DynamoDB occasionally serves a request slowly.

Request racing is one of the most effective knobs available for cutting DynamoDB tail latency. We recommend setting DYNAMODB_REQUEST_RACING_WAIT_TIME to roughly the p95 of observed DynamoDB request latency: at that threshold, only the slowest ~5% of requests are duplicated, so the additional RCU cost is small while the p99/p99.9 read tail collapses toward p95. Setting the wait time meaningfully below p95 amplifies RCU consumption without much further tail benefit; setting it above p95 leaves significant tail latency on the table.

  • DYNAMODB_ENABLE_REQUEST_RACING (default: false) — Master switch for request racing. When true, DYNAMODB_REQUEST_RACING_WAIT_TIME must also be set.
  • DYNAMODB_REQUEST_RACING_WAIT_TIME (default: unset) — Wait time in milliseconds before issuing a duplicate request. Recommended value: roughly the p95 of DynamoDB request latency for this environment. Lower values cut tail latency more aggressively but also amplify RCU consumption on every slow request.
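The racing behavior itself can be sketched with asyncio. This is an illustrative model of the technique, not how Chalk's client is implemented:

```python
import asyncio

# Illustrative model of request racing: if the first attempt has not
# completed within wait_time_ms, issue a duplicate request and take
# whichever response arrives first.
async def raced_get(fetch, wait_time_ms: float):
    first = asyncio.ensure_future(fetch())
    try:
        # shield() keeps the first attempt running if the timeout fires.
        return await asyncio.wait_for(asyncio.shield(first), wait_time_ms / 1000)
    except asyncio.TimeoutError:
        second = asyncio.ensure_future(fetch())
        done, pending = await asyncio.wait(
            {first, second}, return_when=asyncio.FIRST_COMPLETED
        )
        for task in pending:
            task.cancel()
        return done.pop().result()

async def slow_fetch():
    await asyncio.sleep(0.05)  # a stand-in for a slow DynamoDB read
    return "item"

print(asyncio.run(raced_get(slow_fetch, wait_time_ms=10)))  # item
```

Because both attempts run to the first completion, every raced request consumes capacity twice, which is why the wait threshold should sit near p95 rather than near the median.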

Writes and consistency

  • DYNAMODB_CHECK_TS_FOR_BULK_WRITES (default: true) — When true, bulk writes use DynamoDB transactional updates that skip the write if the existing observed-at timestamp is newer than the incoming value. Prevents stale data from overwriting fresher data when writers race. Transactional writes cost 2x WCU; set to false if your pipeline already guarantees monotonic write ordering.
  • DYNAMODB_ONLY_WRITE_NEWER_VALUES (default: true) — Conditional-update guard for the non-bulk write path. When true, the per-item update expression compares observed-at timestamps and skips the write if the existing value is newer. Disable only if you are certain that all writers issue strictly monotonic timestamps.
  • DYNAMODB_TRANSACTION_WRITE_CONFLICT_MIN_RETRY_MILLIS (default: 50) — Initial backoff (milliseconds) when a transactional write fails due to a TransactionConflictException. Subsequent retries scale this value with jitter.
  • DYNAMODB_TRANSACTION_WRITE_CONFLICT_MAX_RETRIES (default: 5) — Maximum number of retries on a transactional write conflict before surfacing the error. Increase if your workload has high contention on the same key (e.g. many writers updating the same entity).
  • DYNAMODB_AGGREGATE_UPDATE_CACHE_SIZE (default: 256) — In-memory cache size, in entries, for materialized aggregation buckets used to speed up updates to non-trivial aggregations such as approx-count-distinct. Tune to roughly match the number of frequently updating buckets at any given time. Monitor with the chalk.libdynamo.num_update_cache_* metrics.
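The effect of the newer-value guard can be modeled in miniature. This is an in-memory stand-in for the conditional update, not the actual DynamoDB update expression:

```python
# In-memory model of the "only write newer values" guard: a write is
# applied only if its observed-at timestamp is newer than what is stored.
store = {}  # key -> (observed_at, value)

def guarded_write(key, observed_at, value, only_write_newer=True):
    existing = store.get(key)
    if only_write_newer and existing is not None and existing[0] >= observed_at:
        return False  # skipped: the stored value is at least as fresh
    store[key] = (observed_at, value)
    return True

guarded_write("user:1", observed_at=100, value="a")
late = guarded_write("user:1", observed_at=90, value="stale")  # skipped
fresh = guarded_write("user:1", observed_at=110, value="b")    # applied
print(late, fresh, store["user:1"])  # False True (110, 'b')
```

The delayed write with the older timestamp is dropped even though it arrived later in wall-clock order, which is exactly the race the guard exists to prevent.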

Retries and timeouts

  • DYNAMODB_MAX_RETRIES (default: 12) — Maximum number of retries on retryable DynamoDB errors (throttling, transient 5xx). Combined with DYNAMODB_RETRY_SCALE_FACTOR, this controls how aggressively the client absorbs throttling.
  • DYNAMODB_RETRY_SCALE_FACTOR (default: 10) — Multiplier applied to exponential-backoff delays. Higher values smooth out throughput during sustained throttling at the cost of higher per-request latency.
  • DYNAMODB_REQUEST_TIMEOUT_MS (default: unset) — Per-request timeout in milliseconds. When unset, the AWS SDK default is used. Set this if you would rather fail fast than wait on a slow region.
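A sketch of how the two retry settings interact, assuming a plain exponential schedule (the exact backoff formula used by the client is an assumption here, as is the cap):

```python
# Illustrative exponential backoff: DYNAMODB_RETRY_SCALE_FACTOR multiplies
# each delay, and DYNAMODB_MAX_RETRIES caps the number of attempts. The
# exact formula and cap used by Chalk's client are assumptions here.
def backoff_schedule_ms(max_retries=12, scale_factor=10, base_ms=1, cap_ms=20_000):
    return [min(cap_ms, scale_factor * base_ms * 2**attempt)
            for attempt in range(max_retries)]

delays = backoff_schedule_ms(max_retries=5, scale_factor=10)
print(delays)  # [10, 20, 40, 80, 160]
```

Raising the scale factor stretches every delay proportionally, which is why it smooths throughput under sustained throttling at the cost of per-request latency.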

Initialization

  • DYNAMODB_WARMUP_FQN_MAPPING (default: true) — Chalk stores a short stable identifier for each fully-qualified feature name in DynamoDB to keep items small. When true, the engine pre-loads the entire FQN→short-name mapping at startup. When false, mappings are computed and cached lazily on first use of each feature.
  • DYNAMODB_CREATE_TABLES_IF_NOT_EXISTS (default: false) — When true, the engine will attempt to create the DynamoDB table at startup if it does not already exist. Off by default because production tables should be provisioned via Terraform (see above) so that capacity, replication, and IAM are managed with the rest of the customer’s infrastructure.