# DynamoDB Online Store Deployment
source: https://docs.chalk.ai/docs/dynamodb-deployment

## Deploy DynamoDB as a Chalk online store in a single region or with multi-region replication.

### Overview

Chalk supports DynamoDB as an online store. Online query results
and cached feature values are written to DynamoDB by the background persistence writers
and read directly by the query servers. This page covers how to size DynamoDB capacity, how to choose
between single-region and multi-region deployments, and how to provision everything via Terraform.

Chalk's DynamoDB online store uses a single table per environment and encodes items to minimize
both storage and capacity consumption: values are stored using native DynamoDB data types (not
JSON-encoded strings), and feature names are compressed to short, stable identifiers. As a result,
the table typically consumes noticeably less WCU/RCU in practice than a naive estimate based
on the raw JSON size of a feature set would suggest.
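
To see why the encoding matters for sizing, the sketch below compares two layouts of the same three features. It uses a simplified version of DynamoDB's documented item-size rules:

- attribute names count toward item size;
- strings count their UTF-8 bytes;
- numbers take roughly one byte per two significant digits, plus one byte.

The feature names, values, and one-letter short identifiers are hypothetical. This is an illustration of the size model, not Chalk's actual encoding.

```python
import json
import math


def attr_value_size(value) -> int:
    """Approximate DynamoDB storage size of one attribute value (simplified model)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return 1
    if isinstance(value, (int, float)):
        # Leading and trailing zeros are not significant digits.
        digits = len(str(abs(value)).replace(".", "").strip("0")) or 1
        return math.ceil(digits / 2) + 1
    if isinstance(value, str):
        return len(value.encode("utf-8"))
    raise TypeError(f"unsupported type: {type(value)!r}")


def item_size(attrs: dict) -> int:
    """Item size = sum of attribute-name bytes plus attribute-value bytes."""
    return sum(len(name.encode("utf-8")) + attr_value_size(v) for name, v in attrs.items())


features = {
    "user.account_balance_usd": 1523.75,
    "user.num_transactions_30d": 42,
    "user.is_verified": True,
}

# Naive layout: the whole feature set as one JSON-encoded string attribute.
naive_item = {"features": json.dumps(features)}

# Compact layout: native number/boolean types under short, stable attribute names.
short_ids = {"user.account_balance_usd": "a", "user.num_transactions_30d": "b", "user.is_verified": "c"}
compact_item = {short_ids[name]: value for name, value in features.items()}

print(item_size(naive_item), item_size(compact_item))
```

Both items fit inside a single 1 KiB write unit here. The difference starts to matter once a feature set has dozens of long feature names. At that point the naive layout can cross unit boundaries and round up to extra WCU/RCU.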

### DynamoDB vs. Valkey/Redis

Chalk supports both DynamoDB and Valkey (or Redis) as online stores. The right choice depends on
your workload:

- DynamoDB is the better fit when you have a large working set of feature values with
modest per-query storage requirements. Because DynamoDB is a managed disk-backed store,
it can cost-effectively hold billions of entities without the memory pressure that would
dominate a Valkey deployment. It also requires no capacity planning for replication or
failover beyond the WCU/RCU dimensions.
- Valkey/Redis is the better fit for ultra-low-latency workloads and for workloads where
the working set is small enough to fit in memory. In-memory reads are meaningfully faster
than DynamoDB's single-digit-millisecond reads, and small Valkey deployments are typically
cheaper than an equivalent DynamoDB configuration.

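A common pattern is DynamoDB fronted by an LRU cache and/or a Bloom filter to reduce DynamoDB reads. The LRU cache serves hot keys from memory, and the Bloom filter skips reads for keys known to be absent.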
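
A minimal sketch of that read path follows. It makes three assumptions:

- the Bloom filter is built over every key known to exist in the store;
- a plain dict stands in for the LRU cache;
- `fake_fetch` is a hypothetical stand-in for a DynamoDB read.

This is not Chalk's implementation. The sketch only shows why a Bloom miss can safely skip the store: a Bloom filter has no false negatives.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter: never reports a present key as absent (no false negatives)."""

    def __init__(self, expected_items: int, fp_rate: float = 0.01) -> None:
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits and k = (m / n) * ln 2 hashes.
        self.num_bits = max(8, math.ceil(-expected_items * math.log(fp_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key: str):
        # Double hashing: derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.blake2b(key.encode("utf-8"), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1  # force a nonzero step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def read_feature(key, bloom, cache, fetch_from_store):
    """Read path: Bloom filter first, then the in-memory cache, then the store."""
    if not bloom.might_contain(key):
        return None  # definitely absent: skip the DynamoDB read entirely
    if key in cache:
        return cache[key]  # served from memory (may be a cached "absent" result)
    value = fetch_from_store(key)  # may still be None on a Bloom false positive
    cache[key] = value
    return value


# Usage with a hypothetical in-memory stand-in for the DynamoDB table.
bloom = BloomFilter(expected_items=1000)
for i in range(1000):
    bloom.add(f"user:{i}")
store_reads = []


def fake_fetch(key):
    store_reads.append(key)
    return {"user:5": 10}.get(key)


cache = {}
print(read_feature("user:5", bloom, cache, fake_fetch), store_reads)
```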

### Sizing WCU and RCU

Chalk's DynamoDB encoding (native dtypes + short feature identifiers) keeps per-item payloads
small, so RCU/WCU calculations are typically driven by query volume and the number of features
read per query rather than by raw payload size.

A useful starting point:

- RCU -- reading one entity consumes one RCU with a strongly consistent read, or half an RCU
with an eventually consistent read, as long as the item fits within the 4 KiB read unit
(entities typically do, even with dozens of features). Multiply expected QPS by the average
number of entities loaded per query and by this per-entity cost. Use eventually consistent
reads unless you have a specific reason to pay 2x for strongly consistent reads; Chalk does
not require strong consistency.
- WCU -- each persisted entity consumes one WCU per write, as long as the item fits within the
1 KiB write unit (entities again typically do). Transactional writes consume 2x; Chalk uses
them for bulk writes by default (see DYNAMODB_CHECK_TS_FOR_BULK_WRITES). Multiply expected QPS
by the fraction of queries whose results are persisted to the online store and by the average
number of entities written per query.
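
The arithmetic above can be sketched as a small estimator. It assumes:

- every entity is stored as a single item of roughly `item_bytes` bytes;
- traffic is steady-state, so peaks need their own headroom.

The workload numbers in the usage lines are hypothetical, and Chalk's own sizing guidance may account for effects this ignores.

```python
import math


def rcu_per_item(item_bytes: int, strongly_consistent: bool = False) -> float:
    # One RCU covers one strongly consistent read per second of an item up to 4 KiB;
    # an eventually consistent read of the same item costs half as much.
    units = max(1, math.ceil(item_bytes / 4096))
    return units if strongly_consistent else units / 2


def wcu_per_item(item_bytes: int, transactional: bool = False) -> int:
    # One WCU covers one write per second of an item up to 1 KiB;
    # transactional writes cost twice as much.
    units = max(1, math.ceil(item_bytes / 1024))
    return 2 * units if transactional else units


def estimate_capacity(qps: float, entities_read_per_query: float,
                      persisted_fraction: float, entities_written_per_query: float,
                      item_bytes: int, strongly_consistent: bool = False,
                      transactional_writes: bool = False) -> tuple:
    """Return (RCU, WCU) needed for steady-state traffic, rounded up."""
    rcu = qps * entities_read_per_query * rcu_per_item(item_bytes, strongly_consistent)
    wcu = (qps * persisted_fraction * entities_written_per_query
           * wcu_per_item(item_bytes, transactional_writes))
    # round() first so float noise (e.g. 600.0000000001) does not bump the ceiling.
    return math.ceil(round(rcu, 6)), math.ceil(round(wcu, 6))


# Hypothetical workload: 2,000 QPS, 3 entities read per query, 10% of queries
# persisted with 3 entities each, ~600-byte items.
print(estimate_capacity(2000, 3, 0.10, 3, 600))
print(estimate_capacity(2000, 3, 0.10, 3, 600, transactional_writes=True))
```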

Chalk will assist with initial sizing based on your query mix, but the customer is ultimately
responsible for choosing and tuning DynamoDB capacity: DynamoDB capacity is a direct cost driver,
and the tradeoffs between provisioned, on-demand, and autoscaled capacity are
workload-specific and owned by the customer.

### Provisioned vs. on-demand vs. autoscaled

DynamoDB offers three capacity modes, each with different cost and operational characteristics:

- Provisioned (static) -- you pay for a fixed WCU/RCU amount regardless of utilization.
Cheapest per unit for steady-state workloads where utilization is consistently high, but
throttles immediately when traffic exceeds the provisioned level. Appropriate when you have
a well-understood, relatively flat traffic pattern.
- Provisioned with autoscaling -- capacity tracks a target utilization (typically 70%).
AWS Application Auto Scaling adjusts WCU/RCU in response to CloudWatch metrics. Scale-up is
reactive (there is a lag, typically minutes) and scale-down has a cooldown, so autoscaling
accommodates gradual traffic shifts well but can still throttle on sharp spikes.
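- On-demand -- no capacity planning; you pay per request. Several times the per-unit cost of
fully utilized provisioned capacity (historically roughly 7x; compare against current AWS
pricing for your region). In exchange, it absorbs traffic up to double the table's previous
peak without throttling; larger jumps within a 30-minute window can still throttle.
Appropriate for bursty, unpredictable traffic where throttling is not acceptable.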

For most production Chalk deployments, provisioned-with-autoscaling is the right default: it
retains most of the steady-state cost advantage of provisioned capacity while still absorbing
diurnal traffic variation. Reserve on-demand for environments with highly unpredictable traffic
or very low steady-state utilization.
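
"Very low steady-state utilization" can be made concrete with a first-order calculation. Static provisioned capacity is billed whether or not it is used. On-demand therefore wins whenever average utilization falls below 1 / (on-demand cost ratio).

The sketch below takes that ratio as an input; the roughly 7x figure above gives about 14%. It deliberately ignores autoscaling, reserved capacity, storage, and free-tier effects. Treat it as a quick check against current AWS pricing, not a billing model.

```python
def on_demand_breakeven_utilization(on_demand_cost_ratio: float) -> float:
    """Average utilization of static provisioned capacity below which on-demand is cheaper.

    on_demand_cost_ratio: on-demand price per request divided by the price of the same
    request served from fully utilized provisioned capacity (e.g. roughly 7).
    """
    # Provisioned capacity is billed whether used or not, so its effective cost per
    # consumed unit is unit_price / utilization. On-demand costs ratio * unit_price.
    # The two are equal when utilization == 1 / ratio.
    return 1.0 / on_demand_cost_ratio


def cheaper_mode(avg_utilization: float, on_demand_cost_ratio: float) -> str:
    breakeven = on_demand_breakeven_utilization(on_demand_cost_ratio)
    return "on-demand" if avg_utilization < breakeven else "provisioned"


print(f"{on_demand_breakeven_utilization(7):.1%}")  # break-even at the ~7x ratio
print(cheaper_mode(0.10, 7), cheaper_mode(0.50, 7))
```

Autoscaling raises the effective utilization of provisioned capacity. This is why provisioned-with-autoscaling usually beats both static provisioned capacity and on-demand for diurnal traffic.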

### Single-region vs. multi-region

Chalk supports DynamoDB online stores in either a single region or replicated across multiple
regions using DynamoDB Global Tables.

### Single-region

A single-region deployment is the simplest configuration: one table in one region, accessed by
Chalk query servers running in the same region. If the region becomes unavailable, the online
store is unavailable and online queries will fail until the region recovers. Single-region is
appropriate when your application's availability requirements do not extend beyond a single
AWS region.

### Multi-region

Global Tables replicate items asynchronously between regions, typically with sub-second
propagation under normal conditions. Chalk recommends asynchronous replication (the default
for Global Tables) rather than attempting to build strongly consistent cross-region writes:
synchronous cross-region replication would require every online write to commit in at least
two regions before returning, which is prohibitively expensive in both latency (adding one
inter-region round trip per write) and cost.

Because replication is asynchronous, a regional failover can lose the last few seconds of
writes that had not yet replicated out of the failed region. In practice, this achieves an
RPO under one minute. Global Tables replication lag is typically under one second during normal
operation and has historically stayed well below a minute even under regional stress. For
Chalk online queries, the practical effect of this RPO is that a small number of the most
recently persisted query results may be missing after failover, forcing re-computation on the
next query; feature values themselves are not corrupted.

The tradeoff: accept a small RPO in exchange for (a) much lower write latency, (b) lower cost,
and (c) a simpler operational model. Applications that cannot tolerate any lost writes must use
a different persistence model than an online feature store.

See Multi-Region Failover for the Chalk-level configuration
that steers query traffic to a healthy region.

### Shared responsibility

Chalk will assist with DynamoDB sizing, capacity-mode selection, and replication topology, but
the customer is ultimately responsible for provisioning and operating the DynamoDB table.
This is intentional: DynamoDB capacity is a direct cost driver that the customer controls, and
capacity decisions must be made against the customer's own cost model and availability targets.

Chalk's responsibilities are:

- Advising on initial sizing and recommending capacity mode for a given workload
- Operating the Chalk components that read from and write to DynamoDB
- Reporting online store error rates, throttle rates, and latency in the Chalk UI

Customer responsibilities are:

- Provisioning the DynamoDB table and any Global Table replicas
- Choosing provisioned / on-demand / autoscaled capacity
- Tuning autoscaling targets and floor/ceiling values
- Configuring IAM access for the Chalk service account

### Example Terraform: single-region

A single-region DynamoDB table with provisioned capacity:

```
resource "aws_dynamodb_table" "chalk_online_store" {
  name         = "chalk-online-store"
  billing_mode = "PROVISIONED"

  read_capacity  = 1000
  write_capacity = 500

  hash_key  = "pk"
  range_key = "sk"

  attribute {
    name = "pk"
    type = "S"
  }

  attribute {
    name = "sk"
    type = "S"
  }

  point_in_time_recovery {
    enabled = true
  }

  server_side_encryption {
    enabled = true
  }

  tags = {
    chalk_environment = "production"
  }
}
```

Switch billing_mode to PAY_PER_REQUEST and remove the read_capacity / write_capacity
fields for on-demand.

### Example Terraform: multi-region (Global Tables)

A multi-region deployment uses a single aws_dynamodb_table resource with replica blocks.
Global Tables require stream_enabled = true and stream_view_type = "NEW_AND_OLD_IMAGES":

```
resource "aws_dynamodb_table" "chalk_online_store" {
  name             = "chalk-online-store"
  billing_mode     = "PROVISIONED"
  read_capacity    = 1000
  write_capacity   = 500
  hash_key         = "pk"
  range_key        = "sk"
  stream_enabled   = true
  stream_view_type = "NEW_AND_OLD_IMAGES"

  attribute {
    name = "pk"
    type = "S"
  }

  attribute {
    name = "sk"
    type = "S"
  }

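  # The table itself is created in the AWS provider's region (assumed here to be
  # us-east-1). Replica blocks list only the additional regions, so do not
  # repeat the provider's own region as a replica.
  replica {
    region_name = "us-west-2"
  }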

  point_in_time_recovery {
    enabled = true
  }

  server_side_encryption {
    enabled = true
  }

  tags = {
    chalk_environment = "production"
  }
}
```

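Each replica is a full copy of the table in the specified region; replication is asynchronous,
with typical lag well under a second. Provisioned capacity applies per region. Read capacity can
be sized for each region's local read traffic. Write capacity, however, must cover table-wide
write traffic, because every write is replicated to every other replica and consumes write
capacity there as well.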
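
Global Tables replicate every write to every replica, and each replicated write consumes write capacity in the receiving region. Per-replica write capacity therefore tracks table-wide write traffic, while read capacity can follow local read traffic.

The sketch below turns that into per-region numbers. The traffic figures are hypothetical, and the function does not model autoscaling or burst capacity.

```python
def global_table_capacity(local_reads: dict, local_writes: dict) -> dict:
    """Per-replica read and write capacity for a Global Table, in capacity units.

    local_reads / local_writes: steady-state units consumed by traffic that
    originates in each region.
    """
    # Every write, wherever it originates, is replicated to every replica and
    # consumes write capacity there, so each replica absorbs the table-wide rate.
    table_wide_writes = sum(local_writes.values())
    regions = sorted(set(local_reads) | set(local_writes))
    return {
        region: {"read": local_reads.get(region, 0), "write": table_wide_writes}
        for region in regions
    }


# Hypothetical traffic split between the two regions in the example above.
print(global_table_capacity(
    local_reads={"us-east-1": 800, "us-west-2": 200},
    local_writes={"us-east-1": 300, "us-west-2": 100},
))
```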

### Example Terraform: autoscaling policy

An autoscaling policy tracks target utilization on read and write capacity. Create one scalable
target and one target-tracking policy per capacity dimension:

```
resource "aws_appautoscaling_target" "read_target" {
  max_capacity       = 5000
  min_capacity       = 500
  resource_id        = "table/${aws_dynamodb_table.chalk_online_store.name}"
  scalable_dimension = "dynamodb:table:ReadCapacityUnits"
  service_namespace  = "dynamodb"
}

resource "aws_appautoscaling_policy" "read_policy" {
  name               = "chalk-online-store-read-autoscaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.read_target.resource_id
  scalable_dimension = aws_appautoscaling_target.read_target.scalable_dimension
  service_namespace  = aws_appautoscaling_target.read_target.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "DynamoDBReadCapacityUtilization"
    }
    target_value       = 70.0
    scale_in_cooldown  = 60
    scale_out_cooldown = 60
  }
}

resource "aws_appautoscaling_target" "write_target" {
  max_capacity       = 2500
  min_capacity       = 250
  resource_id        = "table/${aws_dynamodb_table.chalk_online_store.name}"
  scalable_dimension = "dynamodb:table:WriteCapacityUnits"
  service_namespace  = "dynamodb"
}

resource "aws_appautoscaling_policy" "write_policy" {
  name               = "chalk-online-store-write-autoscaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.write_target.resource_id
  scalable_dimension = aws_appautoscaling_target.write_target.scalable_dimension
  service_namespace  = aws_appautoscaling_target.write_target.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "DynamoDBWriteCapacityUtilization"
    }
    target_value       = 70.0
    scale_in_cooldown  = 60
    scale_out_cooldown = 60
  }
}
```

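A 70% target utilization is a conservative starting point that leaves headroom for the
reactive scale-up delay. For workloads with sharper traffic spikes, lower the target to 50-60%
or raise min_capacity so that the floor already covers expected peak-to-trough variation.
Application Auto Scaling changes the table's capacity outside of Terraform. To keep the next
terraform apply from resetting capacity to the static values, add
lifecycle { ignore_changes = [read_capacity, write_capacity] } to the aws_dynamodb_table resource.
For Global Tables, configure autoscaling independently in each region.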
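
Target tracking aims to keep consumed / provisioned at the target. Two numbers follow from that:

- the capacity the policy converges to;
- how large an instantaneous spike fits before consumption reaches provisioned capacity.

The sketch below computes both. It ignores DynamoDB burst capacity and the dynamics of the scaling lag, so read it as an illustration of why lower targets absorb larger spikes, not as a simulator.

```python
import math


def provisioned_for_target(consumed_units: float, target_utilization: float) -> int:
    """Capacity that target tracking converges to: consumed / provisioned == target."""
    # round() first: float division can land just above a whole number.
    return math.ceil(round(consumed_units / target_utilization, 6))


def burst_headroom(target_utilization: float) -> float:
    """Traffic increase (as a fraction) absorbable before consumption hits provisioned capacity."""
    return 1.0 / target_utilization - 1.0


for target in (0.70, 0.60, 0.50):
    print(target, provisioned_for_target(700, target), f"{burst_headroom(target):.0%}")
```

The same function also gives a starting min_capacity for a known peak. Feed it the peak's consumed units, so the floor already holds the peak at the chosen target.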

### Configuration

The Chalk-side DynamoDB client exposes a number of tuning knobs as environment variables.
These are set on the Chalk engine and persistence-writer deployments and control client
concurrency, batching, caching, retries, and consistency behavior. Most defaults are tuned
for typical online-serving workloads; the settings below are documented primarily so that
operators can diagnose throughput problems and adjust where the defaults do not match a
specific workload.

### Client threads and connections

DynamoDB requests are issued by a pool of client threads against a fixed pool of HTTP
connections. Serialization and deserialization of items happens on a separate pool of
serde threads so that CPU-bound encoding work does not block the I/O threads.

| Name                              | Default                       | Description                                                                                                                                                                                                 |
| --------------------------------- | ----------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `DYNAMODB_NUM_CLIENT_THREADS`     | `2 * desired_cpu_parallelism` | Number of threads in the DynamoDB client pool. These threads issue and await `BatchGetItem` / `BatchWriteItem` / `TransactWriteItems` calls. Increase for read-heavy workloads that bottleneck on I/O wait. |
| `DYNAMODB_NUM_CLIENT_CONNECTIONS` | `2 * desired_cpu_parallelism` | Maximum number of concurrent HTTP connections to DynamoDB. Should generally be set equal to or slightly above `DYNAMODB_NUM_CLIENT_THREADS`. Each connection corresponds to a TCP/TLS session.              |
| `DYNAMODB_NUM_SERDE_THREADS`      | `desired_cpu_parallelism`     | Number of threads used to encode/decode DynamoDB items. CPU-bound; increase if profiling shows serde saturation while client threads are idle.                                                              |
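
When client threads bottleneck on I/O wait, Little's law gives a rough target for DYNAMODB_NUM_CLIENT_THREADS: average in-flight requests equal request rate times mean latency. The sketch assumes each in-flight batch call occupies one client thread while it awaits a response, as the table's description suggests.

The request rate, latency, and the 1.5x headroom factor are hypothetical. The default of 2 * desired_cpu_parallelism is Chalk's own and is not derived this way.

```python
import math


def client_threads_needed(batch_requests_per_second: float, mean_latency_ms: float,
                          headroom: float = 1.5) -> int:
    """Rule-of-thumb thread count from Little's law (in-flight = rate x latency)."""
    in_flight = batch_requests_per_second * mean_latency_ms / 1000.0
    # round() guards against float noise before taking the ceiling.
    return math.ceil(round(in_flight * headroom, 6))


# Hypothetical: 4,000 batch requests/s with 8 ms mean DynamoDB latency.
threads = client_threads_needed(4000, 8)
print(f"DYNAMODB_NUM_CLIENT_THREADS={threads}")
print(f"DYNAMODB_NUM_CLIENT_CONNECTIONS={threads}")
```

Per the table, connections should be equal to or slightly above the thread count, so that every thread can hold its own TCP/TLS session.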

### Read batching

BatchGetItem requests are split into multiple parallel sub-batches. The configuration
below controls how those sub-batches are sized.

| Name                                     | Default                       | Description                                                                                                                                                                                                                                                                                                                   |
| ---------------------------------------- | ----------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `DYNAMODB_GETITEM_MIN_BATCH_SIZE`        | `10`                          | Minimum number of keys per `BatchGetItem` sub-batch. The DynamoDB protocol allows up to 100 keys per batch; empirically, batches of fewer than ~10 keys are no faster than a 10-key batch, so smaller splits only add request overhead.                                                                                       |
| `DYNAMODB_GETITEM_MIN_BATCH_CONCURRENCY` | `DYNAMODB_NUM_CLIENT_THREADS` | Maximum number of parallel sub-batches per `BatchGetItem` request. Defaults to the size of the client thread pool. Lowering this is useful when an environment receives many concurrent queries: each individual query is then satisfied with fewer (larger) batches, leaving more client threads available to other queries. |
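
The sketch below shows one plausible splitting rule consistent with these two settings. The parameter `max_parallel_batches` plays the role of DYNAMODB_GETITEM_MIN_BATCH_CONCURRENCY as the table describes it. The function itself is hypothetical, and Chalk's actual algorithm may differ.

The rule works in three steps:

1. use as many sub-batches as the concurrency limit allows;
2. never go below `min_batch_size` keys per sub-batch;
3. never exceed DynamoDB's 100-key BatchGetItem limit.

```python
import math

MAX_KEYS_PER_BATCH_GET = 100  # DynamoDB BatchGetItem limit


def split_into_sub_batches(keys: list, min_batch_size: int = 10,
                           max_parallel_batches: int = 8) -> list:
    """Split keys into evenly sized sub-batches that can be fetched in parallel."""
    if not keys:
        return []
    # Use as many batches as allowed, without going below min_batch_size keys each...
    num_batches = min(max_parallel_batches, max(1, len(keys) // min_batch_size))
    # ...but never exceed the 100-key protocol limit per batch.
    num_batches = max(num_batches, math.ceil(len(keys) / MAX_KEYS_PER_BATCH_GET))
    base, extra = divmod(len(keys), num_batches)
    batches, start = [], 0
    for i in range(num_batches):
        size = base + (1 if i < extra else 0)  # sizes differ by at most one key
        batches.append(keys[start:start + size])
        start += size
    return batches


print([len(b) for b in split_into_sub_batches(list(range(250)))])
```

Lowering `max_parallel_batches` produces fewer, larger sub-batches per query, which matches the tradeoff described in the table.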

### In-memory caches and Bloom filter

Chalk supports per-namespace LRU caching of feature values in front of DynamoDB, and a Bloom
filter to short-circuit reads for keys known to be absent. These reduce DynamoDB RCU
consumption and tail latency at the cost of memory.

| Name                               | Default | Description                                                                                                                                                                                                                                                                              |
| ---------------------------------- | ------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `DYNAMODB_CACHED_NAMESPACES`       | `None`  | JSON list of namespace cache configurations of the form `{"namespace": "...", "ttl_seconds": 86400, "max_lru_size": 10000}`. `max_lru_size` is optional; if omitted, the cache grows without bound. Use for hot namespaces where stale-by-up-to-`ttl_seconds` reads are acceptable.      |
| `DYNAMODB_LRU_CACHE_CACHE_MISSES`  | `true`  | When `true`, the namespace LRU cache also caches negative results (rows that did not exist in DynamoDB). Set to `false` to re-query on every miss; useful when missing rows are expected to be created by an out-of-band writer that the engine should observe quickly.                  |
| `DYNAMODB_BLOOM_FILTER_DEBUG_MODE` | `false` | When `true`, the Bloom filter still issues the underlying DynamoDB read on a Bloom hit/miss and verifies that the Bloom filter's prediction was consistent with the actual store. Use only for debugging false-positive/negative rates; this disables the latency benefit of the filter. |
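
The sketch below shows how the namespace settings above could map onto an LRU cache whose entries also expire after `ttl_seconds`. It makes three assumptions:

- a value of None means the row did not exist in DynamoDB;
- the `cache_misses` flag mirrors DYNAMODB_LRU_CACHE_CACHE_MISSES;
- `max_size=None` leaves the cache unbounded.

The "user" namespace and the class itself are hypothetical; this is not Chalk's implementation.

```python
import json
import time
from collections import OrderedDict
from typing import Callable, Optional

MISSING = object()  # sentinel: "nothing cached", distinct from a cached negative result (None)


class TtlLruCache:
    """LRU cache whose entries also expire ttl_seconds after they were written."""

    def __init__(self, ttl_seconds: float, max_size: Optional[int] = None,
                 cache_misses: bool = True,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.max_size = max_size  # None means unbounded
        self.cache_misses = cache_misses
        self.clock = clock
        self._entries = OrderedDict()  # key -> (expires_at, value), oldest use first

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return MISSING
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._entries[key]  # stale: force a fresh read from the store
            return MISSING
        self._entries.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value) -> None:
        if value is None and not self.cache_misses:
            return  # don't remember that the row was absent
        self._entries[key] = (self.clock() + self.ttl, value)
        self._entries.move_to_end(key)
        if self.max_size is not None and len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # evict the least recently used entry


# Hypothetical value for DYNAMODB_CACHED_NAMESPACES; "user" is an example namespace.
env_value = json.dumps([{"namespace": "user", "ttl_seconds": 300, "max_lru_size": 10000}])
caches = {
    cfg["namespace"]: TtlLruCache(cfg["ttl_seconds"], cfg.get("max_lru_size"))
    for cfg in json.loads(env_value)
}
```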

### Request racing

When request racing is enabled, slow BatchGetItem calls are duplicated after a configured
wait. The first response wins. This trades extra RCU consumption for better p99 read latency
when DynamoDB occasionally serves a request slowly.

Request racing is one of the most effective knobs available for cutting DynamoDB tail
latency. We recommend setting DYNAMODB_REQUEST_RACING_WAIT_TIME to roughly the p95 of
observed DynamoDB request latency: at that threshold, only the slowest ~5% of requests are
duplicated, so the additional RCU cost is small while the p99/p99.9 read tail collapses
toward p95. Setting the wait time meaningfully below p95 amplifies RCU consumption without
much further tail benefit; setting it above p95 leaves significant tail latency on the table.
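
The recommendation can be checked against collected latency samples. The share of requests still outstanding at the chosen wait time is the share that gets duplicated, which approximates the extra read cost.

The sample data below is hypothetical. The sketch assumes a duplicate costs the same RCU as the original request.

```python
import statistics


def racing_wait_time_ms(latencies_ms, percentile: int = 95) -> float:
    """Pick a racing wait time as a percentile of observed request latencies."""
    # quantiles(n=100) returns the 99 cut points for the 1st..99th percentiles.
    return statistics.quantiles(latencies_ms, n=100)[percentile - 1]


def duplicated_fraction(latencies_ms, wait_ms: float) -> float:
    """Share of requests still outstanding at wait_ms, i.e. the ones that get raced."""
    return sum(1 for x in latencies_ms if x > wait_ms) / len(latencies_ms)


# Hypothetical latency samples (ms) collected from one environment.
samples = [float(ms) for ms in range(1, 101)]
wait = racing_wait_time_ms(samples)
print(f"DYNAMODB_REQUEST_RACING_WAIT_TIME={round(wait)}", duplicated_fraction(samples, wait))
```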

| Name                                | Default | Description                                                                                                                                                                                                                                                |
| ----------------------------------- | ------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `DYNAMODB_ENABLE_REQUEST_RACING`    | `false` | Master switch for request racing. When `true`, `DYNAMODB_REQUEST_RACING_WAIT_TIME` must also be set.                                                                                                                                                       |
| `DYNAMODB_REQUEST_RACING_WAIT_TIME` | -       | Wait time in milliseconds before issuing a duplicate request. Recommended value: roughly the p95 of DynamoDB request latency for this environment. Lower values cut tail latency more aggressively but also amplify RCU consumption on every slow request. |

### Writes and consistency

| Name                                                   | Default | Description                                                                                                                                                                                                                                                                                                                                 |
| ------------------------------------------------------ | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `DYNAMODB_CHECK_TS_FOR_BULK_WRITES`                    | `true`  | When `true`, bulk writes use DynamoDB transactional updates that skip the write if the existing observed-at timestamp is newer than the incoming value. Prevents stale data from overwriting fresher data when writers race. Transactional writes cost 2x WCU; set to `false` if your pipeline already guarantees monotonic write ordering. |
| `DYNAMODB_ONLY_WRITE_NEWER_VALUES`                     | `true`  | Conditional-update guard for the non-bulk write path. When `true`, the per-item update expression compares observed-at timestamps and skips the write if the existing value is newer. Disable only if you are certain that all writers issue strictly monotonic timestamps.                                                                 |
| `DYNAMODB_TRANSACTION_WRITE_CONFLICT_MIN_RETRY_MILLIS` | `50`    | Initial backoff (milliseconds) when a transactional write fails due to a `TransactionConflictException`. Subsequent retries scale this value with jitter.                                                                                                                                                                                   |
| `DYNAMODB_TRANSACTION_WRITE_CONFLICT_MAX_RETRIES`      | `5`     | Maximum number of retries on a transactional write conflict before surfacing the error. Increase if your workload has high contention on the same key (e.g. many writers updating the same entity).                                                                                                                                         |
| `DYNAMODB_AGGREGATE_UPDATE_CACHE_SIZE`                 | `256`   | In-memory cache size, in entries, for materialized aggregation buckets used to speed up updates to non-trivial aggregations such as approx-count-distinct. Tune to roughly match the number of frequently updating buckets at any given time. Monitor with the `chalk.libdynamo.num_update_cache_*` metrics.                                |
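
The "only write newer values" guard corresponds to a DynamoDB conditional update. The sketch below builds UpdateItem parameters in the low-level request format. They could be passed to boto3's `client("dynamodb").update_item(**params)`, which raises ConditionalCheckFailedException when the stored value is newer.

The same ConditionExpression can also be attached to an Update entry in TransactWriteItems. Several details are hypothetical: the pk/sk key values, storing a per-feature `<attr>_ts` timestamp attribute, and the in-memory model. Chalk's actual item layout is not documented here.

```python
def build_newer_only_update(table: str, pk: str, sk: str, attr: str,
                            value: float, observed_at_ms: int) -> dict:
    """UpdateItem parameters that overwrite `attr` only if the incoming value is newer."""
    return {
        "TableName": table,
        "Key": {"pk": {"S": pk}, "sk": {"S": sk}},
        "UpdateExpression": "SET #v = :v, #ts = :ts",
        # Succeeds if no timestamp is stored yet or the stored one is older;
        # otherwise DynamoDB rejects the write with ConditionalCheckFailedException.
        "ConditionExpression": "attribute_not_exists(#ts) OR #ts < :ts",
        "ExpressionAttributeNames": {"#v": attr, "#ts": attr + "_ts"},
        "ExpressionAttributeValues": {
            ":v": {"N": str(value)},
            ":ts": {"N": str(observed_at_ms)},
        },
    }


def apply_if_newer(store: dict, key, value, observed_at_ms: int) -> bool:
    """In-memory model of the same guard; returns False when the write is skipped."""
    current = store.get(key)
    if current is not None and current[1] >= observed_at_ms:
        return False  # stored value is at least as new: keep it
    store[key] = (value, observed_at_ms)
    return True


params = build_newer_only_update("chalk-online-store", "user#1", "features", "a", 1.5, 1000)
print(params["ConditionExpression"])
```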

### Retries and timeouts

| Name                          | Default | Description                                                                                                                                                                                    |
| ----------------------------- | ------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `DYNAMODB_MAX_RETRIES`        | `12`    | Maximum number of retries on retryable DynamoDB errors (throttling, transient 5xx). Combined with `DYNAMODB_RETRY_SCALE_FACTOR`, this controls how aggressively the client absorbs throttling. |
| `DYNAMODB_RETRY_SCALE_FACTOR` | `10`    | Multiplier applied to exponential-backoff delays. Higher values smooth out throughput during sustained throttling at the cost of higher per-request latency.                                   |
| `DYNAMODB_REQUEST_TIMEOUT_MS` | `None`  | Per-request timeout in milliseconds. When unset, the AWS SDK default is used. Set this if you would rather fail fast than wait on a slow region.                                               |
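
The retry settings in this table and the previous one describe exponential backoff with jitter. The sketch below uses "full jitter": each wait is drawn uniformly between zero and an exponentially growing ceiling. It rests on two assumptions the tables do not confirm:

- DYNAMODB_RETRY_SCALE_FACTOR multiplies the whole delay;
- transaction-conflict retries double from DYNAMODB_TRANSACTION_WRITE_CONFLICT_MIN_RETRY_MILLIS.

Chalk's exact schedule, and the AWS SDK's own retry behavior, may differ.

```python
import random
from typing import List, Optional


def backoff_ceilings_ms(base_ms: float, max_retries: int, multiplier: float = 1.0,
                        cap_ms: Optional[float] = None) -> List[float]:
    """Upper bound on the wait before each retry: multiplier * base * 2**attempt."""
    ceilings = []
    for attempt in range(max_retries):
        ceiling = multiplier * base_ms * 2 ** attempt
        ceilings.append(ceiling if cap_ms is None else min(ceiling, cap_ms))
    return ceilings


def jittered_delays_ms(ceilings: List[float], rng: random.Random) -> List[float]:
    """'Full jitter': each actual wait is drawn uniformly from [0, ceiling]."""
    return [rng.uniform(0.0, ceiling) for ceiling in ceilings]


# Transaction-conflict defaults from the table above: 50 ms initial backoff, 5 retries.
conflict_ceilings = backoff_ceilings_ms(50, 5)
print(conflict_ceilings, sum(conflict_ceilings))
print(jittered_delays_ms(conflict_ceilings, random.Random(0)))
```

Summing the ceilings gives the worst-case time a conflicting write can spend backing off before the error surfaces: 1.55 s with these inputs. With full jitter, the expected total wait is about half that.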

### Initialization

| Name                                   | Default | Description                                                                                                                                                                                                                                                                                            |
| -------------------------------------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `DYNAMODB_WARMUP_FQN_MAPPING`          | `true`  | Chalk stores a short stable identifier for each fully-qualified feature name in DynamoDB to keep items small. When `true`, the engine pre-loads the entire FQN→short-name mapping at startup. When `false`, mappings are computed and cached lazily on first use of each feature.                      |
| `DYNAMODB_CREATE_TABLES_IF_NOT_EXISTS` | `false` | When `true`, the engine will attempt to create the DynamoDB table at startup if it does not already exist. Off by default because production tables should be provisioned via Terraform (see above) so that capacity, replication, and IAM are managed with the rest of the customer's infrastructure. |