Infrastructure
Deploy DynamoDB as a Chalk online store in a single region or with multi-region replication.
Chalk supports DynamoDB as an online store. Online query results and cached feature values are written to DynamoDB by the background persistence writers and read directly by the query servers. This page covers how to size DynamoDB capacity, how to choose between single-region and multi-region deployments, and how to provision everything via Terraform.
Chalk’s DynamoDB online store uses a single table per environment, with feature keys encoded to minimize both storage and capacity consumption: values are stored using native DynamoDB data types (not JSON-encoded strings), and feature names are compressed to short, stable identifiers. As a result, a Chalk workload typically consumes noticeably fewer WCUs and RCUs than a naive estimate based on the raw JSON size of a feature set would suggest.
Chalk supports both DynamoDB and Valkey (or Redis) as online stores. The right choice depends on your workload:
A common pattern is to put an LRU cache and/or a Bloom filter in front of DynamoDB to minimize the number of reads that reach the table.
Chalk’s DynamoDB encoding (native dtypes + short feature identifiers) keeps per-item payloads small, so RCU/WCU calculations are typically driven by query volume and the number of features read per query rather than by raw payload size.
A useful starting point:
Chalk will assist with initial sizing based on your query mix, but the customer is ultimately responsible for choosing and tuning DynamoDB capacity. Capacity is a direct cost driver, and the tradeoffs between provisioned, on-demand, and autoscaled capacity are workload-specific.
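As a rough illustration of how query volume translates into capacity units, the sketch below applies DynamoDB's published unit definitions in Terraform locals: one RCU covers one strongly consistent read per second of an item up to 4 KB (an eventually consistent read costs half), and one WCU covers one write per second of an item up to 1 KB. Every workload number here, including how many items a query reads or writes, is a hypothetical placeholder and does not describe Chalk's actual encoding. The 70% headroom factor mirrors the autoscaling target utilization used in the Terraform examples on this page.

```hcl
locals {
  # Hypothetical workload numbers; replace with measurements from your own query mix.
  peak_queries_per_second = 500
  items_read_per_query    = 4
  items_written_per_query = 1
  avg_item_size_kb        = 1.5

  # DynamoDB meters reads in 4 KB units and writes in 1 KB units, rounded up per item.
  read_units_per_item  = ceil(local.avg_item_size_kb / 4)
  write_units_per_item = ceil(local.avg_item_size_kb / 1)

  # Strongly consistent reads cost one RCU per 4 KB unit; eventually consistent reads cost half.
  peak_rcu_strong   = local.peak_queries_per_second * local.items_read_per_query * local.read_units_per_item
  peak_rcu_eventual = ceil(local.peak_rcu_strong / 2)
  peak_wcu          = local.peak_queries_per_second * local.items_written_per_query * local.write_units_per_item

  # Provision so that peak traffic lands at roughly 70% utilization.
  provisioned_rcu = ceil(local.peak_rcu_strong / 0.7)
  provisioned_wcu = ceil(local.peak_wcu / 0.7)
}
```

Values such as local.provisioned_rcu could then feed read_capacity or the autoscaling min_capacity instead of hard-coded numbers. Whether the strongly consistent or eventually consistent figure applies depends on how reads are issued, which this sketch does not assume.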
DynamoDB capacity can be configured in three ways, each with different cost and operational characteristics: provisioned, provisioned with autoscaling, and on-demand.
For most production Chalk deployments, provisioned capacity with autoscaling is the right default: it retains the steady-state cost advantage of provisioned capacity while still absorbing diurnal traffic variation. Reserve on-demand for environments with highly unpredictable traffic or very low steady-state utilization.
Chalk supports DynamoDB online stores in either a single region or replicated across multiple regions using DynamoDB Global Tables.
A single-region deployment is the simplest configuration: one table in one region, accessed by Chalk query servers running in the same region. If the region becomes unavailable, the online store is unavailable and online queries will fail until the region recovers. Single-region is appropriate when your application’s availability requirements do not extend beyond a single AWS region.
Global Tables replicate items asynchronously between regions, typically with sub-second propagation under normal conditions. Chalk recommends asynchronous replication (the default for Global Tables) rather than attempting to build strongly consistent cross-region writes: synchronous cross-region replication would require every online write to commit in at least two regions before returning, which is prohibitively expensive in both latency (adding one inter-region round trip per write) and cost.
Because replication is asynchronous, a regional failover can lose the last few seconds of writes that had not yet replicated from the lost primary region. In practice, this achieves RPO < 1 minute: the write lag for Global Tables is typically under 1 second during normal operation, and even under regional stress has historically stayed well below a minute. For Chalk online queries, the practical effect of this RPO is that a small number of the most recently persisted query results may be missing after failover, forcing re-computation on the next query; feature values themselves are not corrupted.
The tradeoff: accept a small RPO in exchange for (a) much lower write latency, (b) lower cost, and (c) a simpler operational model. Applications that cannot tolerate any lost writes must use a different persistence model than an online feature store.
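One way to watch the replication lag that bounds this RPO is a CloudWatch alarm on the ReplicationLatency metric that DynamoDB publishes for Global Tables (reported in milliseconds, with TableName and ReceivingRegion dimensions). The following is a minimal sketch rather than part of Chalk's required setup: the alarm name, threshold, evaluation window, and receiving region are illustrative assumptions, and the alarm is assumed to be created in the region where the writes originate.

```hcl
resource "aws_cloudwatch_metric_alarm" "chalk_replication_latency" {
  alarm_name        = "chalk-online-store-replication-latency" # hypothetical name
  alarm_description = "Global Tables replication lag to us-west-2 is elevated"

  namespace   = "AWS/DynamoDB"
  metric_name = "ReplicationLatency"
  statistic   = "Average"

  # Illustrative thresholds: alarm when average lag exceeds 10 seconds
  # (10000 ms) for five consecutive one-minute periods.
  period              = 60
  evaluation_periods  = 5
  comparison_operator = "GreaterThanThreshold"
  threshold           = 10000
  treat_missing_data  = "notBreaching"

  dimensions = {
    TableName       = "chalk-online-store"
    ReceivingRegion = "us-west-2" # the replica region being monitored
  }
}
```

A sustained alarm indicates that a failover at that moment would lose more recent writes than usual, which is useful context when deciding whether to steer traffic away from a region.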
See Multi-Region Failover for the Chalk-level configuration that steers query traffic to a healthy region.
Chalk will assist with DynamoDB sizing, capacity-mode selection, and replication topology, but the customer is ultimately responsible for provisioning and operating the DynamoDB table. This is intentional: DynamoDB capacity is a direct cost driver that the customer controls, and capacity decisions must be made against the customer’s own cost model and availability targets.
Chalk’s responsibilities are:
Customer responsibilities are:
A single-region DynamoDB table with provisioned capacity:

resource "aws_dynamodb_table" "chalk_online_store" {
  name           = "chalk-online-store"
  billing_mode   = "PROVISIONED"
  read_capacity  = 1000
  write_capacity = 500
  hash_key       = "pk"
  range_key      = "sk"

  attribute {
    name = "pk"
    type = "S"
  }

  attribute {
    name = "sk"
    type = "S"
  }

  point_in_time_recovery {
    enabled = true
  }

  server_side_encryption {
    enabled = true
  }

  tags = {
    chalk_environment = "production"
  }
}

Switch billing_mode to PAY_PER_REQUEST and remove the read_capacity / write_capacity fields for on-demand.
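For reference, the on-demand variant described above might look like the following. It is intended to be used in place of the provisioned definition rather than alongside it, and it keeps the same key schema and settings; only the billing mode changes and the capacity fields are dropped.

```hcl
resource "aws_dynamodb_table" "chalk_online_store" {
  name         = "chalk-online-store"
  billing_mode = "PAY_PER_REQUEST" # on-demand: no read_capacity / write_capacity
  hash_key     = "pk"
  range_key    = "sk"

  attribute {
    name = "pk"
    type = "S"
  }

  attribute {
    name = "sk"
    type = "S"
  }

  point_in_time_recovery {
    enabled = true
  }

  server_side_encryption {
    enabled = true
  }

  tags = {
    chalk_environment = "production"
  }
}
```

With on-demand billing there is nothing for Application Auto Scaling to manage, so the autoscaling resources shown on this page do not apply to this variant.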
A multi-region deployment uses a single aws_dynamodb_table resource with replica blocks.
Global Tables require stream_enabled = true and stream_view_type = "NEW_AND_OLD_IMAGES":

resource "aws_dynamodb_table" "chalk_online_store" {
  name             = "chalk-online-store"
  billing_mode     = "PROVISIONED"
  read_capacity    = 1000
  write_capacity   = 500
  hash_key         = "pk"
  range_key        = "sk"
  stream_enabled   = true
  stream_view_type = "NEW_AND_OLD_IMAGES"

  attribute {
    name = "pk"
    type = "S"
  }

  attribute {
    name = "sk"
    type = "S"
  }

  # Replicas are created in addition to the table in the AWS provider's own
  # region; do not list the provider's region here.
  replica {
    region_name = "us-east-1"
  }

  replica {
    region_name = "us-west-2"
  }

  point_in_time_recovery {
    enabled = true
  }

  server_side_encryption {
    enabled = true
  }

  tags = {
    chalk_environment = "production"
  }
}

Each replica is a full copy of the table in the specified region; replication is asynchronous, with typical lag well under a second. Provisioned capacity applies per region: size read capacity for each region’s local traffic, and size write capacity to absorb both local writes and the writes replicated from other regions.
An autoscaling policy tracks target utilization on read and write capacity. Attach one scalable target and one scaling policy per capacity dimension:

resource "aws_appautoscaling_target" "read_target" {
  max_capacity       = 5000
  min_capacity       = 500
  resource_id        = "table/${aws_dynamodb_table.chalk_online_store.name}"
  scalable_dimension = "dynamodb:table:ReadCapacityUnits"
  service_namespace  = "dynamodb"
}

resource "aws_appautoscaling_policy" "read_policy" {
  name               = "chalk-online-store-read-autoscaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.read_target.resource_id
  scalable_dimension = aws_appautoscaling_target.read_target.scalable_dimension
  service_namespace  = aws_appautoscaling_target.read_target.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "DynamoDBReadCapacityUtilization"
    }

    target_value       = 70.0
    scale_in_cooldown  = 60
    scale_out_cooldown = 60
  }
}

resource "aws_appautoscaling_target" "write_target" {
  max_capacity       = 2500
  min_capacity       = 250
  resource_id        = "table/${aws_dynamodb_table.chalk_online_store.name}"
  scalable_dimension = "dynamodb:table:WriteCapacityUnits"
  service_namespace  = "dynamodb"
}

resource "aws_appautoscaling_policy" "write_policy" {
  name               = "chalk-online-store-write-autoscaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.write_target.resource_id
  scalable_dimension = aws_appautoscaling_target.write_target.scalable_dimension
  service_namespace  = aws_appautoscaling_target.write_target.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "DynamoDBWriteCapacityUtilization"
    }

    target_value       = 70.0
    scale_in_cooldown  = 60
    scale_out_cooldown = 60
  }
}

A 70% target utilization is a conservative starting point that leaves headroom for the reactive scale-up delay. For workloads with sharper traffic spikes, lower the target to 50-60% or raise min_capacity so that the floor already covers expected peak-to-trough variation.
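Once autoscaling is attached, the table's actual capacity will drift away from the read_capacity and write_capacity values in the Terraform configuration. The Terraform AWS provider documentation recommends ignoring those two attributes in that case, so that a later apply does not reset capacity to the configured numbers. A minimal sketch, shown as a pared-down version of the provisioned table (the remaining settings from the full definition are omitted only for brevity):

```hcl
resource "aws_dynamodb_table" "chalk_online_store" {
  name           = "chalk-online-store"
  billing_mode   = "PROVISIONED"
  read_capacity  = 1000 # initial value only; autoscaling adjusts it afterwards
  write_capacity = 500  # initial value only; autoscaling adjusts it afterwards
  hash_key       = "pk"
  range_key      = "sk"

  attribute {
    name = "pk"
    type = "S"
  }

  attribute {
    name = "sk"
    type = "S"
  }

  lifecycle {
    # Capacity is managed by Application Auto Scaling, not by Terraform.
    ignore_changes = [read_capacity, write_capacity]
  }
}
```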
For Global Tables, configure autoscaling independently in each region.
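Application Auto Scaling is a regional service, so a replica's scalable target has to be registered through a provider configured for that replica's region. The sketch below shows read autoscaling for a us-west-2 replica under stated assumptions: the provider alias and the resource names are hypothetical, and the capacity bounds simply repeat the values used above rather than reflecting any region's real traffic.

```hcl
# Hypothetical provider alias pointing at the replica region.
provider "aws" {
  alias  = "us_west_2"
  region = "us-west-2"
}

resource "aws_appautoscaling_target" "read_target_us_west_2" {
  provider = aws.us_west_2

  max_capacity = 5000
  min_capacity = 500
  # The replica has the same table name as the table defined earlier.
  resource_id        = "table/${aws_dynamodb_table.chalk_online_store.name}"
  scalable_dimension = "dynamodb:table:ReadCapacityUnits"
  service_namespace  = "dynamodb"
}

resource "aws_appautoscaling_policy" "read_policy_us_west_2" {
  provider = aws.us_west_2

  name               = "chalk-online-store-read-autoscaling-us-west-2"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.read_target_us_west_2.resource_id
  scalable_dimension = aws_appautoscaling_target.read_target_us_west_2.scalable_dimension
  service_namespace  = aws_appautoscaling_target.read_target_us_west_2.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "DynamoDBReadCapacityUtilization"
    }

    target_value       = 70.0
    scale_in_cooldown  = 60
    scale_out_cooldown = 60
  }
}
```

Referencing the table resource for the name also makes Terraform create the table and its replicas before registering the scalable target. Write autoscaling for replicas can be set up the same way, but check the current AWS guidance on how write capacity settings are kept consistent across Global Tables replicas before diverging them per region.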