When setting up a new environment, Chalk provides several options for choosing a storage provider for your online and offline stores. The DBMS (Database Management System) that you choose will depend on your specific requirements, as well as your cloud deployment provider.


Online Store Options

The Chalk online store is used for low-latency serving of real-time feature values. Although Chalk’s query engine is optimized for efficient computation of feature values by compiling Python and SQL resolvers into Rust and C++, caching computed values in the online store can further improve performance.
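To make the caching benefit concrete, here is a minimal read-through cache sketch. It is not Chalk's implementation: the `ReadThroughCache` class, its in-memory dictionary, and the `max_age_seconds` staleness limit are all hypothetical stand-ins for a real online store such as Valkey or DynamoDB.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ReadThroughCache:
    """Toy stand-in for an online store (hypothetical, not a Chalk API)."""

    def __init__(self, compute: Callable[[str], float], max_age_seconds: float):
        self._compute = compute            # the expensive feature computation
        self._max_age = max_age_seconds    # assumed staleness limit
        self._entries: Dict[str, Tuple[float, float]] = {}  # key -> (value, stored_at)
        self.compute_calls = 0             # counts how often we had to recompute

    def get(self, key: str, now: Optional[float] = None) -> float:
        now = time.monotonic() if now is None else now
        entry = self._entries.get(key)
        # Serve from the cache when a sufficiently fresh value exists.
        if entry is not None and now - entry[1] <= self._max_age:
            return entry[0]
        # Otherwise compute the value and store it for later reads.
        self.compute_calls += 1
        value = self._compute(key)
        self._entries[key] = (value, now)
        return value


# Usage: the "feature" here is just the key length, standing in for real work.
cache = ReadThroughCache(compute=lambda key: float(len(key)), max_age_seconds=60.0)
cache.get("user:42", now=0.0)   # miss: computes and stores the value
cache.get("user:42", now=30.0)  # hit: served from the cache
cache.get("user:42", now=90.0)  # stale: recomputed and stored again
```

Only the first and third reads run the computation; the second is answered directly from the stored value, which is the saving an online store provides.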

GCP Online Store Options

There are two options for online stores for customers on GCP (Google Cloud Platform):

| Store | Description | Performance | Scaling |
|---|---|---|---|
| Memorystore for Valkey | Valkey (in-memory key-value NoSQL store) | microseconds | vertical |
| Bigtable | Distributed wide-column NoSQL database | milliseconds | horizontal |

Customers typically choose the data store that they already use within their data platform. Otherwise, we recommend Memorystore for Valkey for workloads that need low latency and high throughput, and Bigtable for workloads with very large storage needs.

AWS

There are two options for customers on AWS (Amazon Web Services):

| Store | Description | Performance | Scaling |
|---|---|---|---|
| ElastiCache for Valkey | Valkey (Redis-compatible in-memory store) | microseconds | vertical |
| DynamoDB | Key-value NoSQL database | milliseconds | auto-scales |

We generally recommend DynamoDB for its performance optimizations, and ElastiCache for Valkey for its low latency and greater storage capacity for large rows. However, some customers may choose to use RDS depending on their existing data platform.

Azure

For customers on Azure, we offer Azure Cache for Redis for the online store.

| Store | Description | Performance | Scaling |
|---|---|---|---|
| Azure Cache for Redis | Redis (key-value in-memory store) | microseconds | horizontal |

We generally recommend Azure Cache for Redis to customers on Azure as a low-latency, scalable database with storage-based pricing.
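The online store options above can also be captured as data, which is handy when scripting environment setup. The sketch below is illustrative only: `OnlineStoreOption`, `ONLINE_STORE_OPTIONS`, and `online_store_options` are hypothetical names rather than part of any Chalk API, and the "existing store first" ordering simply mirrors the guidance that customers usually keep the store they already run.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class OnlineStoreOption:
    name: str
    performance: str  # typical read latency, as listed in the tables
    scaling: str


# Values transcribed from the per-cloud tables in this section.
ONLINE_STORE_OPTIONS: Dict[str, List[OnlineStoreOption]] = {
    "gcp": [
        OnlineStoreOption("Memorystore for Valkey", "microseconds", "vertical"),
        OnlineStoreOption("Bigtable", "milliseconds", "horizontal"),
    ],
    "aws": [
        OnlineStoreOption("ElastiCache for Valkey", "microseconds", "vertical"),
        OnlineStoreOption("DynamoDB", "milliseconds", "auto-scales"),
    ],
    "azure": [
        OnlineStoreOption("Azure Cache for Redis", "microseconds", "horizontal"),
    ],
}


def online_store_options(
    cloud: str, existing_store: Optional[str] = None
) -> List[OnlineStoreOption]:
    """List the options for a cloud, putting a store the team already runs first."""
    try:
        options = ONLINE_STORE_OPTIONS[cloud.lower()]
    except KeyError:
        raise ValueError(f"no online store options listed for cloud {cloud!r}")
    # sorted() is stable, so the table order is kept among the remaining options.
    return sorted(options, key=lambda option: option.name != existing_store)


for option in online_store_options("gcp", existing_store="Bigtable"):
    print(f"{option.name}: {option.performance}, scales {option.scaling}")
```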


Offline Store Options

The Chalk offline store is used for storing historical feature values. The offline store also serves data for offline queries, which can be used for analytics, batch processing, and other use cases. We offer five options for offline stores:

| Store | Storage |
|---|---|
| Google BigQuery | Columnar (Capacitor) |
| Amazon Redshift | Columnar (Parquet) |
| Snowflake | Columnar (Micro-partitions) |
| Databricks Delta Lake | Columnar (Parquet) |
| Iceberg | Columnar (Parquet) |

For customers with GCP cloud deployments, we generally recommend Google BigQuery for its performance on analytical queries. In practice, however, most customers choose the data store that they already use within their data platform.
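One common kind of offline query over historical feature values is an "as of" lookup: given the recorded history of a feature, return the value that was current at some past moment. The sketch below illustrates the idea in plain Python. It is an assumption about a typical access pattern, not a description of how Chalk or any of the listed warehouses executes offline queries, and `value_as_of` is a hypothetical helper.

```python
from bisect import bisect_right
from datetime import datetime
from typing import List, Optional, Tuple


def value_as_of(
    history: List[Tuple[datetime, float]], as_of: datetime
) -> Optional[float]:
    """Return the latest value observed at or before `as_of`.

    `history` must be sorted by timestamp in ascending order.
    Returns None when no value had been observed yet.
    """
    timestamps = [observed_at for observed_at, _ in history]
    # Number of observations with a timestamp <= as_of.
    index = bisect_right(timestamps, as_of)
    return history[index - 1][1] if index > 0 else None


# Illustrative history for a single feature of a single entity.
history = [
    (datetime(2024, 1, 1), 10.0),
    (datetime(2024, 2, 1), 12.5),
    (datetime(2024, 3, 1), 9.0),
]

mid_february = value_as_of(history, datetime(2024, 2, 15))
before_first = value_as_of(history, datetime(2023, 12, 31))
```

Here `mid_february` is the value recorded on 1 February, and `before_first` is `None` because nothing had been observed yet.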