Selecting an online and offline store for your environment
When setting up a new environment, Chalk provides several options for choosing a storage provider for your online and offline store. The DBMS (Database Management System) that you choose will depend on your specific requirements, as well as your cloud deployment provider.
The Chalk online store is used for low-latency serving of real-time feature values. Although Chalk’s query engine is optimized for efficient computation of feature values by compiling Python and SQL resolvers into Rust and C++, caching computed values in the online store can further improve performance.
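To make the caching benefit concrete, here is a minimal read-through cache sketch. It is not Chalk's implementation or API: `ReadThroughCache`, `expensive_feature`, and `max_age_seconds` are hypothetical names, and a plain in-memory dictionary stands in for the online store.

```python
import time
from typing import Callable, Dict, Tuple


class ReadThroughCache:
    """Toy in-memory stand-in for an online store (hypothetical, not Chalk's API)."""

    def __init__(self, compute: Callable[[str], float], max_age_seconds: float) -> None:
        self._compute = compute
        self._max_age = max_age_seconds
        # Maps key -> (value, time the value was stored).
        self._values: Dict[str, Tuple[float, float]] = {}
        self.compute_calls = 0

    def get(self, key: str) -> float:
        now = time.monotonic()
        cached = self._values.get(key)
        # Serve the cached value while it is still fresh enough.
        if cached is not None and now - cached[1] <= self._max_age:
            return cached[0]
        # Otherwise recompute the value and store it for later reads.
        self.compute_calls += 1
        value = self._compute(key)
        self._values[key] = (value, now)
        return value


def expensive_feature(user_id: str) -> float:
    # Placeholder for a resolver that would normally do real work.
    return float(len(user_id))


cache = ReadThroughCache(expensive_feature, max_age_seconds=60.0)
first = cache.get("user_123")   # computed, then stored
second = cache.get("user_123")  # served from the cache
```

The second read skips the computation entirely, which is the effect a low-latency online store provides for repeated feature lookups.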
There are two options for online stores for customers on GCP (Google Cloud Platform):
| Store | Description | Performance | Scaling |
|---|---|---|---|
| Memorystore for Valkey | Valkey (in-memory key-value NoSQL store) | microseconds | vertical |
| Bigtable | Distributed wide-column NoSQL database | milliseconds | horizontal |
Customers typically choose the data store that they already use within their data platform. Otherwise, we recommend Memorystore for Valkey, which is optimized for low latency and high throughput, or Bigtable for very large storage needs.
There are two options for online stores for customers on AWS (Amazon Web Services):
| Store | Description | Performance | Scaling |
|---|---|---|---|
| ElastiCache for Valkey | Valkey (in-memory key-value NoSQL store) | microseconds | vertical |
| DynamoDB | Key-Value NoSQL database | milliseconds | auto-scales |
We generally recommend DynamoDB for its performance optimizations, and ElastiCache for Valkey for its low latency and greater storage capacity for large rows. However, some customers may choose to use RDS depending on their existing data platform.
For customers on Azure, we offer Azure Cache for Redis as the online store:
| Store | Description | Performance | Scaling |
|---|---|---|---|
| Azure Cache for Redis | Redis (key-value in-memory store) | microseconds | horizontal |
We generally recommend Azure Cache for Redis to customers on Azure who need a low-latency, scalable database with storage-based pricing.
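The online store options above can be summarized as a small lookup table. The sketch below is illustrative only: `OnlineStoreOption`, `ONLINE_STORES`, and `online_store_options` are hypothetical names and not part of any Chalk API. The values are copied from the tables in this section.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class OnlineStoreOption:
    name: str
    latency: str
    scaling: str


# Options per cloud provider, as listed in the tables above.
ONLINE_STORES: Dict[str, List[OnlineStoreOption]] = {
    "gcp": [
        OnlineStoreOption("Memorystore for Valkey", "microseconds", "vertical"),
        OnlineStoreOption("Bigtable", "milliseconds", "horizontal"),
    ],
    "aws": [
        OnlineStoreOption("ElastiCache for Valkey", "microseconds", "vertical"),
        OnlineStoreOption("DynamoDB", "milliseconds", "auto-scales"),
    ],
    "azure": [
        OnlineStoreOption("Azure Cache for Redis", "microseconds", "horizontal"),
    ],
}


def online_store_options(cloud: str, latency: Optional[str] = None) -> List[OnlineStoreOption]:
    """Return the options for a cloud, optionally filtered by latency class."""
    key = cloud.lower()
    if key not in ONLINE_STORES:
        raise ValueError(f"Unknown cloud provider: {cloud!r}")
    options = ONLINE_STORES[key]
    if latency is None:
        return list(options)
    return [option for option in options if option.latency == latency]


gcp_names = [option.name for option in online_store_options("gcp")]
aws_fast = [option.name for option in online_store_options("AWS", latency="microseconds")]
```

A lookup like this only narrows the candidates. The final choice still depends on the store you already run in your data platform.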
The Chalk offline store is used for storing historical feature values. The offline store also serves data for offline queries, which can be used for analytics, batch processing, and other use cases. We offer five options for offline stores:
| Store | Storage |
|---|---|
| Google BigQuery | Columnar (Capacitor) |
| Amazon Redshift | Columnar (Parquet) |
| Snowflake | Columnar (Micro-partitions) |
| Databricks Delta Lake | Columnar (Parquet) |
| Iceberg | Columnar (Parquet) |
For customers with GCP cloud deployments, we generally recommend Google BigQuery for its performance with analytical queries. However, customers typically choose the data store that they already use within their data platform.