Overview
How it all fits together.
Chalk offers a hosted model (“Chalk Cloud”) and a customer-hosted model (“Customer Cloud”). Most companies choose to run Chalk in their own cloud using the Customer Cloud model. This page discusses the Customer Cloud deployment of Chalk on AWS and GCP.
A Chalk deployment consists of a Metadata Plane and a Data Plane:
The Metadata Plane is responsible for storing and serving non-customer data (like alert and RBAC configurations). It orchestrates machines in the Data Plane through the Kubernetes API, for example to scale deployments and run batch jobs.
The Metadata Plane does not have access to your data. Most Chalk partners using the Customer Cloud deployment choose to have Chalk host the Metadata Plane, but in some especially sensitive applications (like FedRAMP deployments), partners opt to host the Metadata Plane themselves. Hosting the Metadata Plane requires the Enterprise Features and Enterprise Support plans.
The Data Plane consists of the machines that compute feature pipelines, the online store, and the offline store. The compute nodes run on Kubernetes (typically EKS on AWS and GKE on GCP).
A single Data Plane can run many Chalk Environments.
Often, companies will have 2-3 environments (like qa, stage, and prod). If running in a single Data Plane, these environments share resources, which helps with cost and ease of setup.
However, if you prefer to have stronger isolation between Chalk Environments, each Chalk Environment can run in a separate Data Plane. You would typically run only one Metadata Plane to orchestrate all Data Planes, and deploy the Metadata Plane to the most sensitive of the environments.
Chalk is deployed with Terraform and uses common cloud primitives. Given all necessary permissions, Chalk can be deployed in about an hour. You can see sample Terraform for AWS here.
Our goal is to make deployments fit with your existing infrastructure. If you have custom needs, we are happy to customize the deployment to fit with your service architecture.
Let’s explore Chalk’s architecture by examining how the pieces work together to compute & serve Features online. Suppose that you want to compute a set of features for making a decision about a request from a user:
Chalk’s online query serving platform is designed to fetch data from a variety of heterogeneous data sources and execute complex transformations on that data with the minimum possible latency. Chalk uses many techniques to reduce latency, such as:
Chalk’s architecture also supports efficient batch point-in-time queries to construct model training sets or perform batch offline inference.
Chalk’s Offline Storage is optimized for batch querying of temporally consistent data. Chalk uses columnar storage backends (Snowflake, Delta Lake, or BigQuery) to ingest massive amounts of data from a variety of data sources and query it efficiently. Note that data ingested into the Offline Store can be trivially made available for use in an online querying context using Reverse ETL.
You can see a list of supported ingestion sources in the Integrations section of these docs.
Chalk uses different storage technologies to support online and offline use cases.
The online store is optimized for serving the latest version of any given feature for any given entity with the minimum possible latency. Behind the scenes, Chalk uses key-value stores for this purpose. Chalk can be configured to use Redis or Cloud Memorystore for smaller resident data sets with strict low-latency requirements, or DynamoDB when horizontal scalability is required.
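To make the "latest version per feature per entity" access pattern concrete, here is a minimal in-memory sketch in Python. It is not Chalk's implementation or API: the class name LatestValueStore and its write and read methods are hypothetical, and a plain dict stands in for the key-value backend.

```python
from datetime import datetime
from typing import Any, Dict, Optional, Tuple


class LatestValueStore:
    """Hypothetical stand-in for a key-value online store (illustration only)."""

    def __init__(self) -> None:
        # (entity_id, feature_name) -> (observed_at, value)
        self._rows: Dict[Tuple[str, str], Tuple[datetime, Any]] = {}

    def write(self, entity_id: str, feature: str, value: Any, observed_at: datetime) -> None:
        key = (entity_id, feature)
        current = self._rows.get(key)
        # Keep only the most recent observation; stale writes are dropped.
        if current is None or observed_at >= current[0]:
            self._rows[key] = (observed_at, value)

    def read(self, entity_id: str, feature: str) -> Optional[Any]:
        # A single key lookup, which is what keeps online reads cheap.
        row = self._rows.get((entity_id, feature))
        return None if row is None else row[1]
```

Because each key holds exactly one value, a read never scans history. That is the trade-off that separates this store from the offline store.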
The offline store is optimized for storing all historical feature values, serving point-in-time correct queries, and tracking provenance of features. Chalk supports a variety of storage backends depending on data scale and latency requirements. Typically, Chalk uses Snowflake, Delta Lake, or BigQuery.
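A point-in-time correct query returns the value a feature had as of a given timestamp, never a later one. The sketch below illustrates that idea with sorted in-memory lists and binary search. It is an assumption-laden illustration, not how Chalk or its warehouse backends implement it: FeatureHistory, append, and as_of are hypothetical names.

```python
import bisect
from collections import defaultdict
from datetime import datetime
from typing import Any, Dict, List, Optional, Tuple

Key = Tuple[str, str]  # (entity_id, feature_name)


class FeatureHistory:
    """Hypothetical append-only history of feature values (illustration only)."""

    def __init__(self) -> None:
        # Parallel lists per key, kept sorted by observation time.
        self._times: Dict[Key, List[datetime]] = defaultdict(list)
        self._values: Dict[Key, List[Any]] = defaultdict(list)

    def append(self, entity_id: str, feature: str, value: Any, observed_at: datetime) -> None:
        key = (entity_id, feature)
        idx = bisect.bisect_right(self._times[key], observed_at)
        self._times[key].insert(idx, observed_at)
        self._values[key].insert(idx, value)

    def as_of(self, entity_id: str, feature: str, at: datetime) -> Optional[Any]:
        """Return the latest value observed at or before `at`, or None."""
        key = (entity_id, feature)
        times = self._times.get(key, [])
        idx = bisect.bisect_right(times, at)
        return None if idx == 0 else self._values[key][idx - 1]
```

When building a training set, calling as_of with each label's timestamp keeps values observed after the label from leaking into the features.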
Chalk supports robust monitoring not only of pipeline execution but also of the feature values themselves. Monitoring machine learning data infrastructure is just as important as monitoring application availability, but is often overlooked.
Each time a query is served, Chalk assigns a unique “trace id” for the request. Chalk tracks all emitted logs on both a per-resolver basis and a per-trace basis. This enables you to debug problems and track fine-grained performance metrics pertaining to specific features and resolvers. Leveraging this helps answer common questions such as how often certain features are computed and how long the computation takes.
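The following Python sketch shows one way per-trace and per-resolver bookkeeping could be organized. It assumes nothing about Chalk's internals: QueryTracer, new_trace, and run are hypothetical names, and the resolver is any plain callable.

```python
import time
import uuid
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple


class QueryTracer:
    """Hypothetical tracer that indexes timings by trace id and by resolver."""

    def __init__(self) -> None:
        self.by_trace: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
        self.by_resolver: Dict[str, List[float]] = defaultdict(list)

    def new_trace(self) -> str:
        # One unique id per served query.
        return uuid.uuid4().hex

    def run(self, trace_id: str, resolver_name: str, fn: Callable[..., Any], *args: Any) -> Any:
        start = time.perf_counter()
        try:
            return fn(*args)
        finally:
            # Record the timing even if the resolver raises.
            elapsed = time.perf_counter() - start
            self.by_trace[trace_id].append((resolver_name, elapsed))
            self.by_resolver[resolver_name].append(elapsed)
```

With both indexes in place, len(tracer.by_resolver[name]) answers how often a resolver ran, and the recorded durations answer how long it took, for a single request or across all of them.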
Like many traditional application monitoring platforms, Chalk supports alerting on performance or availability issues via integrations with PagerDuty and Slack.
In addition to performance and request metrics for computation, Chalk supports alerting on feature values themselves. You can specify a variety of threshold requirements and drift tolerance tests to help spot issues such as: