Chalk helps you build out feature pipelines for training and serving machine learning models.
The building blocks of Chalk are features. Each piece of data in your system, whether a column in a database or a value passed in at inference, is a feature. For example, a user’s age and whether they are an adult might be features in your system:
    from chalk.features import features

    @features
    class User:
        id: int
        age: int
        is_adult: bool
Features are computed by resolvers. A resolver is a function that takes features as arguments and outputs new features. For example, a resolver might take a user’s age and output a boolean indicating whether they are 18 or older.
    from chalk.features import online

    @online
    def is_adult(age: User.age) -> User.is_adult:
        return age >= 18
The focus on data instead of pipelines may be unfamiliar at first. Traditional orchestration platforms like Airflow or Dagster explicitly compose functions which produce data into a DAG of tasks. With Chalk, the DAG of resolvers is defined implicitly by the features they produce. This architecture makes it easy to build out feature pipelines that are reusable and composable. Chalk handles tracking your features for temporal consistency, running your resolvers in parallel, and horizontally scaling your feature pipelines.
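To make the idea of an implicitly defined DAG concrete, here is a minimal sketch in plain Python. It is not how Chalk works internally and does not use the Chalk API. The `RESOLVERS` registry, the `resolver` decorator, the `resolve` function, and the feature names `user.birth_year` and `current_year` are all hypothetical, introduced only for illustration. Each function declares the features it consumes and the feature it produces, and the execution order falls out of those declarations rather than being wired up by hand.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy registry: output feature name -> (input feature names, function).
# This stands in for the idea of a resolver graph; it is not Chalk's implementation.
RESOLVERS: Dict[str, Tuple[List[str], Callable[..., object]]] = {}


def resolver(inputs: List[str], output: str):
    """Register a function as the producer of `output` from `inputs`."""
    def decorator(fn):
        RESOLVERS[output] = (inputs, fn)
        return fn
    return decorator


@resolver(inputs=["user.birth_year", "current_year"], output="user.age")
def compute_age(birth_year, current_year):
    return current_year - birth_year


@resolver(inputs=["user.age"], output="user.is_adult")
def compute_is_adult(age):
    return age >= 18


def resolve(feature: str, known: Dict[str, object]) -> object:
    """Compute `feature`, recursively resolving whatever inputs it needs."""
    if feature in known:
        return known[feature]
    if feature not in RESOLVERS:
        raise KeyError(f"no value or resolver for feature {feature!r}")
    inputs, fn = RESOLVERS[feature]
    # The dependency graph is walked here: each input is resolved before fn runs.
    args = [resolve(name, known) for name in inputs]
    known[feature] = fn(*args)
    return known[feature]


# Only raw inputs are supplied; the order of computation is inferred.
print(resolve("user.is_adult", {"user.birth_year": 2000, "current_year": 2024}))  # True
```

Nothing in the sketch states that `compute_age` must run before `compute_is_adult`. That ordering follows from the fact that one produces `user.age` and the other consumes it, which is the sense in which the DAG is implicit.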
This tutorial walks you through building a feature pipeline for a fraud detection model. It covers the full feature development lifecycle:
Before you get started, make sure you have the Chalk CLI installed.
If you want to skip ahead, you can find the full source code for this tutorial on GitHub.