Fraud Detection on Chalk Tutorial

Introduction

Chalk helps you build out feature pipelines for training and serving machine learning models.

The building blocks of Chalk are features. Each piece of data in your system, whether a column in a database or a value passed in at inference time, is a feature. For example, a user’s age and whether they are an adult might be features in your system:

from chalk.features import features

@features
class User:
    id: int
    age: int
    is_adult: bool

Features are computed by resolvers. A resolver is a function that takes features as arguments and outputs new features. For example, a resolver might take a user’s age and output a boolean indicating whether they are 18 or older.

from chalk.features import online

# User is the feature class defined above.
@online
def is_adult(age: User.age) -> User.is_adult:
    # A user aged 18 or older is an adult.
    return age >= 18
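Because a resolver body is ordinary Python, its logic can be exercised without a Chalk deployment. The sketch below copies the body into an undecorated function, `is_adult_logic` (a hypothetical name, not part of Chalk), to make the boundary explicit: an age of exactly 18 counts as adult.

```python
def is_adult_logic(age: int) -> bool:
    """Return True when age is 18 or above, mirroring the resolver body."""
    return age >= 18


# Check values on either side of the boundary.
for age in (17, 18, 30):
    print(age, is_adult_logic(age))
# 17 False
# 18 True
# 30 True
```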

The focus on data instead of pipelines may be unfamiliar at first. Traditional orchestration platforms such as Airflow or Dagster have you explicitly compose data-producing functions into a DAG of tasks. With Chalk, the DAG of resolvers is defined implicitly by the features each resolver consumes and produces. This makes feature pipelines reusable and composable. Chalk tracks your features for temporal consistency, runs your resolvers in parallel, and scales your feature pipelines horizontally.
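To make the idea of an implicit DAG concrete, here is a minimal plain-Python sketch; it is an illustration, not Chalk's actual planner. Each hypothetical resolver spec lists the features it reads and the feature it writes, and the dependency graph is derived from those signatures alone. `get_age`, `AGES`, `infer_dag`, and `resolve` are invented names, and `get_age` fakes what would be a SQL lookup.

```python
from graphlib import TopologicalSorter

# Hypothetical resolver specs: the features each one reads and the
# feature it writes. Names mirror the tutorial's User features.
RESOLVERS = {
    "is_adult": {"inputs": ["user.age"], "output": "user.is_adult"},
    "get_age": {"inputs": ["user.id"], "output": "user.age"},
}

# Stand-in implementations; get_age fakes a database lookup.
AGES = {1: 42, 2: 16}
FUNCS = {
    "get_age": lambda user_id: AGES[user_id],
    "is_adult": lambda age: age >= 18,
}


def infer_dag(resolvers):
    """Map each resolver to the set of resolvers whose outputs it reads."""
    producer = {spec["output"]: name for name, spec in resolvers.items()}
    # Inputs with no producer (e.g. user.id) are supplied by the caller.
    return {
        name: {producer[f] for f in spec["inputs"] if f in producer}
        for name, spec in resolvers.items()
    }


def resolve(resolvers, funcs, inputs):
    """Run resolvers in dependency order, starting from caller-supplied inputs."""
    values = dict(inputs)
    for name in TopologicalSorter(infer_dag(resolvers)).static_order():
        spec = resolvers[name]
        args = [values[f] for f in spec["inputs"]]
        values[spec["output"]] = funcs[name](*args)
    return values


print(resolve(RESOLVERS, FUNCS, {"user.id": 2}))
# {'user.id': 2, 'user.age': 16, 'user.is_adult': False}
```

No resolver names another resolver; the ordering `get_age` before `is_adult` falls out of the fact that one writes `user.age` and the other reads it. This sketch runs resolvers sequentially for simplicity, whereas the text above notes that Chalk also runs independent resolvers in parallel.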

This tutorial walks you through building a feature pipeline for a simple fraud detection model and covers the full feature development lifecycle:

  1. Data Modeling - Creating feature classes for the data we want to compute.
  2. SQL Resolvers - Mapping data from SQL sources to feature classes.
  3. Python Resolvers - Defining resolvers in Python that call APIs and compute derived features.
  4. Inference - Integrating Chalk into production decisioning systems.
  5. Backtesting - Experimenting with new features.

Before you get started, make sure you have the Chalk CLI installed.

If you want to skip ahead, you can find the full source code for this tutorial on GitHub.