Chalk home page
Docs
API
CLI
  1. Resolvers
  2. Overview

While feature sets define what your data looks like (e.g. the types of your features, their relationships, and their properties), resolvers define how your feature values are computed.

If you haven't read the section on feature sets, you are strongly encourage to do so. Understanding feature sets will help you understand resolvers.

Resolvers have four main components: inputs, outputs, code, and run conditions.

The first three of these are rather straightforward. The inputs to a resolver are the features it uses to compute its output. The outputs of a resolver are the features that it computes. The code of the resolver is the logic that transforms your input features into your output features.

The last component, run conditions, is a little more nuanced. At a high level, Chalk is optimized for two different access patterns: online and offline. In online access, information about a single entity is being requested and features must be computed quickly and cached. In offline access, a large chunk of historical feature data is being requested (you can think of this as creating a dataset for model training or running an analytic query).

Resolvers are always specified as either online or offline. We go into more detail about what this means in the next section where we talk about run conditions, but for now it is important to note that: 1). every resolver is either online or offline, and 2). different resolvers run depending on how you are requesting data from Chalk.

Writing Resolvers

Chalk allows you to write resolvers in three different ways:

  • as SQL queries, with comments specifying data sources and output features,
  • as python functions, with type annotations specifying the input and output features,
  • inline in your feature definitions, with the underscore module.

You will likely write mostly python and SQL resolvers, so most of our examples will be for these two variants. However, underscore resolvers are a succinct and optimized way to write resolvers, which can be quite practical in certain cases. We will cover them at the end of the resolver section.

A Quick Example

Below, we provide an example of how to write SQL and python resolvers in Chalk for a User feature set. In this example, we assume that a Postgres data source called pg_users has been configured and connected to Chalk.

First, let’s define our User feature set:

src/user.py
from chalk.features import features

@features
class User:
  id: str
  name: str
  age: int
  hobbies: list[str]

  is_runner: bool  # computed feature

Now we define our resolvers:

We resolve the id, name, age, and hobbies features for our users from the users table of our pg_users data source. When reading data from your data sources, you should use a SQL resolver:

src/get_user.chalk.sql
-- The features given to us by the user.
-- resolves: user
-- type: online
-- source: pg_users
select id, name, age, hobbies
from users;

We now use a python resolver to compute the is_runner feature for our users. The get_is_runner resolver takes the hobbies feature as an input and computes a new feature: is_runner.

from chalk import online

@online
def get_is_runner(
    name: User.hobbies
) -> User.account_name_match_score:
    return "runner" in hobbies

When Do Resolvers Run

A question that often crops up at this point, is: “I have defined a resolver, but how do I make it run?”

Just like features, the idea behind resolvers is that they are declarative. Resolvers don’t run when you define them and they don’t run exhaustively over your data. They run only in response to a request for the features that they are responsible for computing. In practice, this means that they run in response to “queries”.

Resolvers can also be run on a schedule or triggered. Either approach causes them to: 1). excecute, in batch-job fashion, over all the data they find in your upstream data sources, and 2). write computed features to your offline store.

A query is just a request for data. Chalk takes a query and determines which resolvers it needs to run, optimizes the execution order, and provides you with a response. For now, we’ll hold off on discussing queries further, but feel free to skip ahead and get a sense of what Chalk queries are and how they work, before diving deeper into resolvers.