Chalk home page
Docs
SDK
CLI
  1. Resolvers
  2. Overview

While feature classes define what your data looks like (e.g. the types of your features, their relationships, and their properties), resolvers define how your feature values are computed.

If you haven't read the section on feature classes, you are strongly encouraged to do so. Understanding feature classes will help you understand resolvers.

Resolvers have four main components: inputs, outputs, code, and run conditions.

The first three of these are rather straightforward. The inputs to a resolver are the features it uses to compute its output. The outputs of a resolver are the features that it computes. The code of the resolver is the logic that transforms your input features into your output features.

The last component, run conditions, is a little more nuanced. At a high level, Chalk is optimized for two different access patterns: online and offline. In online access, information about a single entity is being requested and features must be computed quickly and cached. In offline access, a large chunk of historical feature data is being requested (you can think of this as creating a dataset for model training or running an analytic query).

Resolvers are always specified as either online or offline. We go into more detail about what this means in the next section where we talk about run conditions, but for now it is important to note that: 1). every resolver is either online or offline, and 2). different resolvers run depending on how you are requesting data from Chalk.

Writing Resolvers

Chalk allows you to write resolvers in three different ways:

  • as SQL queries, with comments specifying data sources and output features.
  • as python functions, with type annotations specifying the input and output features.
  • as an expression, inline in your feature definitions.

Below is an example of how to write the three kinds of resolvers in Chalk for a User feature class. In this example, we assume that a Postgres data source called pg_users has been configured and connected to Chalk.

First, let’s define our User feature class:

src/user.py
from chalk.features import features

@features
class User:
  id: str
  name: str
  age: int
  hobbies: list[str]

  is_runner: bool  # computed feature
  athleticism_score: float # computed feature

Now we define our resolvers:

We resolve the id, name, age, and hobbies features for our users from the users table of our pg_users data source. When reading data from your data sources, you should use a SQL resolver:

src/get_user.chalk.sql
-- The features given to us by the user.
-- resolves: user
-- type: online
-- source: pg_users
select id, name, age, hobbies
from users;

To compute the is_runner feature for users, we simply check whether running is in the list of hobbies, which we can do with an expression. Chalk expressions utilize low-latency C++ for efficient execution. Chalk expressions can use any of the chalk.functions to express a variety of computations.

src/user.py
import chalk.functions as F
from chalk.features import features, _

@features
class User:
  id: str
  name: str
  age: int
  hobbies: list[str]

  is_runner: bool = F.contains(_.hobbies, "running")
  athleticism_score: float # computed feature

We now use a python resolver to compute the athleticism_socre feature for our users. Python resolvers are useful for integrating existing libraries, and executing more complex logic, such as API calls.

src/resolvers.py
from chalk import online
from src.user import User
from utils.athleticism import is_athletic

@online
def get_is_runner(
    hobbies: User.hobbies
) -> User.athleticism_score:
    num_athletic_hobbies = sum([is_athletic(hobby) for hobby in hobbies])
    return num_athletic_hobbies / len(hobbies)

When Do Resolvers Run

A question that often crops up at this point is, “I have defined a resolver, but how do I make it run?”

Just like features, the idea behind resolvers is that they are declarative. Resolvers don’t run when you define them, and they don’t run exhaustively over your data. They run only in response to a request for the features that they are responsible for computing. In practice, this means that they run in response to “queries”.

Resolvers can also be run on a schedule or triggered. Either approach causes them to 1). execute, in batch-job fashion, across all data they find in your upstream data sources, and 2). write computed features to your offline store.

A query is just a request for data. Chalk takes a query and determines which resolvers it needs to run, optimizes the execution order, and provides you with a response. For now, we’ll hold off on discussing queries further, but feel free to skip ahead and get a sense of what Chalk queries are and how they work before diving deeper into resolvers.