Resolvers
Create resolvers to compute feature values.
While feature classes define what your data looks like (e.g. the types of your features, their relationships, and their properties), resolvers define how your feature values are computed.
If you haven't read the section on feature classes, you are strongly encourage to do so. Understanding feature classes will help you understand resolvers.
Resolvers have four main components: inputs, outputs, code, and run conditions.
The first three of these are rather straightforward. The inputs to a resolver are the features it uses to compute its output. The outputs of a resolver are the features that it computes. The code of the resolver is the logic that transforms your input features into your output features.
The last component, run conditions, is a little more nuanced. At a high level, Chalk is optimized for two different access patterns: online and offline. In online access, information about a single entity is being requested and features must be computed quickly and cached. In offline access, a large chunk of historical feature data is being requested (you can think of this as creating a dataset for model training or running an analytic query).
Resolvers are always specified as either online or offline. We go into more detail about what this means in the next section where we talk about run conditions, but for now it is important to note that: 1). every resolver is either online or offline, and 2). different resolvers run depending on how you are requesting data from Chalk.
Chalk allows you to write resolvers in three different ways:
You will likely write mostly python and SQL resolvers, so most of our examples will be for these two variants. However, underscore resolvers are a succinct and optimized way to write resolvers, which can be quite practical in certain cases. We will cover them at the end of the resolver section.
Below, we provide an example of how to write SQL and python resolvers in Chalk
for a User
feature class. In this example, we assume that a Postgres data source
called pg_users
has been configured and connected to Chalk.
First, let’s define our User
feature class:
from chalk.features import features
@features
class User:
id: str
name: str
age: int
hobbies: list[str]
is_runner: bool # computed feature
Now we define our resolvers:
We resolve the id
, name
, age
, and hobbies
features for our users from the
users
table of our pg_users
data source. When reading data from your data
sources, you should use a SQL resolver:
-- The features given to us by the user.
-- resolves: user
-- type: online
-- source: pg_users
select id, name, age, hobbies
from users;
We now use a python resolver to compute the is_runner
feature for our users.
The get_is_runner
resolver takes the hobbies
feature as an input and computes
a new feature: is_runner
.
from chalk import online
@online
def get_is_runner(
name: User.hobbies
) -> User.is_runner:
return "runner" in hobbies
A question that often crops up at this point is, “I have defined a resolver, but how do I make it run?”
Just like features, the idea behind resolvers is that they are declarative. Resolvers don’t run when you define them and they don’t run exhaustively over your data. They run only in response to a request for the features that they are responsible for computing. In practice, this means that they run in response to “queries”.
Resolvers can also be run on a schedule or triggered. Either approach causes them to 1). execute, in batch-job fashion, across all data they find in your upstream data sources, and 2). write computed features to your offline store.
A query is just a request for data. Chalk takes a query and determines which resolvers it needs to run, optimizes the execution order, and provides you with a response. For now, we’ll hold off on discussing queries further, but feel free to skip ahead and get a sense of what Chalk queries are and how they work before diving deeper into resolvers.