Resolvers
Create resolvers to compute feature values.
While feature classes define what your data looks like (e.g. the types of your features, their relationships, and their properties), resolvers define how your feature values are computed.
If you haven't read the section on feature classes, you are strongly encouraged to do so. Understanding feature classes will help you understand resolvers.
Resolvers have four main components: inputs, outputs, code, and run conditions.
The first three of these are rather straightforward. The inputs to a resolver are the features it uses to compute its output. The outputs of a resolver are the features that it computes. The code of the resolver is the logic that transforms your input features into your output features.
The last component, run conditions, is a little more nuanced. At a high level, Chalk is optimized for two different access patterns: online and offline. In online access, information about a single entity is being requested and features must be computed quickly and cached. In offline access, a large chunk of historical feature data is being requested (you can think of this as creating a dataset for model training or running an analytic query).
Resolvers are always specified as either online or offline. We go into more detail about what this means in the next section, where we talk about run conditions, but for now it is important to note that (1) every resolver is either online or offline, and (2) different resolvers run depending on how you are requesting data from Chalk.
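As a rough mental model of these two points, the sketch below tags each resolver with a run condition and selects the ones that match a request. It is plain Python, not Chalk's API: ToyResolver, REGISTRY, resolvers_for, and the resolver names are all hypothetical, and the strict filter is a deliberate simplification of the actual selection rules.

```python
from dataclasses import dataclass
from typing import Literal

RunCondition = Literal["online", "offline"]


@dataclass(frozen=True)
class ToyResolver:
    name: str
    run_condition: RunCondition  # point 1: exactly one of "online" or "offline"


# Hypothetical resolvers, for illustration only.
REGISTRY = [
    ToyResolver("get_user_from_pg", "online"),
    ToyResolver("get_athleticism_score", "online"),
    ToyResolver("backfill_users_from_warehouse", "offline"),
]


def resolvers_for(access_pattern: RunCondition) -> list[str]:
    """Point 2: the access pattern decides which resolvers are eligible."""
    return [r.name for r in REGISTRY if r.run_condition == access_pattern]


print(resolvers_for("online"))
print(resolvers_for("offline"))
```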
Chalk allows you to write resolvers in three different ways: SQL resolvers, expressions, and Python resolvers.
Below is an example of how to write the three kinds of resolvers in Chalk for a User feature class. In this example, we assume that a Postgres data source called pg_users has been configured and connected to Chalk.
First, let’s define our User feature class:
from chalk.features import features

@features
class User:
    id: str
    name: str
    age: int
    hobbies: list[str]
    is_runner: bool  # computed feature
    athleticism_score: float  # computed feature
Now we define our resolvers:
We resolve the id, name, age, and hobbies features for our users from the users table of our pg_users data source. When reading data from your data sources, you should use a SQL resolver:
-- The features given to us by the user.
-- resolves: user
-- type: online
-- source: pg_users
select id, name, age, hobbies
from users;
To compute the is_runner feature for users, we simply check whether "running" is in the list of hobbies, which we can do with an expression. Chalk expressions run as low-latency C++ for efficient execution, and they can use any function in chalk.functions to express a variety of computations.
import chalk.functions as F
from chalk.features import features, _

@features
class User:
    id: str
    name: str
    age: int
    hobbies: list[str]
    is_runner: bool = F.contains(_.hobbies, "running")
    athleticism_score: float  # computed feature
We now use a Python resolver to compute the athleticism_score feature for our users. Python resolvers are useful for integrating existing libraries and for executing more complex logic, such as API calls.
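from chalk import online
from src.user import User
from utils.athleticism import is_athletic

@online
def get_athleticism_score(
    hobbies: User.hobbies,
) -> User.athleticism_score:
    # Fraction of the user's hobbies that count as athletic.
    if not hobbies:
        return 0.0  # avoid dividing by zero when a user has no hobbies
    num_athletic_hobbies = sum(is_athletic(hobby) for hobby in hobbies)
    return num_athletic_hobbies / len(hobbies)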
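The is_athletic helper imported above is not shown in this section. To see what the resolver body computes, here is a standalone sketch of the same arithmetic with a stand-in for that helper. The ATHLETIC_HOBBIES set, the stand-in is_athletic, and the choice to return 0.0 for an empty list are assumptions made purely for illustration.

```python
# Hypothetical stand-in for utils.athleticism.is_athletic.
ATHLETIC_HOBBIES = {"running", "cycling", "swimming"}


def is_athletic(hobby: str) -> bool:
    return hobby.lower() in ATHLETIC_HOBBIES


def athleticism_score(hobbies: list[str]) -> float:
    """Fraction of a user's hobbies that count as athletic."""
    if not hobbies:
        return 0.0  # assumption: treat "no hobbies" as a score of zero
    return sum(is_athletic(h) for h in hobbies) / len(hobbies)


print(athleticism_score(["running", "chess"]))  # 0.5
```

Because the logic is an ordinary function of its inputs, it can be unit tested without a running Chalk deployment.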
A question that often crops up at this point is, “I have defined a resolver, but how do I make it run?”
Just like features, the idea behind resolvers is that they are declarative. Resolvers don’t run when you define them, and they don’t run exhaustively over your data. They run only in response to a request for the features that they are responsible for computing. In practice, this means that they run in response to “queries”.
Resolvers can also be run on a schedule or triggered. Either approach causes them to (1) execute, in batch-job fashion, across all the data they find in your upstream data sources, and (2) write the computed features to your offline store.
A query is just a request for data. Chalk takes a query and determines which resolvers it needs to run, optimizes the execution order, and provides you with a response. For now, we’ll hold off on discussing queries further, but feel free to skip ahead and get a sense of what Chalk queries are and how they work before diving deeper into resolvers.
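To make the planning step more concrete, the toy planner below walks backward from the requested features to the resolvers that produce them and returns an order in which they could run. It only illustrates the idea and says nothing about how Chalk implements planning. RESOLVERS, plan, and the resolver names are hypothetical, the feature names mirror the User example above, and treating user.id as the input of the SQL resolver is a modeling assumption.

```python
# Each hypothetical resolver maps to (input features, output features).
RESOLVERS: dict[str, tuple[set[str], set[str]]] = {
    "get_user_from_pg": ({"user.id"}, {"user.name", "user.age", "user.hobbies"}),
    "is_runner_expression": ({"user.hobbies"}, {"user.is_runner"}),
    "get_athleticism_score": ({"user.hobbies"}, {"user.athleticism_score"}),
}


def plan(requested: set[str], known: set[str]) -> list[str]:
    """Return resolver names in an order that yields the requested features."""
    order: list[str] = []
    available = set(known)

    def ensure(feature: str) -> None:
        if feature in available:
            return
        for name, (inputs, outputs) in RESOLVERS.items():
            if feature in outputs:
                for dep in sorted(inputs):
                    ensure(dep)  # resolve this resolver's inputs first
                order.append(name)
                available.update(outputs)
                return
        raise LookupError(f"no resolver produces {feature}")

    for feature in sorted(requested):
        ensure(feature)
    return order


# Only the resolvers needed for the requested feature are planned;
# is_runner_expression is never scheduled here.
print(plan({"user.athleticism_score"}, known={"user.id"}))
```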