Resolvers
Specify results of feature resolvers.
Resolvers declare the features that they resolve through a Python type annotation on the return value of the function.
To return a single feature from a resolver, set the return type annotation to the feature you want to resolve:
from chalk import online
from chalk.features import features
@features
class User:
    id: int
    name: str
    employer: str
@online
def resolve(u: User.id) -> User.name:
    return "Jennifer Doudna"
Equivalently, you can wrap the return value in the User class:
@online
def resolve(u: User.id) -> User.name:
    return User(name="Jennifer Doudna")
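To make the idea of declaring outputs through the return annotation concrete, here is a minimal, framework-free sketch of how a decorator can read that annotation at registration time. It does not use Chalk and is not a description of Chalk's internals: `register` and `REGISTRY` are hypothetical names, and the string annotation stands in for a feature reference such as User.name.

```python
import inspect

# Hypothetical registry: resolver name -> declared output (assumption, not Chalk's API).
REGISTRY = {}

def register(fn):
    """Record the function's return annotation as its declared output."""
    sig = inspect.signature(fn)
    if sig.return_annotation is inspect.Signature.empty:
        raise TypeError(f"{fn.__name__} must annotate its return type")
    REGISTRY[fn.__name__] = sig.return_annotation
    return fn

@register
def resolve_name(user_id: int) -> "User.name":
    # The string annotation is kept as-is; nothing evaluates it here.
    return "Jennifer Doudna"
```

After the module loads, `REGISTRY` holds the declared output for `resolve_name` without the function ever being called, which is why the annotation alone is enough to know what a resolver produces.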
To return multiple features, return an instance of the feature class.
In the type signature, specify the Features[...] class, parameterized by the features that you pass to the feature class.
@online
def resolve(u: User.id) -> Features[User.name, User.employer]:
    return User(
        name="Jennifer Doudna",
        employer="University of California, Berkeley"
    )
You only need to pass a subset of the features to the constructor for the feature class.
The editor plugin will check that the type annotation you assign to the resolver matches the subset of features passed to the constructor of the feature class.
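The check described above amounts to comparing two sets of feature names. The sketch below illustrates that comparison in plain Python; it is not the editor plugin's implementation, and `check_returned_features` is a hypothetical helper introduced only for this example.

```python
def check_returned_features(declared, returned_kwargs):
    """Compare declared output features with the constructor's keyword arguments.

    Returns (missing, extra): declared names that were not passed, and
    passed names that were not declared.
    """
    declared_set = set(declared)
    returned_set = set(returned_kwargs)
    missing = sorted(declared_set - returned_set)
    extra = sorted(returned_set - declared_set)
    return missing, extra

declared = ["name", "employer"]

# Annotation and constructor arguments agree.
ok = check_returned_features(
    declared,
    {"name": "Jennifer Doudna", "employer": "University of California, Berkeley"},
)

# The constructor omits a declared feature.
bad = check_returned_features(declared, {"name": "Jennifer Doudna"})
```

Here `ok` is `([], [])`, while `bad` reports `employer` as missing.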
To return all features of a class, use Features[...] around the feature class.
@online
def get_user(u: User.id) -> Features[User]:
    return User(
        name="Jennifer Doudna",
        employer="University of California, Berkeley"
    )
If your resolver takes input features, those features are not considered as part of the output features.
Note that the id feature is not returned from the function.
This definition is equivalent to:
@online
def get_user(u: User.id) -> Features[User.name, User.employer]:
    return User(
        name="Jennifer Doudna",
        employer="University of California, Berkeley"
    )
However, you may want to return almost all features of a class.
Writing out all the features can be tedious and error-prone.
You can subtract features from a feature class using the - operator:
from chalk.features import Features, ...
@online
def get_all_users(id: User.id) -> Features[User] - User.name:
    return User(employer="University of California, Berkeley")
Here, neither the id feature nor the name feature is returned, which leaves only the employer feature.
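The rules above reduce to set arithmetic: start from every feature of the class, drop the resolver's inputs, then drop anything subtracted with the - operator. The following sketch models that with plain lists of names; `output_features` is a hypothetical helper for illustration and not part of Chalk.

```python
def output_features(all_features, inputs, subtracted=()):
    """Features a resolver returns: the class's features, minus the
    resolver's inputs, minus anything explicitly subtracted."""
    excluded = set(inputs) | set(subtracted)
    # Keep declaration order so the result is easy to read.
    return [f for f in all_features if f not in excluded]

USER_FEATURES = ["id", "name", "employer"]

# Models Features[User] with User.id as the input.
full = output_features(USER_FEATURES, inputs=["id"])

# Models Features[User] - User.name with User.id as the input.
minus_name = output_features(USER_FEATURES, inputs=["id"], subtracted=["name"])
```

`full` contains `name` and `employer`, and `minus_name` contains only `employer`, matching the two resolvers shown earlier.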
You can also output many instances of a feature class from a resolver by specifying a DataFrame as the return type of the function:
@offline
def get_events() -> DataFrame[Transfer.uuid, Transfer.amount, Transfer.ts]:
    return DataFrame.read_csv(...)
To return many instances of a feature class, including nested features, use the DataFrame class both as the return type and in the resolver body:
@online
def get_user_employer_information(
    id: User.id
) -> DataFrame[User.id, User.name, User.employer.name, User.employer.category]:
    return DataFrame([
        User(
            id="1",
            name="Jennifer Doudna",
            employer=Employer(
                name="University of California, Berkeley",
                category="Education"
            )
        )
    ])
For more information on how to load batch data, see the Data Sources sections.
DataFrame-returning resolvers don’t need inputs.
To return all features of a class in a DataFrame, wrap the feature class in DataFrame[...]:
@online
def get_all_users() -> DataFrame[User]:
    return DataFrame([
        User(
            name="Jennifer Doudna",
            employer="University of California, Berkeley"
        )
    ])
Imagine a scalar feature you’d like to backfill over many thousands of primary keys and historical times.
DataFrame-returning resolvers can dramatically reduce the computation time because they handle the rows in a vectorized way.
@offline
def get_new_feature_as_dataframe(
    df: DataFrame[Transaction.id, ...]
) -> DataFrame[Transaction.id, Transaction.new_feature]:
    ...
The above resolver runs faster on a thousand rows than the equivalent scalar resolver run a thousand times.
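One reason a batch resolver can be cheaper is that any per-invocation overhead is paid once instead of once per row. The sketch below only counts invocations in plain Python; it does not use Chalk and does not measure Chalk's actual performance. The resolver names and the `amount * 1.1` computation are made up for illustration.

```python
# Invocation counters for the two hypothetical resolvers.
calls = {"scalar": 0, "batch": 0}

def scalar_resolver(amount: float) -> float:
    """Computes the feature for one row; invoked once per row."""
    calls["scalar"] += 1
    return amount * 1.1

def batch_resolver(amounts: list) -> list:
    """Computes the feature for every row in a single invocation."""
    calls["batch"] += 1
    return [a * 1.1 for a in amounts]

amounts = [float(i) for i in range(1000)]

scalar_out = [scalar_resolver(a) for a in amounts]  # 1000 invocations
batch_out = batch_resolver(amounts)                 # 1 invocation
```

Both paths produce the same values, but the scalar resolver is entered a thousand times and the batch resolver once.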
Chalk also supports relationship-returning resolvers that enable users to return a DataFrame belonging to a has-many relationship.
@offline
def relationship_returning_resolver(
    df: User.transactions[Transaction.id, Transaction.amount, Transaction.description],
    user_type: User.type
) -> User.transactions[Transaction.id, Transaction.transaction_type]:
    ...
Just make sure that the returned DataFrames do not have duplicate rows.
That means no two rows should share the same primary key, or the same combination of primary key and timestamp if the feature time is also returned.
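A quick way to check this uniqueness rule before returning is to count keys. The following is a minimal sketch over plain dictionaries rather than a Chalk DataFrame; `find_duplicate_keys` and the column names `id` and `ts` are assumptions made for the example.

```python
from collections import Counter

def find_duplicate_keys(rows, pkey, ts=None):
    """Return the keys that appear more than once.

    The key is the primary key alone, or the (primary key, timestamp)
    pair when a timestamp column name is given.
    """
    def key(row):
        return (row[pkey], row[ts]) if ts is not None else row[pkey]

    counts = Counter(key(r) for r in rows)
    return [k for k, n in counts.items() if n > 1]

rows = [
    {"id": 1, "ts": "2024-01-01", "amount": 10.0},
    {"id": 1, "ts": "2024-01-02", "amount": 12.0},
    {"id": 2, "ts": "2024-01-01", "amount": 7.5},
]

# Without a feature time, id 1 appears twice and counts as a duplicate.
dups_by_id = find_duplicate_keys(rows, "id")

# With the feature time included, every (id, ts) pair is unique.
dups_by_id_ts = find_duplicate_keys(rows, "id", ts="ts")
```

The same rows are invalid when keyed by primary key alone but valid once the timestamp is part of the key, which is the distinction the rule draws.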