Resolvers
Specify dependencies of feature resolvers.
Resolvers can depend on other features to compute their features. These dependencies are declared through the type signature of the arguments to the resolver function.
To depend on features from a feature class, annotate your resolver arguments with those features as their types. You can then use the arguments in the body of your resolver to compute your output features. If you’re running our editor plugin, your editor will see the type of each variable as the type of the underlying scalar.
from chalk.features import features, online

@features
class User:
    id: str
    email: str
    email_domain: str

@online
def get_domain(email: User.email) -> User.email_domain:
    # type(email) == str
    return email.split('@')[1].lower()
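To make the idea of declaring dependencies through the type signature concrete, here is a minimal plain-Python sketch of how a framework could read a resolver's inputs and output from its annotations. It does not import Chalk and is not a description of Chalk's internals: FeatureRef, the stand-in User class, and describe_resolver are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureRef:
    """Hypothetical stand-in for a feature reference such as User.email."""
    namespace: str
    name: str


class User:
    # Stand-in feature class (assumption); a real one is declared with @features.
    email = FeatureRef("user", "email")
    email_domain = FeatureRef("user", "email_domain")


def describe_resolver(fn):
    """Return (inputs, output) read from the function's annotations."""
    hints = dict(fn.__annotations__)
    output = hints.pop("return", None)
    return list(hints.values()), output


def get_domain(email: User.email) -> User.email_domain:
    # At call time, email is a plain str.
    return email.split("@")[1].lower()


inputs, output = describe_resolver(get_domain)
print(inputs)
print(output)
```

The point of the sketch is only that the signature carries enough information to know which features must be available before the function body runs.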
You can require multiple features in a resolver. However, all feature dependencies in a single resolver need to originate in the same root namespace:
Requiring features from the same root namespace
@online
def fn(a: User.email, b: User.name) -> User.email_name_match:
    return ...
Here, we incorrectly request features from the root namespaces of Transfer and User:
Requiring features from different root namespaces
@online
def fn(email: User.email, memo: Transfer.memo) -> Transfer.email_in_memo:
    return email in memo
If you want to require features from more than one namespace, you can reach them through has-one or has-many relationships.
Requiring features from different root namespaces using a relationship
@online
def fn(email: Transfer.user.email, memo: Transfer.memo) -> Transfer.email_in_memo:
    return email in memo
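The same-root rule described above can be modelled as a small check over the roots of a resolver's inputs. This is a plain-Python sketch under stated assumptions, not Chalk's validation logic: FeaturePath and check_single_root are hypothetical names, and a feature such as Transfer.user.email is modelled as the root "Transfer" plus the path ("user", "email").

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FeaturePath:
    """Hypothetical feature reference: a root namespace plus a path from it."""
    root: str
    path: Tuple[str, ...]


def check_single_root(inputs):
    """Return the shared root, or raise ValueError if the inputs span several."""
    roots = {ref.root for ref in inputs}
    if len(roots) != 1:
        raise ValueError(f"inputs span multiple root namespaces: {sorted(roots)}")
    return next(iter(roots))


# Transfer.user.email is rooted at Transfer, so this set of inputs is accepted.
ok_inputs = [
    FeaturePath("Transfer", ("user", "email")),
    FeaturePath("Transfer", ("memo",)),
]
root = check_single_root(ok_inputs)

# User.email is rooted at User, so mixing it with Transfer.memo is rejected.
bad_inputs = [
    FeaturePath("User", ("email",)),
    FeaturePath("Transfer", ("memo",)),
]
try:
    check_single_root(bad_inputs)
    rejected = False
except ValueError:
    rejected = True
```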
You can also require features joined to a feature class through has-one relationships. For example, if users in your system have bank accounts, and you wanted to compare the name on the user’s bank account to the user’s name, you could require the user’s name and the account’s title through the user:
@online
def name_sim(title: User.account.title, name: User.full_name) -> ...:
    ...
You can also require all scalars on the user’s profile:
@online
def fn(profile: User.profile) -> ...:
    profile.signup_date
Chalk will materialize all scalar features on the profile before calling this function. If you want to pull only a few features from the profile, require each directly:
@online
def fn(signup_date: User.profile.signup_date, age: User.profile.age) -> ...:
    ...
Has-one relationships can also be declared as optional. You may require features through optional relationships, but the types of all those features become optional. Consider the example below:
from chalk.features import features, has_one, online

@features
class Account:
    id: int
    user_id: int
    balance: float  # Non-optional balance

@features
class User:
    id: int
    # Optional relationship
    account: Account | None = has_one(lambda: Account.user_id == User.id)
    has_high_balance: bool

@online
def has_high_bal(balance: User.account.balance) -> User.has_high_balance:
    # Balance will be "float | None"
    if balance is None:
        return False
    return balance > 1000
The resolver in this example receives an optional float, even though balance is not an optional field on Account. The optional is added because the user may not have an account, in which case the resolver will receive None for the balance.
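The behaviour described above can be imitated with ordinary dataclasses to show why the argument's type widens. This is only an illustrative sketch: the Account and User classes here are plain stand-ins rather than Chalk feature classes, and account_balance is a hypothetical helper playing the role of the traversal Chalk performs before calling the resolver.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Account:
    balance: float  # non-optional on the account itself


@dataclass
class User:
    account: Optional[Account]  # the relationship is optional


def account_balance(user: User) -> Optional[float]:
    """Traverse the optional relationship; None when there is no account."""
    return None if user.account is None else user.account.balance


def has_high_bal(balance: Optional[float]) -> bool:
    # The resolver body must handle the missing-account case explicitly.
    if balance is None:
        return False
    return balance > 1000


with_account = has_high_bal(account_balance(User(account=Account(balance=2500.0))))
without_account = has_high_bal(account_balance(User(account=None)))
```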
You can also traverse nested has-one relationships in the same manner as requiring a single has-one.
Consider a schema where users have a feature class of profile information, and the user’s profile has an identity feature class, which in turn has the age of the user’s email. You can require the email age feature as below:
@online
def fn(email_age: User.profile.identity.email_age) -> ...:
    ...
However, you cannot access nested relationships without explicitly asking for them.
Accessing a transitive relationship from a dependency.
@online
def fn(acct: User.account) -> ...:
    acct.balance  # Ok
    acct.institution.name  # Error!
Instead, you can require the nested relationship directly and access any of its scalar features.
Directly requiring the transitive relationship.
@online
def fn(ins: User.account.institution, acct: User.account) -> ...:
    acct.balance  # Ok
    ins.name  # Ok
The semantics of optional has-one dependencies carry over to nested has-one dependencies. If you traverse an optional relationship, then all downstream attributes will become optional.
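One way to picture how optionality propagates through nested has-one relationships is a path walk that stops at the first missing hop. The sketch below is a plain-Python model, not Chalk's resolution engine: follow is a hypothetical helper, SimpleNamespace objects stand in for feature classes, and the email_age value is made up.

```python
from types import SimpleNamespace


def follow(root, path):
    """Follow a dotted attribute path, returning None once any hop is None."""
    value = root
    for attr in path.split("."):
        if value is None:
            return None
        value = getattr(value, attr)
    return value


# A user whose profile and identity both exist.
identity = SimpleNamespace(email_age=42)
full_user = SimpleNamespace(profile=SimpleNamespace(identity=identity))

# A user with no profile: everything downstream of profile is None.
no_profile = SimpleNamespace(profile=None)

present = follow(full_user, "profile.identity.email_age")
missing = follow(no_profile, "profile.identity.email_age")
```

Because any hop may be absent, a resolver that requires a feature through an optional relationship has to treat the value as optional, however deep the path.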
You can also require has-many relationships as inputs to your resolver:
@online
def fn(transfers: User.transfers) -> ...:
    ...
You receive a Chalk DataFrame, which supports projections, filtering, and aggregations, among other operations.
By default, Chalk will materialize all scalar features on the Transfer feature class before calling your resolver. As an optimization hint, you can specify which features of the transfers you’d like Chalk to materialize before calling the function. For example, if some features on the transfer are expensive to compute, you could scope the features to only the set you need:
@online
def fn(transfers: User.transfers[Transfer.amount, Transfer.memo]) -> ...:
    transfers[Transfer.amount].sum()  # Ok
    transfers[Transfer.from_institution]  # Error: filtered out above
The error above is surfaced statically by our editor plugin.
You can apply filters to the has-many inputs of resolvers:
@online
def fn(transfers: User.transfers[Transfer.amount > 100]) -> ...:
    ...
Filters can be composed with projections following the semantics of the Chalk DataFrame.
@online
def fn(transfers: User.transfers[Transfer.amount > 100, Transfer.memo]) -> ...:
    ...
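As a rough mental model of combining a filter with a projection, the sketch below applies both to a list of dicts. It is not the Chalk DataFrame and makes no claim about how Chalk evaluates these expressions: scope is a hypothetical helper, the rows are made-up data, and the sketch simply keeps matching rows first and the explicitly named columns second.

```python
def scope(rows, columns, predicate=None):
    """Keep rows matching predicate, then keep only the named columns."""
    kept = [row for row in rows if predicate is None or predicate(row)]
    return [{col: row[col] for col in columns} for row in kept]


# Made-up transfer rows for illustration.
transfers = [
    {"amount": 50.0, "memo": "coffee", "from_institution": "A"},
    {"amount": 250.0, "memo": "rent", "from_institution": "B"},
    {"amount": 120.0, "memo": "groceries", "from_institution": "A"},
]

# Rows with amount > 100, restricted to the amount and memo columns.
scoped = scope(transfers, ["amount", "memo"], lambda row: row["amount"] > 100)
total = sum(row["amount"] for row in scoped)
```

In this model, a column that was not requested (here from_institution) is simply absent from the scoped rows, which mirrors the error shown in the projection example earlier.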
Filters can also be used on features through has-one relationships.
@online
def fn(transfers: User.transfers[Transfer.amount, Transfer.bank.location == "USA"]) -> ...:
    ...
For more on projection and filtering, see the Chalk DataFrame documentation.
Has-many relationships can be required through has-one relationships:
@online
def fn(transfers: User.account.transfers) -> ...:
    ...
As with direct has-many dependencies, you can scope down the scalar features on the transfer to only those required:
@online
def fn(transfers: User.account.transfers[Transfer.amount]) -> ...:
    transfers[Transfer.amount].sum()  # Ok
    transfers[Transfer.from_institution]  # Error: filtered out above
If the has-one relationship that you’re traversing is optional, then the transfers argument in the example above will either be None or a Chalk DataFrame.
You can also select columns through nested has-one relationships that would not normally materialize.
@online
def fn(transfers: User.transfers[Transfer.amount, Transfer.bank.name]) -> ...:
    ...
In the above example, if there is a has-one relationship between Transfer and Bank, we can also fetch any scalar feature from Bank for the DataFrame. Note that the Bank.name column would not materialize with the simple transfers: User.transfers typing, since this typing only materializes scalar features in the root namespace.
Has-many relationships can be required through other has-many relationships.
For example, consider the following feature definitions for User, Account, and Transaction, where a user can have many accounts, each with many transactions.
from chalk.features import features, has_many, online, DataFrame

@features
class Transaction:
    id: str
    account_id: str
    amount: float

@features
class Account:
    id: str
    user_id: str
    transactions: DataFrame[Transaction] = has_many(
        lambda: Transaction.account_id == Account.id
    )

@features
class User:
    id: str
    total_spent: float
    accounts: DataFrame[Account] = has_many(lambda: Account.user_id == User.id)
We can resolve the total_spent feature on User by computing the sum of transaction amounts across all of a user’s accounts, as shown below.
@online
def get_total_spent(
    txns: User.accounts.transactions[Transaction.amount]
) -> User.total_spent:
    return txns.sum()
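Conceptually, requiring a has-many through another has-many flattens the inner collections into one before the resolver sees them. The following plain-Python sketch shows that flattening with ordinary dataclasses and lists. It is an illustration under stated assumptions, not Chalk code: the classes are stand-ins for the feature classes above, total_spent is a hypothetical helper, and the amounts are made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Transaction:
    id: str
    account_id: str
    amount: float


@dataclass
class Account:
    id: str
    user_id: str
    transactions: List[Transaction] = field(default_factory=list)


def total_spent(accounts):
    """Flatten transactions across all accounts, then sum their amounts."""
    amounts = [txn.amount for acct in accounts for txn in acct.transactions]
    return sum(amounts)


# One user with three accounts; the last account has no transactions.
accounts = [
    Account("a1", "u1", [Transaction("t1", "a1", 10.0), Transaction("t2", "a1", 5.5)]),
    Account("a2", "u1", [Transaction("t3", "a2", 4.5)]),
    Account("a3", "u1"),
]
total = total_spent(accounts)
```

An account without transactions contributes nothing to the flattened collection, so it does not affect the total.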