Chalk home page
Docs
API
CLI
  1. Resolvers
  2. Python Resolvers

Python resolvers are Python functions marked with either an @online or @offline decorator, with Python type annotations declaring the dependencies on other features, as well as the output feature(s).

Inputs

Scalar dependencies

To depend on features from a feature class, label your resolver arguments with your features as the types. You can then use those arguments in the body of your resolver to compute your output features. If you’re running our editor plugin, your editor will see the type of each variable as the type of the underlying scalar.

from chalk.features import features, online

@features
class User:
    id: str
    email: str
    email_domain: str

@online
def get_domain(email: User.email) -> User.email_domain:
    # type(email) == str
    return email.split('@')[1].lower()

You can require multiple features in a resolver. However, all feature dependencies in a single resolver need to originate in the same root namespace:

Requiring features from the same root namespace

@online
def fn(a: User.email, b: User.name) -> User.email_name_match:
    return ...

Here, we incorrectly request features from the root namespaces of Transfer and User:

Requiring features from different root namespaces

@online
def fn(a: User.email, b: Transfer.memo) -> Transfer.email_in_memo:
    return email in memo

If you want to require features from namespaces, you can use has-one or has-many relationships.

Requiring features from different root namespaces using a relationship

@online
def fn(email: Transfer.user.email, memo: Transfer.memo) -> Transfer.email_in_memo:
    return email in memo

Has-one dependencies

Scalar has-one

You can also require features joined to a feature class through has-one relationships. For example, if users in your system have bank accounts, and you wanted to compare the name on the user’s bank account to the user’s name, you could require the user’s name and the account’s title through the user:

@online
def name_sim(title: User.account.title, name: User.full_name) -> ...

You can also require all scalars on the user’s profile:

@online
def fn(profile: User.profile) -> ...:
    profile.signup_date

Chalk will materialize all scalar features on the profile before calling this function. If you want to pull only a few features from the profile, require each directly:

@online
def fn(signup_date: User.profile.signup_date, age: User.profile.age) -> ...

Optional has-one

Has-one relationships can also be declared as optional. You may also require feature through optional relationships, but the types for all of those optional features will become optional. Consider the below example:

@features
class Account:
    id: int
    user_id: int
    balance: float  # Non-optional balance

@features
class User:
    id: int
    # Optional relationship
    account: Account | None = has_one(lambda: Account.user_id == User.id)
    has_high_balance: bool

@online
def has_high_bal(balance: User.account.balance) -> User.has_high_balance:
    # Balance will be "float | None"
    if balance is None:
        return False
    return balance > 1000

The resolver in this example receives an optional float, even though balance is not an optional field on Account. The optional is added because the user may not have an account, in which case the resolver will receive None for the balance.

Nested has-one

You can also traverse nested has-one relationships in the same manner as requiring a single has-one.

Consider a schema where users have a feature class of profile information, and the user’s profile has an identity feature class, which in turn has the age of the user’s email. You can require the email age feature as below:

@online
def fn(email_age: User.profile.identity.email_age) -> ...

However, you cannot access nested relationships without explicit asking for them.

Accessing a transitive relationship from a dependency.

@online
def fn(acct: User.account) -> ...:
    acct.balance           # Ok
    acct.institution.name  # Error!

Instead, you can require the nested relationship directly and access any of its scalar features.

Directly requiring the transitive relationship.

@online
def fn(ins: User.account.institution, acct: User.account) -> ...:
    acct.balance  # Ok
    ins.name      # Ok

The semantics of optional has-one dependencies carry over to nested has-one dependencies. If you traverse an optional relationship, then all downstream attributes will become optional.


Has-many dependencies

Scalar has-many

You can also require has-many relationships as inputs to your resolver:

@online
def fn(transfers: User.transfers) -> ...:

You receive a Chalk DataFrame, which supports projections, filtering, and aggregations, among other operations.

Projections

By default, Chalk will materialize all scalar features on the Transfer feature class before calling your resolver. As an optimization hint, you can specify which features from the transfers that you’d like Chalk to materialize before calling the function. For example, if there were expensive features to compute on the transfer, you could scope the features to only the set you need:

@online
def fn(transfers: User.transfers[Transfer.amount, Transfer.memo]) -> ...:
    transfers[Transfer.amount].sum()      # Ok
    transfers[Transfer.from_institution]  # Error: filtered out above

The error above is surfaced statically by our editor plugin.

Filtering

You can apply filters to the has-many inputs of resolvers:

@online
def fn(transfers: User.transfers[Transfer.amount > 100]) -> ...:

Filters can be composed with projections following the semantics of the Chalk DataFrame.

@online
def fn(transfers: User.transfers[Transfer.amount > 100, Transfer.memo]) -> ...:

Filters can also be used on features through has-one relationships.

@online
def fn(transfers: User.transfers[Transfer.amount, Transfer.bank.location == "USA"]) -> ...:

More on projection and filtering here

Has-many through has-one

Has-many relationships can be required through has-one relationships:

@online
def fn(transfers: User.account.transfers) -> ...:

As with scalar has-many dependencies, you can scope down the scalar features on the transfer to only those required:

@online
def fn(transfers: User.account.transfers[Transfer.amount]) -> ...:
    transfers[Transfer.amount].sum()      # Ok
    transfers[Transfer.from_institution]  # Error: filtered out above

Has-many through optional-has-one

If the has-one relationship that you’re traversing is optional, then the transfers argument in the example above will either be None or a Chalk DataFrame.

Has-one through has-many

You can also select columns through nested has-one relationships that would not normally materialize.

@online
def fn(transfers: User.transfers[Transfer.amount, Transfer.bank.name]) -> ...:

In the above example, if there was a has-one relationship between Transfer and Bank, we can fetch any scalar feature from Bank as well for the DataFrame. Note that the Bank.name column would not materialize with the simple transfers: User.transfers typing, since this typing only materializes scalar features in the root namespace.

Has-many through has-many

Has-many relationships can be required through other has-many relationships.

For example, consider the following feature definitions for User, Account, and Transaction, where a user can have many accounts, each with many transactions.

from chalk.features import features, has_many, DataFrame


@features
class Transaction:
    id: str
    account_id: str
    amount: float


@features
class Account:
    id: str
    user_id: str
    transactions: DataFrame[Transaction] = has_many(
        lambda: Transaction.account_id == Account.id
    )


@features
class User:
    id: str
    total_spent: float
    accounts: DataFrame[Account] = has_many(lambda: Account.user_id == User.id)

We can resolve the total_spent feature on User by computing the sum of transaction amounts across all of a user’s accounts, as shown below.

@online
def get_total_spent(
    txns: User.accounts.transactions[Transaction.amount]
) -> User.total_spent:
    return txns.sum()

Outputs

Python resolvers can output either scalar features or a DataFrame of features.

Scalar output

Single output

To return a single feature from a resolver, set the return type annotation to the feature you want to resolve:

from chalk.features import features

@features
class User:
    id: int
    name: str
    employer: str

@online
def resolve(u: User.id) -> User.name:
    return "Jennifer Doudna"

Equivalently, you can wrap the return value in the User class:

@online
def resolve(u: User.id) -> User.name:
  return "Jennifer Doudna"
  return User(name="Jennifer Doudna")

Multiple outputs

To return multiple features, return an instance of the feature class. In the type signature, specify the Features[...] class, parameterized by the features that you pass to the feature class.

@online
def resolve(u: User.id) -> Features[User.name, User.employer]:
  return User(
      name="Jennifer Doudna",
      employer="University of California, Berkeley"
  )

You only need to pass a subset of the features to the constructor for the feature class.

The editor plugin will check that the type annotation you assign to the resolver matches subset of features passed to the constructor of the feature class.

All features

To return all features of a class, use Features[...] around the feature class.

@online
def get_user(u: User.id) -> Features[User]:
    return User(
        name="Jennifer Doudna",
        employer="University of California, Berkeley"
    )

If your resolver takes input features, those features are not considered as part of the output features.

Note that the id feature is not returned from the function.

This definition is equivalent to:

@online
def get_user(u: User.id) -> Features[User]:
def get_user(u: User.id) -> Features[User.name, User.employer]:
  return User(
      name="Jennifer Doudna",
      employer="University of California, Berkeley"
  )

However, you may want to return almost all features of a class. Writing out all the features can be tedious and error-prone. You can subtract features from a feature class using the - operator:

from chalk.features import Features, ...

@online
def get_all_users(id: User.id) -> Features[User] - User.name:
    return User(employer="University of California, Berkeley")

Here, both the id feature and the name feature are not returned, which leaves only the employer feature.


DataFrame output

You can also output many instances of a feature class from a resolver by specifying a DataFrame as the return type of the function:

@offline
def get_events() -> DataFrame[Transfer.uuid, Transfer.amount, Transfer.ts]:
    return DataFrame.read_csv(...)

Say you wanted to return many instances of a feature class, including nested features, from a resolver, then you can the DataFrame class for your return type and in your resolver definition.

@online
def get_user_employer_information(id: User.id) -> Dataframe[User.id, User.name, User.employer.name, User.employer.category]:
    return DataFrame([
        User(
            id="1",
            name="Jennifer Doudna",
            employer=Employer(
                name="University of California, Berkeley",
                category="Education"
            )
        )
    ])

For more info on how to load batch data, see the Data Sources sections. DataFrame-returning resolvers don’t need inputs.

All features

To return all features of a class in a DataFrame, use DataFrame[...] class around the feature class:

@online
def get_all_users() -> DataFrame[User]:
    return DataFrame([
        User(
            name="Jennifer Doudna",
            employer="University of California, Berkeley"
        )
    ])

Other DataFrame-returning resolvers

Imagine a scalar feature you’d like to backfill over many thousands of primary keys and historical times. DataFrame-returning resolvers can dramatically reduce the computation time due to its vectorized handling.

@offline
def get_new_feature_as_dataframe(
    df: DataFrame[Transaction.id, ...]
) -> DataFrame[Transaction.id, Transaction.new_feature]:

The above resolver runs faster on a thousand rows than the equivalent scalar resolver ran a thousand times.

Chalk also supports relationship-returning resolvers that enable users to return a DataFrame belonging to a has-many relationship.

@offline
def relationship_returning_resolver(
    df: User.transactions[Transaction.id, Transaction.amount, Transaction.description],
    user_type: User.type
) -> User.transactions[Transaction.id, Transaction.transaction_type]:

Just make sure that the return DataFrames do not have duplicate rows. That means no two rows should have the same primary key, or primary key & timestamp combinations if the feature time is also returned.