
Chalk lets you define your features directly in Python. Features are namespaced to a FeatureSet. To create a new FeatureSet, apply the @features decorator to a Python class with typed attributes. A FeatureSet is constructed, and behaves, much like a Python dataclass.

Example

from datetime import datetime
from typing import Optional
from chalk.features import features

@features
class UserFeatures:
    id: int
    full_name: str
    nickname: Optional[str]
    email: Optional[str]
    birthday: datetime
    fraud_score: float

Namespacing

Features are namespaced by their containing FeatureSet, and then by the name of the variable.

In the above example, our features, when rendered as strings, are:

Feature Name        Type
user.id             Integer
user.full_name      String
user.nickname       String | None
user.email          String | None
user.birthday       Datetime
user.fraud_score    Decimal

(FeatureSet names are stripped of the suffix “Features”, if it exists.)
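The naming rule above can be sketched in plain Python. This is only an illustration, not Chalk's implementation: the helpers `namespace_for` and `feature_names` are hypothetical, and converting the class name from CamelCase to snake_case is an assumption based on `UserFeatures` rendering as `user`.

```python
import re
from datetime import datetime
from typing import List, Optional, get_type_hints


def namespace_for(cls: type) -> str:
    """Derive a namespace from a class name (hypothetical helper)."""
    name = cls.__name__
    # Strip the "Features" suffix, if it exists.
    if name.endswith("Features") and name != "Features":
        name = name[: -len("Features")]
    # Assumption: CamelCase is converted to snake_case.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def feature_names(cls: type) -> List[str]:
    """Render each typed attribute as '<namespace>.<attribute>'."""
    ns = namespace_for(cls)
    return [f"{ns}.{attr}" for attr in get_type_hints(cls)]


# A plain class standing in for the decorated FeatureSet above.
class UserFeatures:
    id: int
    full_name: str
    nickname: Optional[str]
    email: Optional[str]
    birthday: datetime
    fraud_score: float


names = feature_names(UserFeatures)
print(names)
```

Running the sketch prints the same six names listed in the table, starting with `user.id`.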

Overrides

Feature names and feature class names can be overridden by supplying the name keyword argument to the feature function or the @features decorator, respectively. This lets you evolve your variable names without losing the history of the feature.

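from datetime import datetime
from chalk.features import feature, features

# Before the rename, this was `class Prince` under a bare @features,
# with the field declared as `birthday: datetime`.
@features(name="prince")
class TheArtistFormerlyKnownAsPrince:
    date_of_birth: datetime = feature(name="birthday")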
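To see why a name override keeps the stored feature name stable, here is a small stand-alone sketch built on standard-library dataclasses. It does not use Chalk: `sketch_feature` and `resolved_names` are hypothetical stand-ins, and passing the namespace in explicitly is a simplification.

```python
from dataclasses import dataclass, field, fields
from datetime import datetime
from typing import List, Optional


def sketch_feature(name: Optional[str] = None):
    """Hypothetical stand-in for feature(name=...): records an override."""
    return field(default=None, metadata={"name": name})


@dataclass
class Before:
    # Original property name, no override.
    birthday: Optional[datetime] = sketch_feature()


@dataclass
class After:
    # Renamed property that still points at the original feature name.
    date_of_birth: Optional[datetime] = sketch_feature(name="birthday")


def resolved_names(cls: type, namespace: str) -> List[str]:
    """Use the override when present, otherwise the attribute name."""
    return [
        f"{namespace}.{f.metadata.get('name') or f.name}"
        for f in fields(cls)
    ]


before = resolved_names(Before, "prince")
after = resolved_names(After, "prince")
print(before, after)
```

Both classes resolve to `prince.birthday`, so data recorded under the old property name still lines up with the renamed property.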

Primary keys

Every feature set must have a primary key. The primary key associates the features you later resolve with this namespace. It can have any type, given by the type annotation on the field.

By default, if you have a feature with the name id, that feature will be the primary key. However, you can override this behavior:

from chalk.features import features, Primary

@features
class User:
    user_id: Primary[str]
    ...

If you mark an explicit primary key, it will override the default behavior:

@features
class User:
    user_id: Primary[str]
    # Not really the primary key!
    id: str

Alternatively, you can use the feature(...) function to set a feature to primary:

from chalk.features import features, feature

@features
class User:
    user_id: str = feature(primary=True)
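The selection rule described in this section (an explicit primary key wins, otherwise a feature named id is used) can be sketched as follows. The function `pick_primary_key` is hypothetical, and raising an error when more than one field is marked primary is an assumption, not documented Chalk behavior.

```python
from typing import Dict


def pick_primary_key(fields: Dict[str, bool]) -> str:
    """Pick a primary key from {field_name: explicitly_marked_primary}."""
    explicit = [name for name, is_primary in fields.items() if is_primary]
    if len(explicit) > 1:
        # Assumption: more than one explicit primary key is rejected.
        raise ValueError(f"multiple primary keys marked: {explicit}")
    if explicit:
        # An explicit marker overrides the default `id` convention.
        return explicit[0]
    if "id" in fields:
        # Default: a feature named `id` is the primary key.
        return "id"
    # Every feature set must have a primary key.
    raise ValueError("no primary key: mark one explicitly or add `id`")


default_key = pick_primary_key({"id": False, "full_name": False})
explicit_key = pick_primary_key({"user_id": True, "id": False})
print(default_key, explicit_key)
```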

Versions

Chalk versions all of your features with every deployment. However, you can also choose explicit versions for your features.

@features
class User:
    ...
    email_domain: str = feature(version=2)

Feature time

By default, Chalk marks the time a feature was created as the time that its resolver was run. However, you may want to provide a custom value for this time for data sources like events tables.

To read or set the time at which a feature was created, declare a feature on the class and assign it to the feature_time() function.

from datetime import datetime
from chalk.features import feature_time, features

@features
class UserFeatures:
    ts: datetime = feature_time()
    ...

To set the time a feature was created, assign the feature-time feature in the value your resolver returns:

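from datetime import datetime
from chalk import offline
from chalk.features import Features

# Assumes a User feature class with uuid, name, and ts (feature_time) features.
@offline
def fn(uid: User.uuid) -> Features[User.name, User.ts]:
    return User(
        name="Anousheh Ansari",
        ts=datetime(month=9, day=12, year=1966)
    )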

Then, when you sample offline data, the name feature will be treated as having been created at the provided date.
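One way to picture the effect of a custom feature time is a point-in-time lookup: each value carries the time it was created, and a query "as of" some moment sees only values created at or before it. The sketch below is a conceptual illustration under that assumption. The function `value_as_of` is hypothetical, the sample values are made up, and none of this is Chalk's actual sampling logic.

```python
from datetime import datetime
from typing import List, Optional, Tuple

Observation = Tuple[datetime, str]  # (feature time, value)


def value_as_of(observations: List[Observation], as_of: datetime) -> Optional[str]:
    """Return the most recent value created at or before `as_of`."""
    eligible = [(ts, value) for ts, value in observations if ts <= as_of]
    if not eligible:
        return None
    # Pick the observation with the latest feature time.
    return max(eligible, key=lambda pair: pair[0])[1]


# Made-up history for a single feature, keyed by feature time.
history = [
    (datetime(1966, 9, 12), "first value"),
    (datetime(2006, 9, 18), "second value"),
]

early = value_as_of(history, datetime(1990, 1, 1))
late = value_as_of(history, datetime(2010, 1, 1))
too_early = value_as_of(history, datetime(1950, 1, 1))
print(early, late, too_early)
```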

Constructing feature classes

To construct a UserFeatures instance, pass the feature values as keyword arguments to the __init__() method:

UserFeatures(full_name="Grace Hopper", nickname="Amazing Grace")
UserFeatures(email="grace.hopper@yale.edu")

The @features decorator adds a custom __init__():

def __init__(
    self,
    id: int | MISSING = MISSING,
    full_name: str | MISSING = MISSING,
    email: Optional[str] | MISSING = MISSING,
    ...
):
    self.id = id
    self.full_name = full_name
    self.email = email
    ...

Note that all fields have a default MISSING value. Therefore, you can construct feature classes with any subset of the fields you would like to use.
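The MISSING default is a sentinel pattern: it distinguishes "not provided" from an explicit None. A minimal stand-alone sketch of the idea, assuming nothing about Chalk's internals (the `_Missing` class, this `MISSING` object, and the `provided` method are hypothetical stand-ins):

```python
from typing import Any, Dict


class _Missing:
    """Sentinel type meaning 'no value was supplied'."""

    def __repr__(self) -> str:
        return "MISSING"


MISSING = _Missing()


class UserFeatures:
    # Every parameter defaults to MISSING, so any subset may be passed.
    def __init__(self, id=MISSING, full_name=MISSING, email=MISSING):
        self.id = id
        self.full_name = full_name
        self.email = email

    def provided(self) -> Dict[str, Any]:
        """Return only the fields the caller actually supplied."""
        return {k: v for k, v in vars(self).items() if v is not MISSING}


partial = UserFeatures(full_name="Grace Hopper").provided()
explicit_none = UserFeatures(email=None).provided()
print(partial, explicit_none)
```

An explicit `email=None` is kept as a supplied value, while fields left out entirely stay MISSING and are skipped.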

Chalk ships a Mypy plugin that helps with many of the types in the Chalk package. Among other things, it checks that FeatureSets are constructed only with features available on the class.

Refactoring

After going to production, you may find that you want to change the name of a property on the feature class. The name override lets you do this without changing the underlying data. Returning to the example in the Overrides section: if you initially called a feature birthday and later decided to rename it date_of_birth, you can keep the underlying data the same and rename the property on the class as follows:

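from datetime import datetime
from chalk.features import feature, features

# Before the rename, this was `class Prince` under a bare @features,
# with the field declared as `birthday: datetime`.
@features(name="prince")
class TheArtistFormerlyKnownAsPrince:
    date_of_birth: datetime = feature(name="birthday")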

Here, we also rename the feature class originally named Prince to TheArtistFormerlyKnownAsPrince.

Interplay with auto-id

Where the name of the Python property and the name provided to feature(name=...) differ, IDs are auto-assigned based on the name provided to feature(name=...).