Chalk home page
Docs
SDK
CLI
  1. Resolvers
  2. Expressions

Overview

Chalk Expressions let you define features declaratively, using symbolic computation over your data. While you write expressions in idiomatic Python, they are compiled and executed as vectorized C++, enabling low-latency computation at serve time and high-throughput processing at train time.

Expressions support a wide range of operations, including arithmetic, filtering, aggregations, and built-in functions like .

For example, in a Transaction feature class, we can compute the subtotal of a transaction as the difference between its total and sales tax:

from chalk.features import features, Primary
from chalk import _

@features
class Transaction:
    id: int
    total: float
    sales_tax: float
    subtotal: float = _.total - _.sales_tax

The _ symbol refers to the current scope (here, the feature class Transaction) and is used to reference other fields on the same instance. Expressions like _.total - _.sales_tax are compiled into native execution plans that run efficiently in production.

In addition to referencing fields on the same object, you can traverse relationships. If each Transaction is associated with a User, for example, you can compute a string similarity between the user’s name and the transaction memo:

import chalk.functions as F

@features
class Transaction:
    ...
    memo: str
    user: "User"
    name_match_score: float = F.jaccard_similarity(
      _.user.name, _.memo
    )

Here, _.user.name follows the foreign key relationship from Transaction to User. The function F.jaccard_similarity is one of many built-in Chalk functions that operate on symbolic expressions.

Expressions can also perform aggregations over related records. In a User feature class, we can compute aggregates like the number of large transactions or the total amount spent:

from chalk import _
from chalk.features import DataFrame, features

@features
class User:
    id: int
    name: str
    transactions: DataFrame[Transaction]

    num_large_txns: int = _.transactions[_.total > 1000].count()
    total_spend: float = _.transactions[_.total].sum()

In this context, _ refers to the User instance when referring to _.transactions. But when you apply a filter, like _.transactions[_.total > 1000], the expression inside the brackets is evaluated in the context of each individual Transaction. That means _.total refers to the total field on each Transaction, not on the User. This scoped evaluation makes it easy to filter, project, and aggregate over related data.

All expressions are statically analyzed, optimized to eliminate redundant computation, and executed as high-performance C++ at runtime.


Scalar Functions

Chalk expressions support a wide range of built-in functions for manipulating data, performing calculations, and transforming features. These functions can be used in expressions to operate on feature values, DataFrames, and other data types.

from chalk import features, online
from chalk.features import _, features

@features
class Transaction:
  id: int
  total: float
  sales_tax: float
  subtotal: float = _.total - _.sales_tax

@online
def get_subtotal(total: Transaction.total, sales_tax: Transaction.sales_tax) -> Transaction.subtotal:
    return total - sales_tax

Infix Operators

Chalk expressions support a variety of infix operators for arithmetic, conditions, and boolean logic.

OperatorDescriptionExample
+Addition_.total + _.sales_tax
-Subtraction_.total - _.sales_tax
*Multiplication_.quantity * _.price
/Division_.total / _.quantity
>Greater than_.total > 1000
>=Greater than or equal_.total >= 1000
<Less than_.total < 1000
<=Less than or equal_.total <= 1000
==Equal_.status == "completed"
!=Not equal_.status != "completed"
&Boolean and_.is_active & _.is_verified
|Boolean or_.is_active | _.is_verified
~Boolean not~_.is_active

Don't use `or`, `and`, `not`, and `is` in expressions

Do not use Python's `and`, `or`, `not`, or `is` operators in expressions. Python does not allow these operators to be overridden, so they will not work with Chalk's expressions.

Builtin Functions

The chalk.functions module exposes several helpful functions that can be used in combination with expression references to transform features. These functions are meant to be used in expressions and are not available as standalone functions. To view all available functions, see our SDK docs

Structs

Expressions can be used to access nested attributes from other features in a feature class, whether these other features are struct dataclasses, features, or DataFrames.

import chalk.functions as F
from chalk.features import features
from dataclasses import dataclass

@dataclass
class LatLon:
    lat: float | None
    lon: int | None

@features
class User:
    id: int
    home: LatLon
    work: LatLon
    commute_distance: float = F.haversine(
        lat1=_.home.lat,
        lon1=_.home.lon,
        lat2=_.work.lat,
        lon2=_.work.lon,
    )

Custom Functions

You can create custom functions to encapsulate complex logic or reusable computations in your expressions. For example, if you wanted to apply consistent windows across many features, you could define a custom function like this:

Use helper functions to create expressions

from chalk import _
from chalk.features import features, DataFrame

def count_where(*filters):
    return _.transactions[_.created_at > _.chalk_window, *filters].count()

@features
class User:
    id: int
    transactions: DataFrame[Transaction]
    num_large_transactions: int = count_where(_.total > 1000)
    num_small_transactions: int = count_where(_.total < 100)

Don't use expressions in Python resolvers

@features
class User:
    id: int
    name: str
    email: str
    name_email_match_score: float

@online
def get_score(
    name: User.name,
    email: User.email,
) -> User.name_email_match_score:
    # DONT DO THIS!!
    return F.jaccard_similarity(name, email)

Expressions are not supported in Python resolvers, so you cannot use Chalk functions like F.jaccard_similarity in a Python resolver. Instead, use expressions to define the feature directly in the feature class.

@features
class User:
    id: int
    name: str
    email: str
    name_email_match_score: float = F.jaccard_similarity(
        _.name, _.email
    )

DataFrame Functions

Conditions and filters

DataFrame features can be filtered with expressions.

Extending our Transaction example, we can create a User feature class with a has-many relationship to Transaction. Then, we can define a feature representing the number of large purchases by referencing the existing User.transactions feature:

from chalk.features import _, features, DataFrame

@features
class Transaction:
    id: int
    user_id: "User.id"
    total: float
    sales_tax: float
    subtotal: float = _.total - _.sales_tax

@features
class User:
   id: int
   # implicit has-many relationship with Transaction due to `user_id` above
   transactions: DataFrame[Transaction]
   num_large_transactions: int = _.transactions[_.total > 1000].count()

The object referenced by _ changes depending on its current scope. In this code, the _ in _.transactions references the User object. Within the DataFrame filter, the _ in _.total references each Transaction object as each one is evaluated. The count aggregation is covered in the next section.

Projections and aggregations

DataFrame features support projection with expressions, which produce a new DataFrame scoped down to the referenced columns. DataFrames can be aggregated after eligible columns are selected.

With our Transaction example, we already saw a count aggregation for counting the number of large transactions. We can add another aggregation for computing the user’s total spend:

from chalk.features import _, features, DataFrame

@features
class Transaction:
  id: int
  user_id: "User.id"
  sales_tax: float
  subtotal: float
  total: float = _.subtotal + _.sales_tax

@features
class User:
  id: int
  transactions: DataFrame[Transaction]
  num_large_transactions: int = _.transactions[_.total > 1000].count()
  total_spend: float = _.transactions[_.total].sum()

To compute User.total_spend, we needed to create a projection of the User.transactions DataFrame limited to only the Transaction.total column so that the sum aggregation could work. In contrast, no projection was needed for the num_large_transactions aggregation because count works on DataFrames with any number of columns.

Use materialized aggregations

For computing low-latency aggregations over high volumes of data, Chalk also offers materialized windowed aggregations that uses materialization of buckets of data to compute large aggregations efficiently.

Aggregation functions

Aggregation functions have varying behavior when handling None values and empty DataFrames. If an aggregation function says None values are skipped in the table below, it will consider a DataFrame with only None values as empty.

FunctionNone valuesEmpty DataFrameNotes
sumSkippedReturns 0
minSkippedReturns None
maxSkippedReturns None
meanSkippedReturns NoneFeature type must be float or float \| None. None values are skipped, meaning they are not included in the mean calculation.
countIncludedReturns 0
anySkippedReturns False
allSkippedReturns True
stdSkippedSee notesStandard deviation. Requires at least 2 values. For DataFrames with less than 2 values, returns None.
Aliases: stddev, stddev_sample, std_sample.
varSkippedSee notesVariance. Same requirements as std.
Alias: var_sample.
approx_count_distinctSkippedReturns 0
approx_percentileSkippedReturns NoneTakes one argument, quantile, expected to be a float in the range [0, 1].
Example: approx_percentile(0.75) returns a value approximately equal to the 75th percentile of the not-None values in the DataFrame.

For aggregations that can return None, either mark the feature as optional (for example, by setting the feature type to float | None) or use coalesce to fall back to a default value.


Run conditions

To specify run conditions such as environment, tags, and versions for a feature that is resolved through an expression, you can use the feature function and pass in the expression as an argument.

from chalk.features import Primary, features, feature

@features
class User:
    id: int

    purchases: DataFrame[Purchase]
    # Uses a default value of 0 when one cannot be computed.
    num_purchases: int = feature(
        expression=_.purchases.count(),
        default=0,
        environment=["staging", "dev"],
        tags=["fraud", "credit"],
        version=1,
    )

Testing

To test your expressions, we recommend setting up integration tests or iterating on a branch.