Features
Set max staleness to introduce caching.
When a feature is expensive or slow to compute, you may wish to cache its value. Chalk uses the terminology “maximum staleness” to describe how recently a feature value needs to have been computed to be returned without re-running a resolver.
You can specify the maximum staleness for a feature as follows:
from chalk.features import feature, features
from datetime import timedelta
@features
class User:
# Using text descriptors:
expensive_fraud_score: float = feature(
max_staleness="1m 30s"
)
# Alternatively, using timedelta:
expensive_fraud_score: float = feature(
max_staleness=timedelta(minutes=1, seconds=30)
)
Max staleness durations can be given in natural language, or specified using datetime.timedelta. You can specify a max staleness of “infinity” to indicate that Chalk should cache computed feature values forever. This makes sense for data that never becomes invalid, or for data that you wish to explicitly update using Streaming Updates or Reverse ETL.
Staleness can also be assigned to all features in a namespace:
@features(max_staleness="1d")
class User:
fraud_score: float
full_name: str
email: str = feature(max_staleness="0s")
...
Here, User.fraud_score
and User.full_name
assume the max-staleness of 1d
.
However, User.email
, which specifies max-staleness at the feature level,
assumes the max-staleness of 0s
, forcing it to be recomputed on every request.
By default, features are not cached, and instead are recomputed for every online request.
In effect, you can think of max_staleness
as being 0
except where otherwise specified.
In an offline environment, all feature values are taken from past runs or historical tables,
where max_staleness
does not apply.
The max_staleness
values provided to the feature
function
may be overridden at the time of querying for features.
See Overriding Default Caching for a detailed discussion.