Reverse ETL - Chalk

Reverse ETL is the process of moving data from a data warehouse into operational systems. In the context of Chalk's architecture, the data warehouse is the offline data store (Timescale or BigQuery) and the operational system is the online data store (Redis, Cloud Memorystore, or DynamoDB). Both stores can be queried through Chalk: the online data store is queried with one of our API clients, while the offline data store is queried with our bulk API.
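
As a conceptual illustration only (not Chalk's implementation), reverse ETL can be pictured as taking the most recent row for each entity from an offline table and writing it into a key-value store. The sketch below uses plain Python lists and dicts as stand-ins for the offline and online stores. The names `offline_rows` and `reverse_etl` are hypothetical.

```python
from datetime import datetime

# Hypothetical offline table: (entity id, value, observation time).
offline_rows = [
    ("u1", "blue", datetime(2024, 1, 1)),
    ("u1", "green", datetime(2024, 3, 1)),
    ("u2", "red", datetime(2024, 2, 1)),
]


def reverse_etl(rows):
    """Return a dict mapping each entity id to its most recent (value, timestamp)."""
    online_store = {}
    for entity_id, value, observed_at in rows:
        current = online_store.get(entity_id)
        # Keep only the newest observation for each entity.
        if current is None or observed_at > current[1]:
            online_store[entity_id] = (value, observed_at)
    return online_store


online_store = reverse_etl(offline_rows)
print(online_store["u1"][0])  # green
```

A key-value layout like this suits online stores such as Redis or DynamoDB, where lookups happen by entity key rather than by scanning history.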

Data from online resolvers is always loaded into the offline store and made available for training. In contrast, data from offline resolvers is not loaded into online stores by default. To let offline data reach the online environment, set the keyword argument etl_offline_to_online on each feature you want to reverse ETL.

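from chalk.features import features, feature, offline
from chalk import DataFrame

@features
class User:
    id: str
    ...
    # Copy this feature from the offline store into the online store
    favorite_color: str = feature(etl_offline_to_online=True)

@offline
def get_favorite_colors() -> DataFrame[User.id, User.favorite_color]:
    # Offline resolver that loads favorite colors from an offline source (body elided)
    ...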

When etl_offline_to_online=True is set in a feature's declaration, Chalk copies the values that offline resolvers compute for that feature into the online environment.
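
The filtering effect of the flag can be modeled with a small function: of everything an offline resolver produces, only the flagged features are copied online. This is a hypothetical model of the documented behavior, not Chalk code; `ETL_FLAGS`, `copy_offline_to_online`, and the feature `user.lifetime_spend` are illustrative names.

```python
# Hypothetical per-feature flags, mirroring etl_offline_to_online.
ETL_FLAGS = {
    "user.favorite_color": True,
    "user.lifetime_spend": False,  # hypothetical feature that stays offline only
}


def copy_offline_to_online(offline_row, flags):
    """Return the subset of an offline resolver's output that is copied online."""
    # Features without a flag default to False, matching the offline-only default.
    return {name: value for name, value in offline_row.items() if flags.get(name, False)}


row = {"user.favorite_color": "green", "user.lifetime_spend": 1234.5}
print(copy_offline_to_online(row, ETL_FLAGS))  # {'user.favorite_color': 'green'}
```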

Reverse ETL can also be enabled for all features in a namespace by passing the argument to the @features decorator:

from chalk.features import features, feature

@features(etl_offline_to_online=True)
class User:
    fraud_score: float
    full_name: str
    # A feature-level setting overrides the namespace-level default
    email: str = feature(etl_offline_to_online=False)
    ...

Here, User.fraud_score and User.full_name inherit the namespace-level setting and will be reverse ETL'd into the online environment. User.email, however, sets etl_offline_to_online=False at the feature level, which overrides the namespace default, so it will not be reverse ETL'd.
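
The precedence rule in this example can be modeled as a small function: the namespace-level setting acts as a default, and a feature-level setting, when given, replaces it. This is a hypothetical model of the documented behavior, not Chalk code; `effective_etl_flag` is an illustrative name.

```python
from typing import Optional


def effective_etl_flag(namespace_default: bool, feature_setting: Optional[bool]) -> bool:
    """A feature-level setting, when present, overrides the namespace default."""
    return namespace_default if feature_setting is None else feature_setting


# Mirrors the User example: namespace default True, email overrides to False.
settings = {"fraud_score": None, "full_name": None, "email": False}
etl_enabled = {name: effective_etl_flag(True, s) for name, s in settings.items()}
print(etl_enabled)  # {'fraud_score': True, 'full_name': True, 'email': False}
```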

Interplay with max staleness

When data from an offline store reaches an online store, it is necessarily somewhat stale. The data may have come from an events table, where it could be arbitrarily old, or from a snapshot that was current when it was taken but that takes non-zero time to migrate online. Therefore, queries in the online environment only receive offline data when they tolerate some staleness, configured either as a maximum staleness on features or as a maximum staleness on queries.
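
To see why a staleness tolerance is needed, the sketch below models an online read that returns a reverse-ETL'd value only when its age fits within the allowed maximum staleness, and returns None otherwise to signal that the stored value cannot be used. `read_online` and its arguments are hypothetical; this is a conceptual model, not Chalk's API.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def read_online(
    entry: Optional[Tuple[str, datetime]],
    max_staleness: timedelta,
    now: datetime,
) -> Optional[str]:
    """Return the stored value only if it is no older than max_staleness."""
    if entry is None:
        return None
    value, observed_at = entry
    # With a max_staleness of zero, any value older than `now` is rejected.
    if now - observed_at <= max_staleness:
        return value
    return None


now = datetime(2024, 3, 2, 12, 0)
entry = ("green", datetime(2024, 3, 1, 12, 0))  # reverse-ETL'd value, one day old
print(read_online(entry, timedelta(0), now))       # None
print(read_online(entry, timedelta(days=2), now))  # green
```

Because every reverse-ETL'd value has a non-zero age by the time it is read, a zero tolerance would never accept it, which is why some maximum staleness must be allowed.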
