Reverse ETL is the process of moving data from a data warehouse into operational systems. In the context of Chalk’s architecture, the data warehouse is the offline data store (Timescale or BigQuery) and the operational system is the online data store (Redis, Cloud Memorystore, or DynamoDB). The online data store is queried by one of our API clients, while the offline data store is queried by our bulk API.
Data from online resolvers is always loaded into the offline store
and made available for training.
However, data from offline resolvers is not loaded into online stores by default.
To enable offline data to reach the online environment,
use the keyword argument
etl_offline_to_online on the feature you wish to ETL.
@features
class User:
    favorite_color: str = feature(etl_offline_to_online=True)

@offline
def fn(...) -> DataFrame[User.favorite_color, ...]:
    ...
When this argument is set to True in the feature declaration, Chalk copies this feature's offline data into the online environment.
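As a rough illustration of what this copy amounts to, the sketch below models the offline store as a list of rows and the online store as a dict keyed by primary key and feature name. The names reverse_etl, offline_rows, and etl_enabled are hypothetical, and this is not Chalk's implementation; it only shows that features flagged for ETL are copied while the rest stay offline.

```python
from typing import Any


def reverse_etl(
    offline_rows: list[dict[str, Any]],
    etl_enabled: set[str],
    pkey: str = "uid",
) -> dict[tuple[Any, str], Any]:
    """Copy only the ETL-enabled features of each offline row into an online map."""
    online: dict[tuple[Any, str], Any] = {}
    for row in offline_rows:
        for name, value in row.items():
            # Skip the primary key itself and any feature not flagged for ETL.
            if name != pkey and name in etl_enabled:
                online[(row[pkey], name)] = value
    return online


# Hypothetical offline data: favorite_color is flagged, email is not.
offline_rows = [
    {"uid": "u1", "favorite_color": "green", "email": "a@example.com"},
    {"uid": "u2", "favorite_color": "blue", "email": "b@example.com"},
]
online = reverse_etl(offline_rows, etl_enabled={"favorite_color"})
```

In this toy model, email never appears in the online map because it was not flagged.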
Reverse ETL can also be assigned to all features in a namespace:
@features(etl_offline_to_online=True)
class User:
    email: str = feature(etl_offline_to_online=False)
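Here, the other features in the User namespace will be reverse ETL’d into the online environment.
User.email, which specifies the ETL parameter at the feature level,
will not be reverse ETL’d, because the feature-level setting takes precedence over the namespace-level one.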
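The precedence described above can be sketched as a small lookup: a feature-level setting wins when present, otherwise the namespace-level setting applies, and the default is off. The function effective_etl and the feature names uid and name are hypothetical, chosen only for illustration; this is a model of the rule, not Chalk's code.

```python
from typing import Optional


def effective_etl(feature_level: Optional[bool], namespace_level: Optional[bool]) -> bool:
    """Feature-level setting wins; otherwise use the namespace setting; default is off."""
    if feature_level is not None:
        return feature_level
    return bool(namespace_level)


# Namespace-level setting, as if declared once for the whole User namespace.
namespace_default = True

# None means the feature does not set the argument itself.
feature_settings = {"uid": None, "name": None, "email": False}

resolved = {
    feature: effective_etl(flag, namespace_default)
    for feature, flag in feature_settings.items()
}
```

Under these assumptions, only email resolves to False; the features without their own setting inherit the namespace value.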
When data from an offline store reaches an online store, it is necessarily somewhat stale. The data may have come from an events table, where it could be arbitrarily old, or from a snapshot that was current when it was taken but needed non-zero time to migrate online. Therefore, queries in the online environment will only return offline data when they tolerate staleness, with the maximum staleness set either on the feature or on the query.
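The staleness condition can be pictured as a simple age check. The sketch below is a minimal model, assuming each stored value carries an observation timestamp; the helper is_fresh_enough and its parameters are hypothetical and do not reflect how Chalk evaluates staleness internally.

```python
from datetime import datetime, timedelta, timezone


def is_fresh_enough(observed_at: datetime, max_staleness: timedelta, now: datetime) -> bool:
    """Return True when the value's age does not exceed the tolerated staleness."""
    return now - observed_at <= max_staleness


# A value that was observed three hours before the query runs.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
observed_at = now - timedelta(hours=3)

# A query tolerating one day of staleness can use the reverse ETL'd value.
accepts_day = is_fresh_enough(observed_at, timedelta(days=1), now)

# A query tolerating no staleness cannot.
accepts_zero = is_fresh_enough(observed_at, timedelta(0), now)
```

A tolerance of zero rejects any value that had to travel from the offline store, which is why reverse ETL'd data only shows up for queries that allow some staleness.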