Offline Queries
Fetch offline feature values.
Chalk supports a Python client for sampling offline data for use in training or feature development. This client can be used directly in a Jupyter notebook.
The client for querying offline data largely mirrors the contract for querying data online. Here, however, we return many rows of data instead of data for a single example.
Offline data is accessed via the class chalk.ChalkClient.
Authentication is handled by the Chalk CLI tool. So long as the machine you’re using is authenticated to Chalk, no API tokens or client secrets are needed for use in a notebook.
The ChalkClient exposes a method offline_query, which takes in input features (input), desired features to compute (output), and information about the environment (environment), and returns a Dataset which includes the requested features.
As input, offline_query takes a chalk.DataFrame or pandas.DataFrame with one column for each known feature in the input, and one column with the heading chalk.features.timestamp. The values in the chalk.features.timestamp column should be datetime.datetime objects. If the timestamp column is omitted, it defaults to datetime.now().
Instead of a DataFrame, users can pass a mapping from features to a list of values for each feature.
input={
User.id: ['id1', 'id2'],
User.age: [23, 40]
}
Alternatively, instead of a timestamp column, timestamps can be passed in a separate argument, input_times.
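As a rough illustration of how row-oriented records might be reshaped into the mapping form plus a parallel list of timestamps, here is a minimal sketch. It uses plain string keys as stand-ins for feature references such as User.id, and it assumes that input_times expects one timestamp per input row; the helper to_columnar and the observed_at key are hypothetical names, not part of Chalk.

```python
from datetime import datetime, timezone

# Row-oriented records. The string keys are stand-ins for feature
# references such as User.id and User.age; "observed_at" is a
# hypothetical name for the point in time each row refers to.
rows = [
    {"user.id": "id1", "user.age": 23,
     "observed_at": datetime(2023, 1, 1, tzinfo=timezone.utc)},
    {"user.id": "id2", "user.age": 40,
     "observed_at": datetime(2023, 2, 1, tzinfo=timezone.utc)},
]


def to_columnar(rows, time_key="observed_at"):
    """Split rows into a feature -> values mapping and a list of timestamps.

    Hypothetical helper: both results keep the original row order, so the
    i-th timestamp belongs to the i-th value of every feature.
    """
    feature_names = [key for key in rows[0] if key != time_key]
    columns = {name: [row[name] for row in rows] for name in feature_names}
    times = [row[time_key] for row in rows]
    return columns, times


input_columns, input_times = to_columnar(rows)
```

The two results would then be passed as the input and input_times arguments, with real feature references in place of the string keys.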
Output is a list of features that you’d like to sample. For example:
output=[
User.returned_transactions_last_60,
User.user_account_name_match_score,
User.socure_score,
User.identity.has_verified_phone,
User.identity.is_voip_phone,
User.identity.account_age_days,
User.identity.email_age,
]
Users can request that certain features be recomputed at query time instead of sampled from the offline store. The recompute_features argument controls this behavior, listing the features which should not be sampled. A value of True causes all features to be recomputed, so nothing is sampled.
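The rule above can be modeled in a few lines. The sketch below is only an illustration of the described behavior, not Chalk's implementation: split_features is a hypothetical helper, string names stand in for feature references, and treating an unset value as "sample everything" is an assumption.

```python
def split_features(output, recompute_features=None):
    """Partition requested features into (sampled, recomputed).

    Illustrative model only. Assumes that leaving recompute_features
    unset means every requested feature is sampled.
    """
    if recompute_features is True:
        # True: recompute every requested feature, sample nothing.
        return [], list(output)
    to_recompute = set(recompute_features or [])
    sampled = [f for f in output if f not in to_recompute]
    recomputed = [f for f in output if f in to_recompute]
    return sampled, recomputed


requested = ["user.socure_score", "user.returned_transactions_last_60"]
sampled, recomputed = split_features(requested, ["user.socure_score"])
```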
In the environment, you can control the tags and environment parameters of the query. These arguments function in the same fashion as in the online environment.
The return value is given as a chalk.DataFrame, with the columns in the order of the requested output.
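Putting the arguments together, a call might be wrapped as in the sketch below. The wrapper run_offline_query is a hypothetical helper that accepts any client object exposing offline_query with the keyword arguments named on this page (input, output, recompute_features, environment); the exact signature and accepted value types should be checked against the installed client.

```python
def run_offline_query(client, inputs, outputs,
                      recompute_features=None, environment=None):
    """Forward the arguments described on this page to client.offline_query.

    Hypothetical wrapper: optional arguments are only passed when set, so
    the client's own defaults apply otherwise.
    """
    kwargs = {"input": inputs, "output": outputs}
    if recompute_features is not None:
        kwargs["recompute_features"] = recompute_features
    if environment is not None:
        kwargs["environment"] = environment
    return client.offline_query(**kwargs)


# Intended use in a notebook on a machine authenticated via the Chalk CLI
# (User is a feature class from your own project):
#   client = ChalkClient()
#   result = run_offline_query(
#       client,
#       inputs={User.id: ["id1", "id2"]},
#       outputs=[User.socure_score],
#       recompute_features=[User.socure_score],
#   )
```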