Offline Queries: Overview

Chalk provides a Python client for sampling offline data for use in training or feature development. This client can be used directly in a Jupyter notebook.

The client for querying offline data largely mirrors the contract for querying data online. An offline query, however, returns many rows of data rather than data for a single example.

API

Offline data is accessed via the class chalk.ChalkClient. Authentication is handled by the Chalk CLI tool. So long as the machine you’re using is authenticated to Chalk, no API tokens or client secrets are needed for use in a notebook.

The ChalkClient exposes a method offline_query, which takes in input features (input), desired features to compute (output), and information about the environment (environment), and returns a Dataset which includes the requested features.

Input

As input, offline_query takes a chalk.DataFrame or pandas.DataFrame with one column for each known feature in the input, plus one column with the heading chalk.features.timestamp. The values in the chalk.features.timestamp column should be datetime.datetime objects. If the timestamp column is omitted, it defaults to datetime.now(). Instead of a DataFrame, users can pass a mapping from each feature to a list of values for that feature.

input={
    User.id: ['id1', 'id2'],
    User.age: [23, 40]
}
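
To make the mapping form concrete, the sketch below shows how a feature-to-values mapping lines up with rows: position i of every list is taken to belong to the same example. Plain strings stand in for feature references such as User.id, the helper mapping_to_rows is hypothetical and not part of the Chalk client, and the requirement that all lists have equal length is an assumption rather than documented behavior.

```python
from typing import Any, Dict, List


def mapping_to_rows(inputs: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Turn a column-wise mapping into one dict per row.

    Hypothetical helper for illustration only; string keys stand in
    for feature references such as User.id.
    """
    # Assumption: every feature supplies the same number of values.
    lengths = {len(values) for values in inputs.values()}
    if len(lengths) > 1:
        raise ValueError(
            f"all features need the same number of values, got {sorted(lengths)}"
        )
    names = list(inputs)
    # zip(*columns) walks the columns in lockstep, yielding one tuple per row.
    return [dict(zip(names, row)) for row in zip(*inputs.values())]


rows = mapping_to_rows({"user.id": ["id1", "id2"], "user.age": [23, 40]})
```

Here rows holds two records, one for id1 and one for id2, which is the row-wise view of the column-wise mapping above.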

Input Times

Alternatively, timestamps can be passed in a separate argument, input_times.
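
As a sketch of how such timestamps might be prepared, the snippet below builds one datetime.datetime per input row. It assumes that input_times is a list aligned positionally with the input rows, which the text here does not spell out. The variable names, the dates, and the choice of timezone-aware UTC values are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Column-wise input; string keys stand in for feature references like User.id.
inputs = {"user.id": ["id1", "id2"], "user.age": [23, 40]}

# One observation time per row.
# Assumption: input_times[i] is the timestamp for row i of the input.
base = datetime(2023, 1, 1, tzinfo=timezone.utc)
input_times = [base + timedelta(days=i) for i in range(len(inputs["user.id"]))]

# Sanity checks before handing both values to a query.
assert len(input_times) == len(inputs["user.id"])
assert all(isinstance(t, datetime) for t in input_times)
```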

Output

Output is a list of features that you’d like to sample. For example:

output=[
    User.returned_transactions_last_60,
    User.user_account_name_match_score,
    User.socure_score,
    User.identity.has_verified_phone,
    User.identity.is_voip_phone,
    User.identity.account_age_days,
    User.identity.email_age,
]

Recompute Features

Users can request that certain features be recomputed at query time instead of sampled from the offline store. The recompute_features argument controls this behavior by listing the features that should be recomputed rather than sampled. A value of True causes every feature to be recomputed, so nothing is sampled.
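
The rule described here can be modeled in a few lines. The sketch below is a hypothetical illustration of the semantics, not Chalk code: split_outputs is an invented helper, strings stand in for feature references, and treating None, False, or an empty list as "sample everything" is an assumption.

```python
from typing import List, Sequence, Tuple, Union


def split_outputs(
    output: Sequence[str],
    recompute_features: Union[bool, Sequence[str], None] = None,
) -> Tuple[List[str], List[str]]:
    """Return (sampled, recomputed) for the requested output features.

    Hypothetical model of the behavior described in the text.
    """
    if recompute_features is True:
        # True: recompute every requested feature, sample nothing.
        return [], list(output)
    if not recompute_features:
        # Assumption: None, False, or an empty list means sample everything.
        return list(output), []
    to_recompute = set(recompute_features)
    sampled = [f for f in output if f not in to_recompute]
    recomputed = [f for f in output if f in to_recompute]
    return sampled, recomputed


outputs = ["user.socure_score", "user.identity.email_age"]
sampled, recomputed = split_outputs(outputs, ["user.socure_score"])
```

In this model, features named in recompute_features but absent from output are simply ignored; whether the real client behaves the same way is not stated in this section.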

Environment

In the environment, you can control the tags and environment parameters of the query. These arguments work the same way as they do for online queries.

Return Value

The return value is a chalk.DataFrame whose columns follow the order of the requested output.