Queries
Fetch feature values via online queries
Online queries access or compute feature values for a single feature set in real time. Here, “real time” means that responses should seem practically instantaneous, even when they require computing features on new data.
Online queries do more than retrieve data: they also store the values of the features they compute. This gives you visibility into, and long-term tracking of, the features you are generating. In this section, we provide a high-level overview of how online queries compute and record their outputs.
Chalk responds to online queries by getting and executing a query plan.
A query plan is a set of tasks, some of which can be executed in parallel, that together produce a target output. As a simplification, you can think of your resolvers as a subset of these tasks. Consider the following feature class and resolvers:
from chalk import online
from chalk.features import features

@features
class User:
    id: int
    name: str
    is_palindrome: bool
    is_short: bool
    is_short_palindrome: bool

@online
def get_is_palindrome(name: User.name) -> User.is_palindrome:
    # A name is a palindrome if it reads the same forwards and backwards.
    return name == name[::-1]

@online
def get_is_short(name: User.name) -> User.is_short:
    # Names with fewer than three characters count as short.
    return len(name) < 3

@online
def get_is_short_palindrome(is_short: User.is_short, is_palindrome: User.is_palindrome) -> User.is_short_palindrome:
    # Depends on the outputs of the two resolvers above.
    return is_short and is_palindrome
If you were to assemble a dependency graph for these features, you would wind up with something like the following:
┌───────────┐
│name │
└┬─────────┬┘
┌▽───────┐┌▽────────────┐
│is_short││is_palindrome│
└┬───────┘└┬────────────┘
┌▽─────────▽────────┐
│is_short_palindrome│
└───────────────────┘
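A graph like this can be derived mechanically from resolver signatures. The following is a minimal sketch in plain Python, not Chalk's actual planner: the `RESOLVERS` mapping and the `plan_stages` helper are hypothetical names introduced for illustration. It groups features into stages so that every feature in a stage depends only on earlier stages, which means the features within one stage could be computed in parallel.

```python
# Hypothetical illustration only; this is not how Chalk represents plans internally.
# Each output feature maps to the input features its resolver needs.
RESOLVERS = {
    "is_short": ["name"],
    "is_palindrome": ["name"],
    "is_short_palindrome": ["is_short", "is_palindrome"],
}


def plan_stages(resolvers, known_inputs, target):
    """Return stages of features to compute; each stage depends only on earlier ones."""
    # Walk backwards from the target to collect only the features it needs.
    needed, stack = set(), [target]
    while stack:
        feature = stack.pop()
        if feature in known_inputs or feature in needed:
            continue
        if feature not in resolvers:
            raise ValueError(f"no resolver or input provides {feature!r}")
        needed.add(feature)
        stack.extend(resolvers[feature])

    available, stages = set(known_inputs), []
    while needed:
        # Every feature whose dependencies are already available can run now.
        ready = sorted(f for f in needed if all(d in available for d in resolvers[f]))
        if not ready:
            raise ValueError("cycle detected in resolver dependencies")
        stages.append(ready)
        available.update(ready)
        needed.difference_update(ready)
    return stages


stages = plan_stages(RESOLVERS, {"id", "name"}, "is_short_palindrome")
print(stages)  # [['is_palindrome', 'is_short'], ['is_short_palindrome']]
```

The first stage contains both features that depend only on `name`; the second contains the feature that needs both of their results.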
When you run an online query, such as:
chalk query --in user.id=1 --in user.name=bob --out is_short_palindrome
Chalk constructs a plan for how to “solve” this query. This query plan is viable for other queries with the same input and output features.
Running `chalk query` with the `--explain` flag outputs your query plan.
The following query can reuse the plan generated by the one above:
chalk query --in user.id=1 --in user.name=bartholomew --out is_short_palindrome
Even though the input name, and therefore the output value of the query, is different, the query plan remains valid, because both queries use the same input and output features.
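One way to picture plan reuse is a cache keyed by the names of the input and output features, ignoring the input values. The sketch below assumes exactly that; `PlanCache` and its methods are hypothetical and say nothing about how Chalk stores plans internally.

```python
class PlanCache:
    """Hypothetical plan cache keyed by feature names, never by input values."""

    def __init__(self):
        self._plans = {}
        self.builds = 0  # counts how often a plan had to be constructed

    def get_plan(self, inputs, outputs):
        # Iterating a dict yields its keys, so the input values are dropped here.
        key = (frozenset(inputs), frozenset(outputs))
        if key not in self._plans:
            self.builds += 1
            # Stand-in for real planning work: just record what was requested.
            self._plans[key] = {"inputs": sorted(key[0]), "outputs": sorted(key[1])}
        return self._plans[key]


cache = PlanCache()
first = cache.get_plan({"user.id": 1, "user.name": "bob"}, ["user.is_short_palindrome"])
second = cache.get_plan({"user.id": 1, "user.name": "bartholomew"}, ["user.is_short_palindrome"])
print(first is second, cache.builds)  # True 1
```

Both calls resolve to the same cached plan because only the feature names take part in the key.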
As illustrated above, a query plan is not a linear sequence of tasks that must be executed one after another: a lot of work can often be performed in parallel.
After getting a query plan, Chalk distributes subtasks to workers, applies a number of optimizations to your resolvers and data source connections, and computes the target outputs of your query. These outputs are then returned.
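To make the idea of distributing independent subtasks concrete, here is a small sketch that runs a hand-written plan with a thread pool from Python's standard library. It is an illustration under stated assumptions: the plain functions stand in for resolvers, and `STAGES` and `run_plan` are hypothetical names. Chalk's real scheduling and optimizations are not shown.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for resolvers, written as plain functions.
def is_palindrome(values):
    return values["name"] == values["name"][::-1]


def is_short(values):
    return len(values["name"]) < 3


def is_short_palindrome(values):
    return values["is_short"] and values["is_palindrome"]


# A hand-written plan: tasks within one stage do not depend on each other.
STAGES = [
    {"is_palindrome": is_palindrome, "is_short": is_short},
    {"is_short_palindrome": is_short_palindrome},
]


def run_plan(stages, inputs):
    values = dict(inputs)
    with ThreadPoolExecutor() as pool:
        for stage in stages:
            # Submit every task in the stage, then wait for all of them
            # before starting the stage that depends on their results.
            futures = {name: pool.submit(fn, dict(values)) for name, fn in stage.items()}
            for name, future in futures.items():
                values[name] = future.result()
    return values


result = run_plan(STAGES, {"id": 1, "name": "bob"})
print(result["is_short_palindrome"])  # False: "bob" is a palindrome but not short
```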
Online queries write computed values to two places: the offline store and the online store. However, computed features are only written to the online store if they have a caching policy. The online store is used to circumvent recomputation of expensive features that are either unlikely to have changed or can tolerate slightly stale values. Properly configuring the caching policies for your features can make your online queries significantly more efficient.
If you haven’t specified a caching policy, Chalk recomputes the values for a feature each time it is requested. We go into more depth on query caching in a later section.
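The write path described above can be sketched with two in-memory stand-ins for the stores. Everything here is an assumption made for illustration: the store objects, the `record` and `lookup` helpers, and the use of a maximum staleness in seconds as the caching policy. It is not Chalk's storage API.

```python
import time

# Hypothetical in-memory stand-ins for the two stores.
offline_store = []  # append-only log of every computed value
online_store = {}   # latest cached value per (feature, key)

# Assumed caching policies: feature name -> maximum staleness in seconds.
# Features missing from this mapping have no caching policy.
MAX_STALENESS = {"user.is_palindrome": 3600}


def record(feature, key, value, now=None):
    """Log a computed value offline; cache it online only if it has a policy."""
    now = time.time() if now is None else now
    offline_store.append((feature, key, value, now))
    if feature in MAX_STALENESS:
        online_store[(feature, key)] = (value, now)


def lookup(feature, key, now=None):
    """Return (hit, value); a hit requires a cached value that is still fresh."""
    now = time.time() if now is None else now
    cached = online_store.get((feature, key))
    if cached is None:
        return False, None
    value, written_at = cached
    if now - written_at > MAX_STALENESS[feature]:
        return False, None
    return True, value


record("user.is_palindrome", 1, True, now=0)
record("user.is_short", 1, False, now=0)
print(lookup("user.is_palindrome", 1, now=60))    # (True, True): cached and fresh
print(lookup("user.is_palindrome", 1, now=7200))  # (False, None): too stale
print(lookup("user.is_short", 1, now=60))         # (False, None): never cached
```

A miss in `lookup` corresponds to the case where the feature would have to be recomputed.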