Queries
Querying Chalk with SQL.
Chalk is testing a direct SQL query interface for your feature values and
data sources in your environment. In the dashboard, under Data Explorer,
you can find several interfaces that you can use to query your data. The
Online Query Explorer and Offline Query Explorer let you execute
online queries and offline queries, respectively, from the dashboard. In
addition, you can use the SQL Query Interface to query previously computed
feature values from your offline store, as well as data sources in your
environment. You can also use the gRPC client's run_sql() endpoint.
Chalk’s SQL interface uses a standard SQL dialect, and all queries are processed through a parser that expects familiar SQL syntax.
Currently, Chalk SQL only supports SELECT queries, with the following restrictions:

- SELECT list: If a column name contains a . delimiter, you may reference it with double quotes, e.g. SELECT "column.with.dot" FROM table.
- FROM clause: The following table functions may be used (see the example queries after this list):
  - read_parquet(path: str): Reads a Parquet file from the given path and returns a table. The path must be a literal string.
  - query_values_from_operation_ids(operation_id: str, ...): Returns a table consisting of the values queried in the given operation id(s). The operation id(s) provided must correspond to queries that have the same output schema.
- WHERE clause: JOINs are not possible yet. Expressions may be used in the WHERE clause, but not all expressions are guaranteed to be pushed down to the underlying data source.
- GROUP BY clause: Grouping expressions may be used in the GROUP BY clause.
- LIMIT clause: A LIMIT may be applied to the result set.
- ORDER BY clause: Results may be sorted with the ORDER BY clause.
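For instance, a query over a Parquet file and a query over the values from a previous operation might look like the following sketches (the file path and operation id are hypothetical placeholders):

-- Hypothetical file path; note the double quotes around the dotted column name.
SELECT "user.email" FROM read_parquet('data/users.parquet') LIMIT 10

-- Hypothetical operation id; returns the values computed by that operation.
SELECT * FROM query_values_from_operation_ids('018f2b7c-0000-0000-0000-000000000000')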
In the SQL Interface, there are several schema providers, all of which live under the chalk catalog:
chalk.datasources: directly query against the data sources configured in your
environment (e.g. Postgres, Snowflake, BigQuery, etc.). Table names are the tables
within the data sources themselves. Schema: exactly that of the table in the data source.
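For example, assuming a transactions table exists in one of your configured data sources (the table name, and the exact way it is qualified, are hypothetical here), a query might look like:

-- Hypothetical table; the qualified name depends on your configured data source.
SELECT * FROM "chalk.datasources.transactions" LIMIT 10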
chalk.historical_values: query against historically observed feature values from your
offline store, with table names of the form feature_namespace.feature_name, or query for
the freshest values for each primary key in a specific feature namespace with table names
of the form feature_namespace.

Schema for feature_namespace.feature_name:
- observation_id, string, non-nullable: The unique identifier for the observation of the feature value.
- pkey, (type of the feature namespace's pkey), non-nullable: The value of the primary feature associated with this observation.
- value, (type of the feature namespace's value): The observed value of the feature.
- value_json, string, DEPRECATED: The observed value of the feature, as JSON, for backwards compatibility.
- observed_at, timestamp, non-nullable: The time at which the feature value was observed.
- inserted_at, timestamp, non-nullable: The time at which the feature value was inserted into the offline store.
- resolver_fqn, string, nullable: The resolver that computed this feature.
- deployment_id, string, nullable: The deployment id where the observation was made.

Schema for feature_namespace:
- __pkey__, (type of the feature namespace's pkey), non-nullable: The value of the primary feature associated with this row.
- <feature-name>, (type of the feature), for each feature: The freshest observed value of the feature for the primary key.
- <feature-name>.__observed_at__, timestamp, for each feature: The time at which the freshest value was observed.
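As a sketch, using the User feature class shown at the end of this page, the freshest-values table for the user namespace could be queried as follows (note the double quotes around the dotted __observed_at__ column):

-- Freshest name value for user 1, and when it was observed.
SELECT __pkey__, name, "name.__observed_at__"
FROM "chalk.historical_values.user"
WHERE __pkey__ = 1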
chalk.query_log: A schema provider consisting of one table, data, which is the query log table.
This table provides a list of query metadata for each operation id, including which tables in the
offline store correspond to values queried in that operation id.

Schema for data:
- operation_id, string, non-nullable: The unique identifier for the operation.
- environment_id, string, non-nullable: The environment id where the operation was run.
- deployment_id, string: The deployment id where the operation was run.
- operation_kind, string: The kind of operation that was run (inference, streaming, etc.).
- query_timestamp, timestamp: The value of now() that was used by the query.
- execution_started_at, timestamp: The time at which the operation started executing. Null if there were errors during planning.
- execution_finished_at, timestamp: The time at which the operation finished executing. Null if there were errors during execution.
- query_status, string: The status of the query (success, error, running, etc.).
- query_name, string: The query name for named queries, null otherwise.
- query_name_version, string: The version of the query for named queries, null otherwise.
- branch_name, string: The branch the query was run on, null if the query was not run on a branch.
- correlation_id, string: The correlation id for the query, null if the query was not run with a correlation id.
- value_tables, array of strings: The list of tables in the offline store that correspond to result values from this query.
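For example, here is a sketch of a query that lists recent runs of a named query (the query name is a hypothetical placeholder):

-- Hypothetical query name; most recent operations first.
SELECT operation_id, query_status, execution_started_at, value_tables
FROM "chalk.query_log.data"
WHERE query_name = 'user_features'
ORDER BY execution_started_at DESC
LIMIT 10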
chalk.resolvers: Trigger a resolver run and return the outputs of a specific resolver, with additional
filtering based on the SQL query provided. A table exists for every eligible resolver. To run a resolver
this way, it must not take any inputs and must return a dataframe consisting of only scalar outputs.
Schema: exactly that of the resolver.
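As a sketch, assuming an eligible resolver named get_all_users that takes no inputs and returns a dataframe of scalar User features (the resolver name and filter column are hypothetical), you could trigger and filter it like this:

-- Hypothetical resolver; triggers a run and filters its output.
SELECT * FROM "chalk.resolvers.get_all_users" WHERE name = 'Alice'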
Chalk also offers a chalk.information_schema
that you can query (similar to many other database offerings), and queries such as SHOW TABLES
and SHOW SCHEMAS
can also be used.
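For instance, you can explore what is available before writing a query:

-- Discover the available schemas and tables.
SHOW SCHEMAS
SHOW TABLES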
If you had the following feature class:
from chalk.features import features

@features
class User:
    id: int
    name: str
You could query observed feature values for User.name
with user.id = 1
by running the following query:
SELECT value FROM "chalk.historical_values.user.name" WHERE pkey = 1
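To also pull observation metadata from the schema described above, the same query can be extended, for example:

-- Include when each observation was made and which resolver produced it.
SELECT value, observed_at, resolver_fqn
FROM "chalk.historical_values.user.name"
WHERE pkey = 1
ORDER BY observed_at DESC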