Developer Interface
This page summarizes some workflows that are commonly used in Chalk SQL.
In Chalk SQL, you can query named datasets within the chalk.datasets schema. Note that only datasets created within
the last 60 days will be available to Chalk SQL.
from chalk.client import ChalkClient
client = ChalkClient()
client.offline_query(
input={
"user.id": [i for i in range(1, 10)]
},
output=["user.name", "user.sum_amt_txns", "user.cnt_txns"],
recompute_features=True,
dataset_name="my_users_dataset"
).get_data_as_polars()SELECT
"user.id" AS id,
"user.sum_txns" / "user.cnt_txns" AS avg_txn_amt,
"__ts__" AS created_at
FROM chalk.datasets.my_user_dataset
WHERE "user.id" >= 5 AND "user.id" <= 8;If you define named data sources in your Chalk environment, you can reference schemas and tables within them in Chalk SQL queries. This means Chalk SQL can be used to create queries that join across these data sources.
# In your Chalk project
from chalk.sql import PostgreSQLSource, BigquerySource
pg = PostgreSQLSource(name="my_postgres")
bq = PostgreSQLSource(name="my_bigquery")WITH txn_sum AS (
SELECT
user_id,
SUM(amount) AS total
FROM my_bigquery.my_dataset.transactions -- Table "transactions" in dataset "my_dataset"
GROUP BY user_id
)
SELECT
u.id AS user_id,
u.name AS user_name,
t.total AS user_spending
FROM my_postgres.my_schema.users u -- Table "users" in schema "my_schema"
JOIN txn_sum t
ON u.id = t.user_id;When making queries to Chalk with persistence turned on, these values will be ingested into the offline store, where they can be re-used for analytical queries. Chalk SQL provides many useful views into the offline store.
The query log, which is accessible via the chalk.query_log.data table, consists of information about each query
It can be hard to determine the table in the table within in the offline store that corresponds
to the results of the query. Chalk SQL provides a table-valued function query_values_from_operation_ids()
that performs this resolution in-house.
from chalk.client import ChalkClient
client = ChalkClient()
result = client.query_bulk(
input={
"user.id": [i for i in range(1, 100)]
},
output=["user.name", "user.favorite_color"],
)
# In the UI or in the query meta information, an operation id will be set.SELECT *
FROM query_values_from_operation_ids('c012345678901234567890123')
WHERE "user.id" = 1You can also directly query the offline store via the offline_store catalog.
SELECT *
FROM offline_store.schema_name.feat_0123456789;