Developer Interface
If you need to compute the inputs to a query dynamically, you can use ChalkSQL. This can improve performance and reduce cost by avoiding extra round trips across the network.
input_sql= as a SQL string, instead of input=.__ts__ column to specify a query time.For comparison, let’s start with a simple query that does not use SQL:
grpc_client = ChalkGRPCClient(...)
grpc_client.online_query_bulk(
input={
User.id: [1, 2],
User.email: ["alice@example.com", "bob@example.com"],
},
output=[
User.id,
User.email,
User.is_fraud,
],
)This query says we know the User.id and email, and we want to know is_fraud.
The known values (inputs) are passed as literal data: 1 or "alice@example.com".
Because we’re using the _bulk method, we can pass many input examples in one query.
Conceptually the input is a table, where each column is a feature, and each row is a different example:
┌─────────┬───────────────────┐
│ user.id │ user.email │
│ int32 │ varchar │
├─────────┼───────────────────┤
│ 1 │ alice@example.com │
│ 2 │ bob@example.com │
└─────────┴───────────────────┘
In any SQL implementation (not only ChalkSQL), you can build the same table
with VALUES:
select *
from (values (1, 'alice@example.com'), (2, 'bob@example.com'))
as t("user.id", "user.email");
Note the as t(...) alias, which renames the columns to match our features’ fully-qualified names (FQNs).
You can plug this SQL query into the Chalk query by replacing
the input argument with input_sql:
grpc_client = ChalkGRPCClient(...)
grpc_client.online_query_bulk(
input_sql="""
select *
from (values (1, 'alice@example.com'), (2, 'bob@example.com'))
as t("user.id", "user.email");
""",
output=[
User.id,
User.email,
User.is_fraud,
],
)This returns the same result as the original query with hardcoded inputs. But now we can replace the SQL with something more dynamic, such as a data source.
For example, to get the IDs of all users whose last name starts with “S”:
select
id as "user.id",
email_address as "user.email"
from my_postgres.users
where last_name like "S%"(Note the as aliases, which rename each column to exactly match the feature FQN.)
The overall Chalk query, using this SQL as its input, would look like this:
grpc_client = ChalkGRPCClient(...)
grpc_client.online_query_bulk(
input_sql="""
select
id as "user.id",
email_address as "user.email"
from my_postgres.users
where last_name like "S%"
""",
output=[
User.id,
User.email,
User.is_fraud,
],
)This is the same as the previous example: the only difference is that it uses ChalkClient.offline_query instead of ChalkGRPCClient.online_query_bulk.
client = ChalkClient(...)
client.offline_query(
input_sql="""
select
id as "user.id",
email_address as "user.email"
from my_postgres.users
where last_name like "S%"
""",
output=[
User.id,
User.email,
User.is_fraud,
],
recompute_features=True,
)Normally, online and offline queries let you specify a query time
by passing an input_times or now argument.
This time value is not a feature value, but is part of the overall input to the query.
For example, this query specifies values for one input feature, and also a query time:
grpc_client = ChalkGRPCClient(...)
grpc_client.online_query_bulk(
input={
Thermometer.id: [2, 3],
},
now=[datetime(2025, 1, 2, 12, 0, 0), datetime(2025, 1, 2, 12, 0, 0)],
output=[
Thermometer.id,
Thermometer.measured_at,
Thermometer.degrees_fahrenheit,
],
)Conceptually, the overall input to the query is this table:
┌────────────────┬─────────────────────┐
│ thermometer.id │ __ts__ │
│ int32 │ timestamp │
├────────────────┼─────────────────────┤
│ 2 │ 2025-01-02 00:00:00 │
│ 3 │ 2025-01-02 00:00:00 │
└────────────────┴─────────────────────┘
So to get the same behavior with SQL as the input, you would write a SQL query that outputs
two columns: thermometer.id and __ts__:
grpc_client = ChalkGRPCClient(...)
grpc_client.online_query_bulk(
input_sql=f"""
select *
from (values
(2, '2025-1-2 12:00:00'::datetime),
(3, '2025-1-2 12:00:00'::datetime)
)
as t("thermometer.id", "__ts__")
""",
output=[
Thermometer.id,
Thermometer.measured_at,
Thermometer.degrees_fahrenheit,
],
)And the syntax for offline query follows the same pattern:
client = ChalkClient(...)
client.offline_query(
input_sql=f"""
select *
from (values
(2, '2025-1-2 12:00:00'::datetime),
(3, '2025-1-2 12:00:00'::datetime)
)
as t("thermometer.id", "__ts__")
""",
output=[
Thermometer.id,
Thermometer.measured_at,
Thermometer.degrees_fahrenheit,
],
recompute_features=True,
)Versioned features work as usual by appending a “@N” suffix to the fully-qualified name:
grpc_client.online_query_bulk(
input_sql="""
select
id as "user.id",
email_address as "user.email@2"
from my_postgres.users
where last_name like "S%"
""",
output=[
User.id,
User.email @ 2,
User.is_fraud,
],
)You may also find it convenient to use str() to access the fully-qualified name instead
of hardcoding it:
input_sql=f"""
select
id as "{str(User.id)}",
email_address as "{str(User.email @ 2)}",
from ...
""",