Integrations
Integrate any API, third-party client, or data source without needing to orchestrate data pipelines
Chalk integrates seamlessly with your underlying systems—querying your data sources directly, eliminating the need for ETL!
This unlocks several key benefits:
Anywhere that you can run Kubernetes, you can run Chalk—Chalk is cloud-agnostic.
Chalk deploys into your VPC, co-located with your data sources, for the lowest latency and cost, and supports multi-cloud deployments for high availability and disaster recovery.
Chalk has native drivers and integrations for a variety of SQL data sources and query engines, and provides a unified interface for adding new data sources. Adding a new SQL source is as simple as providing a connection string and a few configuration options through your Chalk dashboard. Once it’s been added to your Chalk deployment, you can start querying it right away with SQL Resolvers.
-- resolves: User
-- source: postgres
select
    id,
    name
from users
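The postgres source referenced above can also be declared in code instead of through the dashboard; a minimal sketch, assuming connection details are supplied via the dashboard or environment rather than hard-coded:
from chalk.sql import PostgreSQLSource

# Named source matching "-- source: postgres" in the resolver above;
# credentials are assumed to be configured out of band.
postgres = PostgreSQLSource(name="postgres")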
The features in a feature class can be hydrated from multiple SQL sources—for example, we can pull a user’s social security number from a different database with stricter access controls.
-- resolves: User
-- source: restricted_postgres
select
    id,
    ssn
from sensitive_user_data
In addition, Chalk can reverse-ETL features from your data warehouses into Chalk’s online store for low-latency access. Chalk integrates natively (in C++) with the data sources below and pushes filters and projections down into its SQL queries for more efficient data fetching.
Data Warehouses
Chalk offers native warehouse integrations as well as connectors for warehouse services on AWS, GCP, and Azure.
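Projection pushdown means an offline query that requests only a few features scans only the matching warehouse columns. A minimal sketch using the client API, with illustrative input values:
from chalk.client import ChalkClient

client = ChalkClient()
# Requesting only User.name lets Chalk push the projection down into
# the generated warehouse SQL instead of scanning every column.
dataset = client.offline_query(
    input={User.id: [1, 2, 3]},
    output=[User.name],
)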
We provide stream resolvers for integrating Kafka-compatible streaming sources.
Streams can also be filtered, processed, and materialized as a step in Chalk’s feature computation pipelines.
from chalk.features import Features
from chalk.streams import KafkaSource, stream

# TransactionMsg is the message schema for the topic (e.g., a Pydantic model)
@stream(source=KafkaSource(name='transactions_stream'))
def process_transaction_topic(
    value: TransactionMsg,
) -> Features[Transaction.id, Transaction.user_id, Transaction.amount]:
    return Transaction(
        id=value.id,
        user_id=value.user_id,
        amount=value.amount,
    )
Chalk makes it easy to cache features for low-latency access with the max_staleness keyword
argument. These features skip expensive API calls and are fetched from the online store.
from chalk.features import feature, features

@features
class User:
    id: int
    name: str
    ssn: int
    credit_score: int = feature(max_staleness="30d")
We support a variety of caching backends for the online store.
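At query time, a cached credit_score younger than 30 days is served straight from the online store rather than recomputed; a minimal usage sketch, with an illustrative input id:
from chalk.client import ChalkClient

client = ChalkClient()
# Served from the online store when a value newer than max_staleness
# exists; otherwise the resolver runs and the cache is refreshed.
result = client.query(
    input={User.id: 1},
    output=[User.credit_score],
)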
Call internal APIs, third-party services, and microservices with built-in retry logic and circuit breakers:
import requests

@online
def get_credit_score(ssn: User.ssn) -> User.credit_score:
    # API_KEY is configured elsewhere (e.g., an environment secret)
    response = requests.get(
        f"https://api.creditbureau.com/score/{ssn}",
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=2.0,
    )
    return response.json()["score"]
Chalk’s Symbolic Python Interpreter supports accelerating libraries like requests, so this function runs in C++.
Chalk is Iceberg-native and can write to your underlying object storage and catalog directly from offline queries.
from chalk.integrations import GlueCatalog

catalog = GlueCatalog(
    name="aws_glue_catalog",
    aws_region="us-west-2",
    catalog_id="123",
    aws_role_arn="arn:aws:iam::123456789012:role/YourCatalogueAccessRole",
)

# results is the result set of an offline query
results.write_to(destination="database.table_name", catalog=catalog)
Access traditional machine learning libraries like scikit-learn, XGBoost, and your own models directly within feature definitions using Chalk Expressions:
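As an illustrative sketch of in-feature model scoring, shown here with a Python resolver rather than the native expression syntax (the model file and the income, age, and risk_score features are hypothetical):
import xgboost as xgb

from chalk import online

# Hypothetical pre-trained model, loaded once at import time
model = xgb.Booster()
model.load_model("credit_model.json")

@online
def get_risk_score(income: User.income, age: User.age) -> User.risk_score:
    # Score a single row of features with the loaded model
    return float(model.predict(xgb.DMatrix([[income, age]]))[0])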
Integrating unstructured data with LLMs (large language models) or computing embeddings is straightforward with Chalk’s built-in integrations. Easily run evals, swap models and providers, and reference the features you need in your prompts without having to configure complex pipelines.
You can override the base URL and API key to connect to any OpenAI-compatible endpoint, as sketched after the example below.
# Assumed imports: import chalk.functions as F; import chalk.prompts as P.
# StructuredOutput is a user-defined schema (e.g., a Pydantic model) for the parsed response.
@features
class Item:
    id: int
    title: str
    description: str
    llm: P.PromptResponse = P.completion(
        model="gpt-4o-mini-2024-07-18",
        messages=[
            P.message(
                role="user",
                content=F.jinja(
                    """
                    Classify the following item category using its title and description:
                    Item title: {{ Item.title }}
                    Item description: {{ Item.description }}
                    """,
                ),
            ),
        ],
        output_structure=StructuredOutput,
    )
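For instance, pointing the same prompt machinery at a self-hosted endpoint might look like the following sketch; whether completion accepts base_url and api_key overrides directly is an assumption here, as are the endpoint and model names:
import os

llm = P.completion(
    model="llama-3.1-8b-instruct",                   # hypothetical self-hosted model
    base_url="https://llm.internal.example.com/v1",  # assumption: any OpenAI-compatible server
    api_key=os.environ["INTERNAL_LLM_API_KEY"],      # assumption: key supplied via an env var
    messages=[P.message(role="user", content="Classify this item.")],
)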
You can just as easily compute embeddings for items, users, or any other entity using built-in integrations:
from chalk.features import DataFrame, Primary, Vector, embed, features

@features
class VectorSearch:
    q: Primary[str]
    vector: Vector = embed(
        input=lambda: VectorSearch.q,
        provider="vertexai",
        model="text-embedding-005",
    )
    query_type: str = "vector"
    results: "DataFrame[ItemDocument]"
With dozens of native integrations across cloud platforms, databases, streaming systems, caching layers, and AI services, Chalk eliminates the complexity of building and maintaining production machine learning systems.
Whether you’re pulling user data from PostgreSQL, processing real-time events from Kafka, caching expensive feature computations in Redis, or extracting features from unstructured data with LLMs, Chalk’s unified platform handles it all.
The result? Faster time to production, lower operational overhead, and consistent feature logic across your entire ML stack.