Model Platform
Learn how to deploy and manage machine learning models in Chalk
With Chalk, you can deploy machine learning models as isolated services running in dedicated scaling groups. This approach allows your models to run with their own compute resources, auto-scaling policies, and independent lifecycle management—separate from the Chalk engine itself.
This is different from the traditional approach of including models directly in Chalk feature resolvers. Instead of embedding model inference within your feature computation, model deployments host your models as standalone services that can be called from resolvers or external applications.
Model deployments are ideal when you want to:
@model_handler decorator
The fastest way to deploy a model is the @model_handler decorator. You write a single class
with a predict method, hand a trained model object to register_model_version, and Chalk
builds the serving image for you, ships your class as source, serializes and mounts the model,
and wires up the runtime — no Dockerfile, no hand-written Arrow plumbing, no chalkcompute.Image.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from chalk.client import ChalkClient
from chalk.ml import model_handler
from chalk.scalinggroup import ScalingGroupResourceRequest
@model_handler
class HousePriceModel:
    def predict(self, df):
        return pd.DataFrame({"price": self.model.predict(df.to_pandas())})
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2)) # columns: sqft, rooms
y_train = X_train @ [150.0, 50.0] # price
rf = RandomForestRegressor().fit(X_train, y_train)
client = ChalkClient()
result = client.register_model_version(
    name="house_price",
    model=HousePriceModel(model=rf),
    input_schema={"sqft": float, "rooms": float},
    output_schema={"price": float},
    dependencies=["scikit-learn", "pandas", "chalkdf"],
)
client.deploy_model_version_to_scaling_group(
    name=f"house-price-{result.model_version}",
    model_name="house_price",
    model_version=result.model_version,
    resources=ScalingGroupResourceRequest(cpu="1", memory="2Gi"),
)
Pass input_schema, output_schema, and dependencies explicitly. Schemas are
required — naming your columns is what makes the output line up with what
predict returns — and dependencies is the list of pip packages your predict
needs at runtime.
Schema values can be plain Python types — float, int, str, and bool map to
pa.float64(), pa.int64(), pa.string(), and pa.bool_() respectively, so you don’t
need to import pyarrow for ordinary tabular models. Reach for a PyArrow type only when
you need one outside those four (for example pa.float32(), pa.large_string(), or a
timestamp type).
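The shorthand above can be pictured as a small lookup table. This is only an illustrative sketch: `PLAIN_TYPE_TO_ARROW` and `describe_schema` are hypothetical names (not part of the Chalk API), and the Arrow types are written as strings so the snippet runs without pyarrow installed.

```python
# Hypothetical lookup mirroring the plain-type shorthand described above.
# Arrow types are spelled as strings so the sketch has no dependencies.
PLAIN_TYPE_TO_ARROW = {
    float: "pa.float64()",
    int: "pa.int64()",
    str: "pa.string()",
    bool: "pa.bool_()",
}


def describe_schema(schema):
    """Return the Arrow type each schema column would resolve to."""
    resolved = {}
    for column, dtype in schema.items():
        # Anything outside the four plain types is assumed to already be an
        # explicit Arrow type and is passed through unchanged.
        resolved[column] = PLAIN_TYPE_TO_ARROW.get(dtype, dtype)
    return resolved


print(describe_schema({"sqft": float, "rooms": int, "city": str, "has_pool": bool}))
```

An explicit type such as `pa.float32()` would simply pass through this mapping untouched, which matches the advice to import pyarrow only when one of the four plain types is not enough.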
The decorated class is a normal Python class with three Chalk-managed attributes injected:
- model: your trained model object. Chalk serializes it with the framework’s native serializer at registration time, uploads it to the model’s artifact volume, and deserializes it back into self.model inside the container before predict runs. The same attribute is available on both sides.
- files: a list of local file paths (files=["./scaler.pkl"]) that Chalk uploads to the artifact volume. At runtime self.files is a {basename: Path} mapping, so self.files["scaler.pkl"] resolves to the mounted path in the container (and to your local path when testing). Use this for tokenizers, scalers, encoders, lookup tables, etc.
- artifact_path: the mounted artifact volume directory.
predict and return types
predict(self, df) receives a chalkdf.DataFrame built from the request. Call
df.to_pandas() (or df.to_arrow()) to get the shape your model expects. The return value can
be any of the following; Chalk coerces it to the output columns for you:
| Return type | Becomes |
|---|---|
| pandas.DataFrame | columns by name |
| polars.DataFrame / chalkdf.DataFrame | columns by name |
| pyarrow.RecordBatch / pyarrow.Table | columns by name |
| numpy.ndarray (1-D) | a single prediction column |
| numpy.ndarray (2-D) | col_0, col_1, … columns |
Returning a named frame (pandas/polars/Arrow) is recommended so your output columns are explicit
and match your output_schema.
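To make the array rows of the table concrete, here is a rough sketch of the naming convention using plain Python lists in place of numpy arrays. The helper `name_output_columns` is hypothetical, and treating `prediction` as the literal name of the single column is an assumption; this illustrates the convention in the table, not Chalk's actual coercion code.

```python
def name_output_columns(rows, single_name="prediction"):
    """Name columns the way the table describes for array-like returns.

    `rows` stands in for a numpy array: a flat list plays the 1-D case and
    a list of equal-length lists plays the 2-D case.
    """
    if rows and isinstance(rows[0], (list, tuple)):
        # 2-D case: one output column per position, named col_0, col_1, ...
        width = len(rows[0])
        return {f"col_{j}": [row[j] for row in rows] for j in range(width)}
    # 1-D case: a single column (the name used here is an assumption).
    return {single_name: list(rows)}


print(name_output_columns([0.1, 0.9]))
print(name_output_columns([[0.1, 0.9], [0.7, 0.3]]))
```

Positional names like `col_0` carry no meaning on their own, which is one reason a named frame is the safer return type.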
The model= object can be any of: scikit-learn, PyTorch, XGBoost, LightGBM,
CatBoost, TensorFlow/Keras, or ONNX. The framework is auto-detected and the right
serializer is used.
load_model
For models Chalk can’t serialize (a custom Python class, a sentence encoder, a lookup table),
leave model=None, ship the artifacts via files=[...], and own the loading in load_model,
which runs once per replica at startup:
@model_handler
class CategoryEnricher:
    def load_model(self):
        import pickle
        with open(self.files["table.pkl"], "rb") as f:
            self.table = pickle.load(f)
    def predict(self, df):
        ids = df.to_pandas()["category_id"]
        return pd.DataFrame({"name": [self.table.get(i, "unknown") for i in ids]})
When you do have a model= object and also need extra setup (device placement, .eval(),
auxiliary files), call self.default_load_model() inside your override to populate self.model
the default way, then add your own steps:
@model_handler
class UserEmbedder:
    def load_model(self):
        self.default_load_model()  # restores self.model
        self.model.to("cuda").eval()
Because the decorated class is plain Python, you can exercise it locally with no container
plumbing — construct it, call load_model() (a no-op outside the container when model= is
already set), and call predict with a chalkdf.DataFrame:
import pyarrow as pa
from chalkdf import DataFrame
m = HousePriceModel(model=rf)
m.load_model()
out = m.predict(DataFrame.from_arrow(pa.RecordBatch.from_pydict({"sqft": [1000.0], "rooms": [3.0]})))
The `predict` path hands your code a `chalkdf.DataFrame`, so the serving image installs `chalkdf`, which requires Python below 3.13 (Chalk pins the image to 3.12 automatically). The container also installs the same `chalkpy` version you registered from, so the version you run locally must be a released version that includes `@model_handler` `predict` support.
Working @model_handler examples for each supported framework. The pattern is identical — write
predict, hand over a trained model, register, deploy — but note that the object you get back as
self.model in the container is whatever that framework’s loader returns, which is often not
the class you trained (see the comments in each snippet). The scikit-learn version:
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from chalk.client import ChalkClient
from chalk.ml import model_handler
from chalk.scalinggroup import ScalingGroupResourceRequest
@model_handler
class SklearnRF:
    def predict(self, df):
        X = df.to_pandas()[["f0", "f1", "f2", "f3"]]
        return pd.DataFrame({"prediction": self.model.predict(X)})
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 4))
y_train = X_train @ [3.0, -2.0, 1.0, 0.5]
rf = RandomForestRegressor().fit(X_train, y_train)
client = ChalkClient()
v = client.register_model_version(
    name="sklearn_rf",
    model=SklearnRF(model=rf),
    input_schema={f"f{i}": float for i in range(4)},
    output_schema={"prediction": float},
    dependencies=["scikit-learn", "pandas", "chalkdf"],
)
client.deploy_model_version_to_scaling_group(
    name=f"sklearn-rf-{v.model_version}",
    model_name="sklearn_rf",
    model_version=v.model_version,
    resources=ScalingGroupResourceRequest(cpu="1", memory="2Gi"),
)
When you need full control over the serving container (a custom inference handler, extra system
packages, or a pre-built image), register a model with a container image instead of a decorated
class or a Python model object. Every model image runs the
chalk-remote-call-python shim, which routes
requests and handles PyArrow serialization. You supply a handler and an entrypoint that
points chalk-remote-call at it, then register either a chalkcompute.Image (Chalk builds it) or a
pre-built Docker image (you build it).
The handler receives a dict of PyArrow Arrays — one per input_schema column — and returns a
PyArrow Array. Optionally define on_startup to load resources once when the container starts;
artifacts mounted via a volume live at /app/artifacts/ (see
Automatic Volume Upload for Model Artifacts).
import json
import pyarrow as pa
import pyarrow.compute as pc
model = None
def on_startup():
    global model
    with open("/app/artifacts/model.json") as f:
        model = json.load(f)
def handler(event: dict[str, pa.Array], context: dict) -> pa.Array:
    factor = model["factor"]
    return pc.multiply(event["x"], pa.scalar(factor, type=pa.float64()))
Point the entrypoint at the handler (and optional startup hook):
chalk-remote-call --handler model.handler --on-startup model.on_startup --port 8080
Pass a chalkcompute.Image and Chalk builds and manages the container for you. Install
chalk-remote-call-python, add your handler file, and set chalk-remote-call as the entrypoint:
from chalk.client import ChalkClient
from chalkcompute import Image
client = ChalkClient()
image = (
    Image.debian_slim("3.11")
    .pip_install(["chalk-remote-call-python", "pyarrow"])
    .add_local_file("model.py", "/app/model.py", strategy="copy")
    .env({"PYTHONPATH": "/app"})
    .workdir("/app")
    .entrypoint(["chalk-remote-call", "--handler", "model.handler", "--port", "8080"])
)
client.register_model_version(
    name="my-model",
    input_schema={"x": float},
    output_schema={"y": float},
    model_image=image,
)
Useful chalkcompute.Image methods:
- .base(image): use a custom base Docker image
- .debian_slim(python_version): slim Debian base with the given Python version
- .pip_install(packages): install Python packages
- .run_commands(commands): run shell commands during the build
- .add_local_file(src, dest, strategy) / .add_local_dir(src, dest, strategy): copy files in
- .env(vars) / .workdir(path) / .entrypoint(command): set env vars, working dir, entrypoint
If you build and push the image yourself, register it by string reference. The image must meet the
same requirements: install chalk-remote-call-python, define a handler (and optional on_startup),
and run chalk-remote-call as the entrypoint.
client.register_model_version(
    name="my-model",
    input_schema={"x": float},
    output_schema={"y": float},
    model_image="my-model-image:latest",
)
FROM python:3.11-slim
WORKDIR /app
RUN pip install --no-cache-dir chalk-remote-call-python pyarrow
COPY model.py /app/model.py
ENV PYTHONPATH=/app
EXPOSE 8080
ENTRYPOINT ["chalk-remote-call", "--handler", "model.handler", "--port", "8080"]
docker build --platform linux/amd64 -t my-model-image:latest .
docker push my-model-image:latest
When your model files are large (e.g. multi-gigabyte weight files), baking them into the container image is impractical: it slows down builds, increases image pull times, and wastes storage. Instead, Chalk automatically uploads model artifacts to a volume that gets mounted into your container at runtime. If your model artifacts are already baked into the image and you want to skip this automatic upload, pass skip_upload_to_volumes=True:
client.deploy_model_version_to_scaling_group(
    name="my-large-model-sg",
    model_name="my-large-model",
    model_version=response.model_version,
    handler="handler.handler",
    skip_upload_to_volumes=True,
    resources=ScalingGroupResourceRequest(cpu="1", memory="2Gi"),
)
The uploaded artifacts are mounted at /app/artifacts/ inside the container. Load them in
on_startup exactly as shown in Writing a handler — open
/app/artifacts/model.json once at startup and reference it from your handler.
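When testing outside the container, it can help to make the artifact directory a parameter so the same loading code works against a local folder. The sketch below is one way to do that; `load_json_artifact` and `DEFAULT_ARTIFACT_DIR` are hypothetical names, and only the `/app/artifacts/` mount path comes from the text above.

```python
import json
from pathlib import Path

# Mount point described above; override it when running outside the container.
DEFAULT_ARTIFACT_DIR = Path("/app/artifacts")


def load_json_artifact(name, artifact_dir=DEFAULT_ARTIFACT_DIR):
    """Load a JSON artifact by file name from the artifact directory."""
    path = Path(artifact_dir) / name
    if not path.is_file():
        # Fail loudly at startup rather than on the first request.
        raise FileNotFoundError(f"artifact {name!r} not found under {artifact_dir}")
    with path.open() as f:
        return json.load(f)
```

An `on_startup` hook could call `load_json_artifact("model.json")` once and keep the result in a module-level variable, while a local test passes a temporary directory as `artifact_dir`.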
Once registered, deploy a model version to a scaling group with resource specifications and auto-scaling policies.
from chalk.client import ChalkClient
from chalk.scalinggroup import AutoScalingSpec, ScalingGroupResourceRequest
client = ChalkClient()
# Deploy the model version to a scaling group
client.deploy_model_version_to_scaling_group(
    name="my-model-sg",
    model_name="my-model",
    model_version=1,
    handler="model.handler",
    scaling=AutoScalingSpec(
        min_replicas=1,
        max_replicas=2,
        target_cpu_utilization_percentage=70,
    ),
    resources=ScalingGroupResourceRequest(
        cpu="2",
        memory="4Gi",
    ),
)
Control how your model deployment scales based on demand using AutoScalingSpec.
from chalk.scalinggroup import AutoScalingSpec
# Configure auto-scaling behavior
scaling = AutoScalingSpec(
    min_replicas=1,  # Minimum number of replicas
    max_replicas=5,  # Maximum number of replicas
    target_cpu_utilization_percentage=70,  # Target CPU utilization (optional)
)
Chalk automatically scales the number of replicas based on inference request load and CPU utilization, staying within your min/max bounds. This ensures your models handle traffic spikes efficiently without wasting resources during quiet periods.
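For intuition about how a CPU target interacts with the replica bounds, the sketch below applies the rule used by the Kubernetes Horizontal Pod Autoscaler and then clamps the result. That Chalk follows this exact rule is an assumption the text above does not confirm, and `desired_replicas` is a hypothetical helper, not a Chalk function.

```python
import math


def desired_replicas(current, cpu_percent, target_percent, min_replicas, max_replicas):
    """Estimate a replica count with an HPA-style rule (an assumption)."""
    # Scale the current count by how far observed CPU is from the target.
    raw = math.ceil(current * cpu_percent / target_percent)
    # Never leave the configured min/max bounds.
    return max(min_replicas, min(max_replicas, raw))


# Two replicas running at 140% of their CPU request against a 70% target.
print(desired_replicas(2, 140, 70, min_replicas=1, max_replicas=5))  # 4
```

With `max_replicas=5`, even a very large spike is capped at five replicas, and a quiet period settles back to `min_replicas`.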
Specify CPU, memory, and GPU resources for each replica of your model using ScalingGroupResourceRequest.
from chalk.scalinggroup import ScalingGroupResourceRequest
# Request resources per replica
resources = ScalingGroupResourceRequest(
    cpu="2",  # CPU allocation per replica
    memory="4Gi",  # Memory allocation per replica
    gpu="nvidia-tesla-t4:1",  # Optional: GPU type and count
)
Each replica gets the specified resources. When Chalk scales from 1 to 3 replicas, total resource usage is multiplied accordingly (e.g., 3 replicas × 2 CPU = 6 CPU total).
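The multiplication is easy to script when budgeting for the worst case at `max_replicas`. This is a small sketch with a hypothetical helper, `total_resources`, that only understands whole-core CPU strings and the `Gi` memory suffix used in the examples here.

```python
def total_resources(replicas, cpu="2", memory="4Gi"):
    """Return (total CPU cores, total memory in GiB) across all replicas."""
    if not memory.endswith("Gi"):
        raise ValueError("this sketch only understands the Gi suffix")
    total_cpu = replicas * float(cpu)
    total_memory_gi = replicas * float(memory[:-2])
    return total_cpu, total_memory_gi


# 3 replicas x (2 CPU, 4Gi) each
print(total_resources(3))  # (6.0, 12.0)
```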
A deployed model is addressed by the scaling group name you passed to
deploy_model_version_to_scaling_group, referenced as model.{scaling_group_name}. You can
call it two ways. In every case the argument order must match your model’s input_schema.
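Because arguments are positional, a small helper that derives both the qualified name and the argument order from the schema can prevent silent column mix-ups. The sketch below is illustrative only; `qualified_model_name` and `ordered_arguments` are hypothetical helpers, not Chalk functions.

```python
def qualified_model_name(scaling_group_name):
    """Build the model.{scaling_group_name} reference described above."""
    return f"model.{scaling_group_name}"


def ordered_arguments(input_schema, values):
    """Return values in input_schema order, failing on missing columns."""
    missing = [column for column in input_schema if column not in values]
    if missing:
        raise KeyError(f"missing inputs: {missing}")
    # Dicts preserve insertion order, so this follows the schema as written.
    return [values[column] for column in input_schema]


schema = {"x_1": float, "x_2": float}
print(qualified_model_name("my-model-sg"))
print(ordered_arguments(schema, {"x_2": 2.0, "x_1": 1.0}))
```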
F.catalog_call
Call the model as part of feature computation, so its output is just another feature:
from chalk.features import features, _
from chalk import functions as F
@features
class MyModel:
    id: int
    x_1: float
    x_2: float
    y: float = F.catalog_call(
        "model.my-model-sg",
        _.x_1,
        _.x_2,
    )
F.catalog_call is evaluated during chalk query, so the call only takes effect once the
feature graph is applied to your environment with chalk apply.
In a SQL resolver, invoke the model with the catalog_call('model.{scaling_group_name}', ...)
function, passing the model’s qualified name as the first argument:
select
    id,
    catalog_call('model.my-model-sg', x_1, x_2) as y
from my_table
A deployed model only becomes available to SQL after you **redeploy** your Chalk deployment with `chalk apply`. Registering the model and deploying it to a scaling group is not enough on its own: the redeploy is what makes the model's qualified name resolvable by `catalog_call`. If you get an "unknown function" or unresolved-name error in a SQL resolver right after deploying a model, run `chalk apply` and try again.
deploy_model_version_to_scaling_group always runs the image with the chalk-remote-call
entrypoint (Chalk’s Arrow request/response runtime), so you can’t deploy a stock
vLLM image directly — its own OpenAI HTTP server never starts. Instead,
put the Chalk shim in front of vLLM: ship a handler that loads vLLM at startup and answers
each Arrow request by generating with it. The model then deploys through the normal
register-model + deploy-to-scaling-group path and is called like any other Chalk model
(F.catalog_call or SQL). Curated models expose the same Arrow contract, although they are built differently.
import os
import pyarrow as pa
_llm = None
def on_startup():
    global _llm
    from vllm import LLM
    _llm = LLM(model=os.environ.get("VLLM_MODEL", "Qwen/Qwen2.5-0.5B-Instruct"))
def handler(event, context):
    from vllm import SamplingParams
    rb = pa.Table.from_pydict(event).combine_chunks().to_batches()[0]
    prompts = rb.column("prompt").to_pylist()
    outs = _llm.generate(prompts, SamplingParams(max_tokens=64))
    texts = [o.outputs[0].text for o in outs]
    return {"completion": pa.array(texts, type=pa.large_string())}
import pyarrow as pa
from chalkcompute import Image
from chalk.client import ChalkClient
from chalk.scalinggroup import AutoScalingSpec, ScalingGroupResourceRequest
# vLLM base + the chalk-remote-call runtime + our handler.
image = (
    Image.base("vllm/vllm-openai:latest")
    .pip_install(["chalk-remote-call-python", "pyarrow"])
    .add_local_file("handler.py", "/app/handler.py", strategy="copy")
    .workdir("/app")
    .env({"PYTHONPATH": "/app"})
)
client = ChalkClient()
result = client.register_model_version(
    name="vllm-qwen",
    model_image=image,
    input_schema={"prompt": pa.large_string()},
    output_schema={"completion": pa.large_string()},
)
client.deploy_model_version_to_scaling_group(
    name="vllm-qwen-sg",
    model_name="vllm-qwen",
    model_version=result.model_version,
    scaling=AutoScalingSpec(min_replicas=1, max_replicas=2),
    resources=ScalingGroupResourceRequest(gpu="nvidia-tesla-t4:1"),  # vLLM needs a GPU
    handler="handler.handler",
    env_vars={"PYTHONPATH": "/app"},
)
Then call it like any Chalk model, over the Arrow contract rather than raw HTTP. For example, from a feature resolver:
from chalk.features import features, _
from chalk import functions as F
@features
class Doc:
    id: int
    prompt: str
    completion: str = F.catalog_call("model.vllm-qwen-sg", _.prompt)
**Curated models** are Chalk's prepackaged version of this idea: a maintained catalog of open-weight models (e.g. `gemma-3-4b-it`, `mistral-7b-instruct`, `qwen3-embedding-0-6b`, `chronos-2`) that serve the same Arrow contract and deploy with one call, with no image to build or maintain. They are not built on this vLLM-in-a-Python-handler recipe, though: the text models serve through Chalk's native Rust `chalk_model_runtime`, with `-cpu` and `-gpu` image variants selected at deploy time. The recipe above is how you serve your own vLLM model when a curated one doesn't fit.
If you instead want vLLM's raw OpenAI-compatible HTTP API — to point an OpenAI client straight at it — deploy it as a `chalkcompute.Container`, which runs the image's own entrypoint and exposes the HTTP endpoint directly. See [Model Inference with vLLM](/docs/compute/model-inference).
Deploy a new version of a model to an existing scaling group:
from chalk.client import ChalkClient
from chalkcompute import Image
from chalk.scalinggroup import ScalingGroupResourceRequest
client = ChalkClient()
# Register a new model version with an updated Chalk image
new_version = client.register_model_version(
    name="my-model",
    input_schema={"x": float},
    output_schema={"y": float},
    model_image=(
        Image.debian_slim("3.11")
        .pip_install(["chalk-remote-call-python", "pyarrow"])
        .add_local_file("model_v2.py", "/app/model.py", strategy="copy")
        .env({"PYTHONPATH": "/app"})
        .workdir("/app")
        .entrypoint(
            [
                "chalk-remote-call",
                "--handler",
                "model.handler",
                "--port",
                "8080",
            ]
        )
    ),
)
# Update the scaling group with the new version
client.deploy_model_version_to_scaling_group(
    name="my-model-sg",
    model_name="my-model",
    model_version=new_version.model_version,
    handler="model.handler",
    resources=ScalingGroupResourceRequest(cpu="1", memory="2Gi"),
)
For more information on listing, inspecting, and deleting scaling groups, see the Scaling Groups page.
Model registration and deployment should be controlled manually and separately from your feature definitions. Add the registration and deployment scripts to .chalkignore to prevent them from running during chalk apply.
Your chalk apply will fail if it tries to run model registration and deployment code.
Organize your project to keep model management separate from feature definitions:
my-chalk-project/
|- models/ # Model deployment code (add to .chalkignore)
| |- model.py
| `- deploy_model.py # Registration + deployment script
|
|- features/ # Feature definitions (synced with chalk apply)
| |- __init__.py
| `- user_features.py
|
|- .chalkignore
`- chalk.yaml
Put the following line in your .chalkignore so chalk apply skips everything under models/.
models/
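To sanity-check which files a directory entry like this would exclude, the sketch below assumes .chalkignore follows gitignore-style semantics, where a trailing-slash pattern matches a directory of that name at any depth. That assumption is not confirmed by the text above, and `is_ignored` is a hypothetical helper, not part of the Chalk CLI.

```python
from pathlib import PurePosixPath


def is_ignored(path, ignored_dirs=("models",)):
    """Return True if any parent directory of `path` is an ignored directory."""
    parts = PurePosixPath(path).parts
    # Only directory components count; the final part is the file name.
    return any(part in ignored_dirs for part in parts[:-1])


for candidate in ["models/deploy_model.py", "features/user_features.py", "chalk.yaml"]:
    print(candidate, is_ignored(candidate))
```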