Model Deployments

With Chalk, you can deploy machine learning models as isolated services running in dedicated scaling groups. This approach allows your models to run with their own compute resources, auto-scaling policies, and independent lifecycle management—separate from the Chalk engine itself.

This is different from the traditional approach of including models directly in Chalk feature resolvers. Instead of embedding model inference within your feature computation, model deployments host your models as standalone services that can be called from resolvers or external applications.


When to Use Model Deployments

Model deployments are ideal when you want to:

  • Isolate model resources: Give models their own CPU, memory, and GPU resources independent of the engine
  • Scale models independently: Auto-scale models based on inference demand without affecting other services
  • Version and update models separately: Deploy new model versions without redeploying your entire Chalk system
  • Run containerized models: Deploy models as Chalk images or Docker images without converting to Python objects
  • Enable high-throughput inference: Run multiple replicas of your model in parallel

Registering Models for Deployment

To deploy models to scaling groups, register them with a container image instead of local model files or Python model objects. You can either provide a chalkcompute.Image object and let Chalk build the image for you, or reference a pre-built Docker image directly.

Using a Chalk Image

With a chalkcompute.Image, you define your image configuration in Python and Chalk handles building and managing the container image:

from chalk.client import ChalkClient
from chalkcompute import Image
import pyarrow as pa

client = ChalkClient()

image = (
    Image.debian_slim("3.11")
    .pip_install(["chalk-remote-call-python", "pyarrow"])
    .add_local_file("model.py", "/app/model.py", strategy="copy")
    .env({"PYTHONPATH": "/app"})
    .workdir("/app")
    .entrypoint(
        [
            "chalk-remote-call",
            "--handler",
            "model.handler",
            "--port",
            "8080",
        ]
    )
)

client.register_model_version(
    name="my-model",
    input_schema={"x": pa.float64()},
    output_schema={"y": pa.float64()},
    model_image=image,
)

Using a Docker Image

Alternatively, you can register a pre-built Docker image by passing a string reference:

from chalk.client import ChalkClient
import pyarrow as pa

client = ChalkClient()

client.register_model_version(
    name="my-model",
    input_schema={"x": pa.float64()},
    output_schema={"y": pa.float64()},
    model_image="my-model-image:latest",
)

See the Docker Image Requirements section below for details on building compatible images.


Mounting Large Models to a Volume

When your model files are large (e.g. multi-gigabyte weight files), baking them into the container image is impractical—it slows down builds, increases image pull times, and wastes storage. Instead, you can upload model artifacts to a volume that gets mounted into your container at runtime.

After registering the model version, use upload_model_to_volume to upload your files. You must use chalk_handler_volume_name to format the volume name—the deploy path uses this deterministic name to find and mount the volume. In this example, the model artifacts are stored in model.json:

from chalk.client import ChalkClient
from chalk.client.model_image import chalk_handler_volume_name, upload_model_to_volume
from chalkcompute import Image
import pyarrow as pa

client = ChalkClient()

image = (
    Image.debian_slim("3.11")
    .pip_install(["chalk-remote-call-python", "joblib"])
    .add_local_file("handler.py", "/app/handler.py", strategy="copy")
    .env({"PYTHONPATH": "/app"})
    .workdir("/app")
    .entrypoint(
        [
            "chalk-remote-call",
            "--handler",
            "handler.handler",
            "--on-startup",
            "handler.on_startup",
            "--port",
            "8080",
        ]
    )
)

response = client.register_model_version(
    name="my-large-model",
    input_schema={"x": pa.float64()},
    output_schema={"y": pa.float64()},
    model_image=image,
)

# Upload model files to a volume
upload_model_to_volume(
    volume_name=chalk_handler_volume_name("my-large-model", response.model_version),
    model_filename="model.json",
    model_file_path="./model.json",
    chalk_client=client,
)

Loading Mounted Models in Your Handler

The uploaded artifacts are mounted at /app/artifacts/ inside the container. Use on_startup to load the model once when the container starts:

import json

import pyarrow as pa
import pyarrow.compute as pc

model = None

def on_startup():
    global model
    with open("/app/artifacts/model.json") as f:
        model = json.load(f)


def handler(event: dict[str, pa.Array], context: dict) -> pa.Array:
    factor = model["factor"]
    return pc.multiply(event["x"], pa.scalar(factor, type=pa.float64()))
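
The handler above expects model.json to contain a top-level "factor" key. Here is a minimal sketch of producing such an artifact locally before uploading it. The helper name write_model_artifact and the value 2.0 are illustrative assumptions, not part of Chalk's API:

```python
import json
import tempfile
from pathlib import Path


def write_model_artifact(directory: str, factor: float) -> Path:
    """Write a model.json file holding the single "factor" key the handler reads."""
    path = Path(directory) / "model.json"
    with open(path, "w") as f:
        json.dump({"factor": factor}, f)
    return path


# A temporary directory is used here; in practice the resulting path would be
# the one passed as model_file_path to upload_model_to_volume.
artifact_path = write_model_artifact(tempfile.mkdtemp(), factor=2.0)

# Read the file back the same way on_startup does.
with open(artifact_path) as f:
    loaded = json.load(f)
```

Reading the file back before uploading is a cheap way to catch a malformed artifact before the container fails at startup.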

Writing a Handler

Your handler is the function that runs inference. It receives a dictionary mapping input names to PyArrow Arrays, plus a context dictionary, and returns a PyArrow Array.

import pyarrow as pa
import pyarrow.compute as pc

def handler(event: dict[str, pa.Array], context: dict) -> pa.Array:
    return pc.multiply(event["x"], pa.scalar(2.0, type=pa.float64()))

Optional Startup Hook

Define an on_startup function to initialize resources before serving requests, and pass it via --on-startup in your entrypoint:

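import json

import pyarrow as pa
import pyarrow.compute as pc

model = None

def on_startup():
    # Runs once when the container starts; loads the mounted artifact into memory
    global model
    with open("/app/artifacts/model.json") as f:
        model = json.load(f)

def handler(event: dict[str, pa.Array], context: dict) -> pa.Array:
    factor = model["factor"]
    return pc.multiply(event["x"], pa.scalar(factor, type=pa.float64()))

The corresponding entrypoint command:

chalk-remote-call --handler model.handler --on-startup model.on_startup --port 8080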

Deploying to Scaling Groups

Once registered, deploy a model version to a scaling group with resource specifications and auto-scaling policies.

from chalk.client import ChalkClient
from chalk.scalinggroup import AutoScalingSpec, ScalingGroupResourceRequest

client = ChalkClient()

# Deploy the model version to a scaling group
client.deploy_model_version_to_scaling_group(
    name="my-model-sg",
    model_name="my-model",
    model_version=1,
    handler="model.handler",
    scaling=AutoScalingSpec(
        min_replicas=1,
        max_replicas=2,
        target_cpu_utilization_percentage=70,
    ),
    resources=ScalingGroupResourceRequest(
        cpu="2",
        memory="4Gi",
    ),
)

Auto-Scaling Configuration

Control how your model deployment scales based on demand using AutoScalingSpec.

from chalk.scalinggroup import AutoScalingSpec

# Configure auto-scaling behavior
scaling = AutoScalingSpec(
    min_replicas=1,                          # Minimum number of replicas
    max_replicas=5,                          # Maximum number of replicas
    target_cpu_utilization_percentage=70,    # Target CPU utilization (optional)
)

Chalk automatically scales the number of replicas based on inference request load and CPU utilization, staying within your min/max bounds. This ensures your models handle traffic spikes efficiently without wasting resources during quiet periods.
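
This page does not specify the exact scaling algorithm. To build intuition for how a CPU utilization target interacts with the min/max bounds, here is a sketch that assumes a rule similar to the Kubernetes Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current utilization / target utilization), clamped to the bounds. The function name and the numbers are hypothetical, and Chalk's actual behavior may differ:

```python
import math


def desired_replicas(
    current_replicas: int,
    current_cpu_percent: float,
    target_cpu_percent: float,
    min_replicas: int,
    max_replicas: int,
) -> int:
    """HPA-style estimate (an assumption, not Chalk's documented algorithm)."""
    raw = math.ceil(current_replicas * current_cpu_percent / target_cpu_percent)
    # Clamp the estimate to the configured bounds.
    return max(min_replicas, min(max_replicas, raw))


# 2 replicas running at 105% of requested CPU against a 70% target.
scale_up = desired_replicas(2, 105, 70, min_replicas=1, max_replicas=5)

# The same deployment at 20% utilization falls back toward min_replicas.
scale_down = desired_replicas(2, 20, 70, min_replicas=1, max_replicas=5)

# Demand beyond max_replicas is capped.
capped = desired_replicas(4, 140, 70, min_replicas=1, max_replicas=5)
```

Under this assumed rule, a lower target utilization leaves more headroom per replica at the cost of running more replicas.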


Resource Configuration

Specify CPU, memory, and GPU resources for each replica of your model using ScalingGroupResourceRequest.

from chalk.scalinggroup import ScalingGroupResourceRequest

# Request resources per replica
resources = ScalingGroupResourceRequest(
    cpu="2",                          # CPU allocation per replica
    memory="4Gi",                     # Memory allocation per replica
    gpu="nvidia-tesla-t4:1",          # Optional: GPU type and count
)

Each replica gets the specified resources. When Chalk scales from 1 to 3 replicas, total resource usage is multiplied accordingly (e.g., 3 replicas × 2 CPU = 6 CPU total).
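
The multiplication above can be sketched as a small capacity-planning helper. It assumes the cpu and memory strings follow Kubernetes-style quantities ("2", "500m", "4Gi"), which matches the examples on this page but is not confirmed here. All function names are hypothetical:

```python
_MEMORY_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def cpu_cores(cpu: str) -> float:
    """Parse "2" (cores) or "500m" (millicores) into a number of cores."""
    return int(cpu[:-1]) / 1000 if cpu.endswith("m") else float(cpu)


def memory_bytes(memory: str) -> int:
    """Parse a quantity such as "4Gi" into bytes (binary suffixes only)."""
    return int(memory[:-2]) * _MEMORY_UNITS[memory[-2:]]


def total_footprint(cpu: str, memory: str, replicas: int) -> dict:
    """Total resources requested when `replicas` copies are running."""
    return {
        "cpu_cores": cpu_cores(cpu) * replicas,
        "memory_gib": memory_bytes(memory) * replicas / 2**30,
    }


# The example from the text: 3 replicas x 2 CPU, 4Gi each.
at_three = total_footprint("2", "4Gi", replicas=3)
print(at_three)  # {'cpu_cores': 6.0, 'memory_gib': 12.0}
```

Evaluating the footprint at max_replicas gives the worst-case capacity the cluster needs to be able to schedule.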


Calling Deployed Models

Models deployed to scaling groups can be called from Chalk feature resolvers using the catalog_call function with the scaling group name.

from chalk.features import features, _
from chalk import functions as F

@features
class MyModel:
    id: int
    x: float
    y: float = F.catalog_call(
        "model.my-model-sg",
        _.x
    )

The catalog call format is: model.{scaling_group_name}

You can pass multiple inputs by providing them as additional arguments:

@features
class MyModel:
    id: int
    x_1: float
    x_2: float
    y: float = F.catalog_call(
        "model.my-model-sg",
        _.x_1,
        _.x_2
    )

The order of arguments must match the order of fields in your model’s input_schema.
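
To make the ordering rule concrete, the sketch below pairs positional arguments with schema fields in declaration order. Python dicts preserve insertion order, so the order in which input_schema is written is the order that matters. Both helpers are hypothetical illustrations rather than Chalk functions, and plain strings stand in for the PyArrow types:

```python
def catalog_name(scaling_group_name: str) -> str:
    """Build the catalog call name in the model.{scaling_group_name} format."""
    return f"model.{scaling_group_name}"


def bind_arguments(input_schema: dict, *args) -> dict:
    """Pair positional arguments with schema fields in declaration order."""
    names = list(input_schema)
    if len(args) != len(names):
        raise ValueError(f"expected {len(names)} arguments, got {len(args)}")
    return dict(zip(names, args))


# Placeholder schema; a real one would use values such as pa.float64().
schema = {"x_1": "float64", "x_2": "float64"}

name = catalog_name("my-model-sg")
bound = bind_arguments(schema, 0.5, 1.5)
print(name, bound)  # model.my-model-sg {'x_1': 0.5, 'x_2': 1.5}
```

Swapping the two arguments would silently bind 1.5 to x_1, which is why the argument order has to follow the schema.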


Managing Deployments

Updating a Deployment

Deploy a new version of a model to an existing scaling group:

from chalk.client import ChalkClient
from chalkcompute import Image
import pyarrow as pa

client = ChalkClient()

# Register a new model version with an updated Chalk image
new_version = client.register_model_version(
    name="my-model",
    input_schema={"x": pa.float64()},
    output_schema={"y": pa.float64()},
    model_image=(
        Image.debian_slim("3.11")
        .pip_install(["chalk-remote-call-python", "pyarrow"])
        .add_local_file("model_v2.py", "/app/model.py", strategy="copy")
        .env({"PYTHONPATH": "/app"})
        .workdir("/app")
        .entrypoint(
            [
                "chalk-remote-call",
                "--handler",
                "model.handler",
                "--port",
                "8080",
            ]
        )
    ),
)

# Update the scaling group with the new version
client.deploy_model_version_to_scaling_group(
    name="my-model-sg",
    model_name="my-model",
    model_version=new_version.model_version,
    handler="model.handler",
)

For more information on listing, inspecting, and deleting scaling groups, see the Scaling Groups page.

Structuring Your Model Deployment Code

Model registration and deployment should be controlled manually and separately from your feature definitions. Either:

  1. Add the model scripts to .chalkignore so they do not run during chalk apply.
  2. Keep them in a separate repository dedicated to model management, independent of your Chalk feature code.

chalk apply will fail if it tries to run model registration or deployment code.

Organize your project to keep model management separate from feature definitions:

my-chalk-project/
|- models/                          # Model deployment code (add to .chalkignore)
|  |- model.py
|  `- deploy_model.py               # Registration + deployment script
|
|- features/                        # Feature definitions (synced with chalk apply)
|  |- __init__.py
|  `- user_features.py
|
|- .chalkignore
`- chalk.yaml

Put the following line in your .chalkignore so chalk apply skips everything under models/.

models/
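
One possible shape for deploy_model.py is sketched below. It takes the client as a parameter so the flow can be dry-run without touching a real environment. The register_and_deploy function and the RecordingClient stand-in are hypothetical; only register_model_version, deploy_model_version_to_scaling_group, and their keyword arguments come from the examples on this page. The schemas are placeholder strings where a real script would use PyArrow types:

```python
from types import SimpleNamespace


def register_and_deploy(client, *, model_name, scaling_group, image,
                        input_schema, output_schema, handler="model.handler"):
    """Register a model version, then point a scaling group at it."""
    response = client.register_model_version(
        name=model_name,
        input_schema=input_schema,
        output_schema=output_schema,
        model_image=image,
    )
    client.deploy_model_version_to_scaling_group(
        name=scaling_group,
        model_name=model_name,
        model_version=response.model_version,
        handler=handler,
    )
    return response.model_version


class RecordingClient:
    """Dry-run stand-in for ChalkClient: records calls instead of making them."""

    def __init__(self):
        self.calls = []

    def register_model_version(self, **kwargs):
        self.calls.append(("register", kwargs))
        return SimpleNamespace(model_version=1)

    def deploy_model_version_to_scaling_group(self, **kwargs):
        self.calls.append(("deploy", kwargs))


# Dry run; a real script would pass ChalkClient() instead.
client = RecordingClient()
version = register_and_deploy(
    client,
    model_name="my-model",
    scaling_group="my-model-sg",
    image="my-model-image:latest",
    input_schema={"x": "float64"},   # placeholder for pa.float64()
    output_schema={"y": "float64"},  # placeholder for pa.float64()
)
```

Keeping registration and deployment in one function ensures the scaling group always receives the version number returned by the registration call rather than a hard-coded one.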

Chalk Image Requirements

When using a chalkcompute.Image, Chalk builds and manages the container for you. Your image definition should:

  1. Install chalk-remote-call-python: Provides the server and request handling
  2. Add your handler file: Contains the inference logic Chalk will invoke
  3. Set chalk-remote-call as the entrypoint: Routes requests to your handler on a specified port

Example: NER Model

Here’s a complete example using spaCy for named entity recognition:

Image definition:

from chalkcompute import Image

image = (
    Image.debian_slim("3.11")
    .pip_install(["chalk-remote-call-python", "spacy"])
    .run_commands(["python -m spacy download en_core_web_sm"])
    .add_local_file("model.py", "/app/model.py", strategy="copy")
    .env({"PYTHONPATH": "/app"})
    .workdir("/app")
    .entrypoint(
        [
            "chalk-remote-call",
            "--handler",
            "model.handler",
            "--on-startup",
            "model.on_startup",
            "--port",
            "8080",
        ]
    )
)

model.py:

import json
import pyarrow as pa
import spacy

nlp = None


def on_startup():
    global nlp
    nlp = spacy.load("en_core_web_sm")


def handler(event: dict[str, pa.Array], context: dict) -> pa.Array:
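    texts = event["text"].to_pylist()
    results = []

    # spaCy cannot process None, so feed empty strings for nulls and restore them below
    docs = nlp.pipe([t if t is not None else "" for t in texts], batch_size=32)
    for text, doc in zip(texts, docs):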
        if text is None:
            results.append(None)
            continue

        entities = [
            {
                "text": ent.text,
                "label": ent.label_,
                "start": ent.start_char,
                "end": ent.end_char,
            }
            for ent in doc.ents
        ]

        results.append(json.dumps({"text": text, "entities": entities}))

    return pa.array(results, type=pa.utf8())
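
Since this handler returns each result as a JSON string, callers need to decode it. Here is a minimal sketch of a consumer that groups entity text by label. The function name is hypothetical, and the sample string is hand-written in the handler's output shape rather than real model output:

```python
import json
from typing import Optional


def entities_by_label(result: Optional[str]) -> dict:
    """Group entity text by label; a null result yields an empty dict."""
    if result is None:
        return {}
    grouped = {}
    for ent in json.loads(result)["entities"]:
        grouped.setdefault(ent["label"], []).append(ent["text"])
    return grouped


# Hand-written sample matching the structure the handler serializes.
sample = json.dumps({
    "text": "Ada Lovelace lived in London.",
    "entities": [
        {"text": "Ada Lovelace", "label": "PERSON", "start": 0, "end": 12},
        {"text": "London", "label": "GPE", "start": 22, "end": 28},
    ],
})

grouped = entities_by_label(sample)
print(grouped)  # {'PERSON': ['Ada Lovelace'], 'GPE': ['London']}
```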

Key chalkcompute.Image Methods

  • .base(image): Use a custom base Docker image
  • .debian_slim(python_version): Base image with a slim Debian OS and the specified Python version
  • .pip_install(packages): Install Python packages
  • .run_commands(commands): Run arbitrary shell commands during the build
  • .add_local_file(src, dest, strategy): Copy a local file into the image
  • .add_local_dir(src, dest, strategy): Copy a local directory into the image
  • .env(vars): Set environment variables
  • .workdir(path): Set the working directory
  • .entrypoint(command): Set the container entrypoint

Docker Image Requirements

Model deployments use the chalk-remote-call-python shim to handle request routing and PyArrow serialization. Your Docker image should:

  1. Install chalk-remote-call-python: Provides the server and request handling
  2. Define a handler function: Receives PyArrow Arrays, returns PyArrow Arrays
  3. Optionally define on_startup: Initialize resources like loading models
  4. Use chalk-remote-call as entrypoint: Runs your handler on a specified port

Example Dockerfile

FROM python:3.11-slim

WORKDIR /app

RUN pip install --no-cache-dir chalk-remote-call-python spacy
RUN python -m spacy download en_core_web_sm

COPY model.py /app/model.py

ENV PYTHONPATH=/app

EXPOSE 8080

ENTRYPOINT ["chalk-remote-call", "--handler", "model.handler", "--on-startup", "model.on_startup", "--port", "8080"]

Build and push to a registry:

docker build --platform linux/amd64 -t my-model:latest .
docker push my-model:latest

Benefits of Model Deployments

  • Resource isolation: Models don’t compete with the engine for compute
  • Independent scaling: Scale models up or down based on their specific load
  • Easy updates: Deploy new model versions without downtime
  • Language agnostic: Run models in any language/framework as Docker containers
  • Observability: Monitor each model’s performance and resource usage separately