Compute
Architecture, isolation model, and security controls for Chalk compute sandboxes.
Chalk sandboxes are lightweight, isolated execution environments for running arbitrary code — agent workloads, model inference, data pipelines, or any container-based task. Each sandbox runs inside a gVisor-hardened container with its own filesystem, network namespace, and resource limits.
Sandboxes are designed around three principles:
Every sandbox runs under gVisor, a container runtime that intercepts application system calls through a user-space kernel. Unlike traditional containers that share the host kernel directly, gVisor interposes a second layer of defense:
┌────────────────────────┐
│ Application process    │
├────────────────────────┤
│ gVisor (Sentry)        │ ← intercepts syscalls
├────────────────────────┤
│ Host kernel            │
└────────────────────────┘
This means an exploit that targets the kernel from inside one sandbox reaches the Sentry first rather than the host kernel, which makes it much harder to compromise the host or other sandboxes. gVisor also restricts the set of available syscalls, reducing the attack surface exposed to untrusted code. This is particularly important for agent workloads that execute LLM-generated commands.
Each sandbox additionally receives its own filesystem, network namespace, and resource limits.
Chalk operates a fleet of bare-metal nodes optimized for sandbox workloads. When you create a sandbox without additional configuration, it runs on this managed infrastructure:
from chalkcompute import Container, Image
c = Container(image=Image.debian_slim()).run()

Managed serverless handles provisioning, scaling, and node maintenance. Sandboxes are scheduled across availability zones and can cold-start in under two seconds for common base images.
For workloads that must run within your cloud account — for compliance, data residency, or proximity to other infrastructure — Chalk can deploy sandboxes into your existing EKS or GKE clusters.
In this model, you install the Chalk node agent as a DaemonSet. The agent manages gVisor runtime configuration, volume mounts, and GPU device plugin integration. The control plane remains managed by Chalk; your cluster provides the compute.
# Install the Chalk node agent into your cluster
chalk compute install --cluster arn:aws:eks:us-east-1:123456789012:cluster/my-cluster

The same Container and SandboxClient APIs work regardless of where the sandbox is scheduled; your application code doesn’t change between managed and self-hosted.
Specify resource requests when creating a sandbox:
c = Container(
    image=Image.debian_slim(),
    cpu="4",
    memory="16Gi",
)

Resources are guaranteed (requests equal limits), so sandboxes are not subject to noisy-neighbor throttling.
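To make the units concrete: a value such as `memory="16Gi"` looks like a Kubernetes-style quantity, where `Gi` is a binary suffix (2^30 bytes). The sketch below is a hypothetical pre-flight check, not part of chalkcompute. It assumes that memory strings follow the Kubernetes binary-suffix convention and that CPU is given in whole cores, as in the example above; the `check_request` name and the ceiling parameters are illustrative only.

```python
# Hypothetical helper, not part of chalkcompute.
# Assumption: memory strings use Kubernetes-style binary suffixes.
_BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def memory_to_bytes(quantity: str) -> int:
    """Convert a string such as "16Gi" to a byte count."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    # No recognized suffix: treat the value as a plain byte count.
    return int(quantity)


def check_request(cpu: str, memory: str, max_cpu: int, max_memory: str) -> None:
    """Raise ValueError if a request exceeds a caller-chosen ceiling."""
    if int(cpu) > max_cpu:
        raise ValueError(f"cpu={cpu} exceeds ceiling of {max_cpu}")
    if memory_to_bytes(memory) > memory_to_bytes(max_memory):
        raise ValueError(f"memory={memory} exceeds ceiling of {max_memory}")
```

Because requests equal limits, an oversized request reserves capacity even when it sits idle, so a check like this before creating the sandbox can catch typos such as `"160Gi"` early.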
GPU-accelerated workloads request a GPU type at creation time:
c = Container(
    image="nvcr.io/nvidia/pytorch:24.01-py3",
    gpu="A100",
    cpu="8",
    memory="64Gi",
)

The Chalk scheduler matches the request to a node with the appropriate GPU hardware and configures the NVIDIA device plugin and driver mounts automatically. GPU workloads run under the same gVisor isolation as CPU workloads.
Chalk enforces tenant isolation at every layer of the stack:
| Layer | Mechanism |
|---|---|
| Runtime | gVisor user-space kernel intercepts syscalls for each sandbox |
| Network | Separate network namespace per sandbox; no shared listening sockets |
| Storage | Volumes are scoped to the owning environment; cross-tenant access is not permitted |
| Scheduling | Workloads from different tenants are placed on separate host nodes by default |
| Identity | Each sandbox receives a unique workload identity — no shared credentials |
For deployments with strict regulatory requirements, dedicated node pools can be configured so that a tenant’s workloads never share physical hardware with any other tenant.
Sandboxes follow a straightforward lifecycle:
- If the image is given as an Image spec (e.g. Image.debian_slim().pip_install([...])), it is built and cached. Pre-built OCI images are used directly.
- While the sandbox is running, it accepts exec calls. Volumes are mounted and accessible.

from chalkcompute import Container, Image

c = Container(
    image=Image.debian_slim().pip_install(["numpy"]),
    cpu="2",
    memory="4Gi",
    lifetime="1h",
).run()

# Run a command inside the sandbox and print its captured stdout
result = c.exec("python", "-c", "import numpy; print(numpy.__version__)")
print(result.stdout_text)

# Stop the sandbox explicitly when done
c.stop()

The following controls govern identity, network access, and credentials for sandbox workloads. They apply equally to bare sandboxes and to higher-level abstractions like Containers and Scaling Groups, which share the same isolation primitives.
Every sandbox and container launched through chalkcompute runs with a unique cloud identity and a corresponding Chalk identity. These identities are scoped to the individual workload: no two sandboxes share credentials, and workloads never run with delegated credentials from the calling user.
Workload identities are issued automatically at creation time:
from chalkcompute import SandboxClient, Image
client = SandboxClient()
sandbox = client.create(image=Image.debian_slim())
# The sandbox is running with its own identity —
# it can authenticate to Chalk APIs without additional configuration.
result = sandbox.exec("chalk", "query", "--in", "user.id=1", "--out", "user.score")

Chalk workload identities are OIDC-compliant. If your organization runs services that accept federated tokens (e.g. an internal model registry or a secrets manager), you can configure them to trust the Chalk identity provider directly. This lets sandboxes authenticate to your infrastructure without static credentials:
- Map the token's sub claim to an appropriate role or policy.

No secrets need to be injected into the sandbox environment.
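On the receiving side, a trust decision of this kind usually keys off the token's claims. The following is a minimal sketch of reading the `sub` claim from a JWT-formatted OIDC token and mapping it to a role. The subject format (`sandbox:prod/...`) and the role names are hypothetical, since the actual claim layout of Chalk tokens is not described here. The decoder deliberately skips signature verification and is for inspection only; a real service must verify the token against the identity provider's published keys before trusting any claim.

```python
import base64
import json
from typing import Optional


def decode_claims_unverified(token: str) -> dict:
    """Return the JWT payload WITHOUT verifying the signature (inspection only)."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a three-part JWT")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


# Hypothetical mapping from subject prefix to a role in the trusting service.
ROLE_BY_SUBJECT_PREFIX = {
    "sandbox:prod/": "model-registry-reader",
    "sandbox:dev/": "model-registry-sandbox",
}


def role_for_subject(sub: str) -> Optional[str]:
    """Return the role for a subject, or None if no prefix matches."""
    for prefix, role in ROLE_BY_SUBJECT_PREFIX.items():
        if sub.startswith(prefix):
            return role
    return None
```

Returning `None` for an unmatched subject keeps the default closed: a workload that no rule covers receives no role rather than a fallback one.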
The MCP Gateway lets sandboxes interact with Model Context Protocol (MCP) servers run by your organization, without giving the sandbox direct access to the underlying credentials.
When a sandbox calls the MCP Gateway, it authenticates using its workload identity (see above). The gateway validates the identity, then proxies the request to the upstream MCP server using credentials managed by your organization. The sandbox never sees the real credential.
┌─────────────┐        WIF token        ┌─────────────┐     real credential     ┌─────────────┐
│   Sandbox   │ ──────────────────────▸ │ MCP Gateway │ ──────────────────────▸ │ MCP Server  │
└─────────────┘                         └─────────────┘                         └─────────────┘
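The credential swap in the diagram can be sketched as a function over request headers. This is an illustration of the general pattern, not the MCP Gateway's implementation: the `UPSTREAM_CREDENTIALS` store, the `verify` callback, and the `X-Workload-Subject` audit header are all hypothetical names introduced for the example.

```python
from typing import Callable, Dict, Optional

# Hypothetical store of upstream credentials, keyed by MCP server name.
# In a real deployment these would live in a secrets manager, never in code.
UPSTREAM_CREDENTIALS: Dict[str, str] = {"search-index": "upstream-secret-token"}


def proxy_headers(
    inbound: Dict[str, str],
    server: str,
    verify: Callable[[str], Optional[str]],
) -> Dict[str, str]:
    """Build upstream headers, replacing the workload token with the real credential.

    `verify` returns the workload's subject if the token is valid, else None.
    """
    scheme, _, token = inbound.get("Authorization", "").partition(" ")
    if scheme != "Bearer" or not token:
        raise PermissionError("missing bearer token")
    subject = verify(token)
    if subject is None:
        raise PermissionError("workload token rejected")
    if server not in UPSTREAM_CREDENTIALS:
        raise KeyError(f"unknown MCP server: {server}")
    # Drop the sandbox's token so it is never forwarded upstream.
    outbound = {k: v for k, v in inbound.items() if k != "Authorization"}
    outbound["Authorization"] = f"Bearer {UPSTREAM_CREDENTIALS[server]}"
    outbound["X-Workload-Subject"] = subject  # hypothetical audit header
    return outbound
```

The real credential appears only in the outbound headers, which the sandbox never receives, so a compromised workload has nothing to exfiltrate beyond its own short-lived identity.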
This is particularly useful for agent workloads. A code-generation agent may need to call tool-use APIs, search indexes, or retrieval services. With the gateway:
By default, sandboxes have unrestricted egress. For production workloads — especially autonomous agents — you should restrict outbound traffic to a known set of hosts.
Define a NetworkPolicy and bind it to a sandbox:
from chalkcompute import SandboxClient, Image, NetworkPolicy
policy = NetworkPolicy(
    name="ai-agent-production",
    allow_all=False,
    allowed_hostnames=[
        # LLM APIs
        "api.openai.com",
        "*.anthropic.com",
        # Code repositories
        "github.com",
        "*.github.com",
        # Package registries
        "*.npmjs.org",
        "pypi.org",
        "files.pythonhosted.org",  # PyPI serves package downloads from this host
    ],
    description="Production policy for AI coding agents",
)

client = SandboxClient()
sandbox = client.create(
    image=Image.debian_slim(),
    network_policies=[policy],
)

Hostnames support leading wildcards (*.example.com). You can also specify raw IP ranges in CIDR notation. Requests to any destination not on the allowlist are dropped at the network layer: the sandbox receives a connection timeout rather than a policy-violation error, which prevents information leakage about the policy itself.
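To reason about what an allowlist will admit, it can help to model the matching rules locally. The sketch below is one plausible reading, not Chalk's enforcement logic. It assumes that `*.example.com` matches any subdomain but not the apex `example.com` (the example policy lists both `github.com` and `*.github.com`, which suggests this, but the source does not state it) and that IP literals are checked only against CIDR ranges. The `destination_allowed` name and its parameters are hypothetical.

```python
import ipaddress
from typing import Iterable


def destination_allowed(
    dest: str,
    hostnames: Iterable[str],
    cidrs: Iterable[str] = (),
) -> bool:
    """Return True if `dest` (hostname or IP literal) is covered by the allowlist."""
    try:
        ip = ipaddress.ip_address(dest)
    except ValueError:
        ip = None
    if ip is not None:
        # IP literals are matched against CIDR ranges only.
        return any(ip in ipaddress.ip_network(c) for c in cidrs)

    host = dest.lower().rstrip(".")
    for pattern in hostnames:
        pattern = pattern.lower()
        if pattern.startswith("*."):
            suffix = pattern[1:]  # e.g. ".github.com"
            # Assumed semantics: at least one label must precede the suffix.
            if host.endswith(suffix) and len(host) > len(suffix):
                return True
        elif host == pattern:
            return True
    return False
```

Matching on the dotted suffix rather than a bare substring means a lookalike such as `evilgithub.com` is not admitted by `*.github.com`.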
For workloads that need to communicate with each other or with on-premise infrastructure,
chalkcompute supports WireGuard-based IPv4 tunnels.
Tunnel keys are generated dynamically per session and negotiated through the Chalk metadata plane — you do not need to manage static keys or pre-shared secrets. Each tunnel endpoint is scoped to a single workload and torn down when the workload terminates.
from chalkcompute import SandboxClient, Image, Tunnel
client = SandboxClient()
# Create two sandboxes that can reach each other
tunnel = Tunnel(name="worker-mesh")
sandbox_a = client.create(
    image=Image.debian_slim(),
    tunnels=[tunnel],
)
sandbox_b = client.create(
    image=Image.debian_slim(),
    tunnels=[tunnel],
)
# sandbox_a and sandbox_b can now communicate over
# their tunnel addresses without traversing the public internet.

Tunnels can also bridge to external WireGuard peers, enabling secure connectivity to on-premise databases or private APIs without exposing them to the broader internet.