Overview

Volumes provide persistent file storage that can be shared across sandboxes and containers. Under the hood, volumes are backed by a Rust-based FUSE driver that mounts directly into the container’s filesystem. This is file-level storage, not block storage — workloads interact with volumes through normal filesystem operations, and the driver handles tiered caching and replication to object storage transparently.

Volumes can be managed through the Python SDK, the CLI, or the web dashboard.


Creating and using volumes

CLI

# Create a volume
chalk volume create my-data

# Upload files
chalk volume put my-data ./local-model.bin models/latest.bin

# List contents
chalk volume ls my-data

# Download a file
chalk volume get my-data models/latest.bin ./downloaded-model.bin

Python SDK

from chalkcompute import Volume

vol = Volume("my-data")

# Upload files
vol.put_file("models/latest.bin", model_bytes)

# Batch upload for efficiency
with vol.batch_upload() as batch:
    batch.put("data/train.csv", train_csv)
    batch.put("data/eval.csv", eval_csv)

# Read files
data = vol.read_file("models/latest.bin")

# List files
for f in vol.listdir("data/"):
    print(f"{f.path}  {f.size} bytes")

Mounting into sandboxes

Pass volumes when creating a sandbox to mount them into the container filesystem:

from chalkcompute import SandboxClient, Image, Volume

vol = Volume("my-data")

client = SandboxClient()
sandbox = client.create(
    image=Image.debian_slim(),
    volumes=[vol],
)

# The volume is mounted and accessible as a normal directory
sandbox.exec("ls", "/volumes/my-data")

Web browser

Volumes are also browsable from the Chalk dashboard. You can navigate the file tree, preview file contents, and upload or download files directly from the browser.


Architecture

FUSE driver

The volume mount is implemented as a Rust-based FUSE (Filesystem in Userspace) driver that runs inside the container. The driver exposes the volume as a standard POSIX directory, so workloads don’t need special libraries or APIs to read and write files — any tool that works with the filesystem works with a volume.

The FUSE driver handles:

  • Tiered caching. Frequently accessed files are cached on the container’s local disk. Cold data is fetched from object storage on demand.
  • Transparent replication. All persisted data is durably stored in object storage (S3 or GCS, depending on your environment). Local cache is ephemeral and rebuilt automatically.
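The read path these two points describe can be sketched with a toy model — plain Python, not the actual Rust driver: reads check the local cache first, fall back to object storage on a miss, and populate the cache so subsequent reads are served locally. The class and names here are illustrative.

```python
class TieredVolumeReader:
    """Toy model of the driver's read path: an ephemeral local-disk
    cache in front of durable object storage. Illustrative only."""

    def __init__(self, object_store: dict):
        self.object_store = object_store  # durable tier (stands in for S3/GCS)
        self.cache = {}                   # ephemeral local-disk tier
        self.cache_hits = 0
        self.fetches = 0

    def read(self, path: str) -> bytes:
        if path in self.cache:            # hot: serve from local disk
            self.cache_hits += 1
            return self.cache[path]
        self.fetches += 1                 # cold: fetch from object storage
        data = self.object_store[path]
        self.cache[path] = data           # populate cache for next read
        return data


reader = TieredVolumeReader({"models/latest.bin": b"\x00" * 4})
reader.read("models/latest.bin")              # cold read: one fetch
reader.read("models/latest.bin")              # hot read: served from cache
print(reader.fetches, reader.cache_hits)      # 1 1
```

Because the cache is a pure acceleration layer, losing it (as happens when a container restarts) costs only re-fetch latency, never data.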

Copy-on-write semantics

Volumes use batch copy-on-write. When a container writes to a mounted volume, the writes are buffered locally on the container’s filesystem. These local writes are not visible to other containers and are not persisted to the backing store until sync() is called on the volume:

# Writes inside the container are local until sync
sandbox.exec("cp", "output.parquet", "/volumes/my-data/output.parquet")

# Flush writes to the backing store
vol.sync()

This design avoids partial-write visibility — other consumers of the volume see a consistent snapshot, not a stream of in-progress file mutations. If the container terminates before sync, uncommitted writes are discarded.
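The visibility rule above can be made concrete with a small model — illustrative Python, not the real driver or SDK: each mount buffers its writes privately, and only sync() publishes them to the shared backing store in one batch.

```python
class CowVolume:
    """Toy model of batch copy-on-write. Each container gets a mount
    that buffers writes locally; sync() publishes them all at once."""

    def __init__(self):
        self.backing = {}  # what every consumer of the volume sees

    def mount(self):
        return CowMount(self)


class CowMount:
    def __init__(self, volume):
        self.volume = volume
        self.buffer = {}  # local, uncommitted writes

    def write(self, path, data):
        self.buffer[path] = data  # invisible to other mounts

    def read(self, path):
        # local writes shadow the backing store within this container
        return self.buffer.get(path, self.volume.backing.get(path))

    def sync(self):
        self.volume.backing.update(self.buffer)  # publish as one batch
        self.buffer.clear()

    def discard(self):
        self.buffer.clear()  # container terminated before sync


vol = CowVolume()
writer, other = vol.mount(), vol.mount()
writer.write("output.parquet", b"rows")
print(other.read("output.parquet"))  # None: unsynced writes are invisible
writer.sync()
print(other.read("output.parquet"))  # b'rows': visible after sync
```

The key property is that `other` can never observe a half-written batch: before sync() it sees nothing, after sync() it sees everything.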


Versioning and fork semantics

Every sync creates an immutable snapshot of the volume’s state. Past versions are retained and can be retrieved by version ID:

# List available versions
versions = vol.versions()

# Open a previous version (read-only)
old = vol.at_version(versions[-2].id)
data = old.read_file("models/latest.bin")

Because prior versions are immutable and cheaply addressable, volumes support fork semantics. You can spawn multiple sandboxes from the same volume version, let them diverge independently, and sync their results into separate version lineages — without copying the underlying data.

This is particularly useful for coding agents that need to explore multiple solution paths in parallel:

base_version = vol.latest_version()

# Fork two sandboxes from the same starting state
sandbox_a = client.create(
    image=Image.debian_slim(),
    volumes=[vol.at_version(base_version.id)],
)
sandbox_b = client.create(
    image=Image.debian_slim(),
    volumes=[vol.at_version(base_version.id)],
)

# Each sandbox writes independently — no interference
sandbox_a.exec("python", "approach_a.py")
sandbox_b.exec("python", "approach_b.py")

Each sandbox’s writes are isolated until explicitly synced, and the original version remains available regardless of what the forks produce.
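Putting versioning and forking together, a compact model — illustrative, not the actual SDK — shows why forks are cheap: every commit is an immutable snapshot, and a fork simply starts a new lineage from an existing snapshot rather than copying data.

```python
from types import MappingProxyType


class VersionedVolume:
    """Toy model of immutable snapshots and fork semantics."""

    def __init__(self):
        self.versions = []  # append-only list of immutable snapshots

    def commit(self, files: dict) -> int:
        # MappingProxyType makes the stored snapshot read-only
        self.versions.append(MappingProxyType(dict(files)))
        return len(self.versions) - 1  # version id

    def at_version(self, vid: int):
        return self.versions[vid]


vol = VersionedVolume()
base = vol.commit({"main.py": "print('hi')"})

# Two forks diverge from the same snapshot; the snapshot itself is shared
fork_a = dict(vol.at_version(base)); fork_a["main.py"] = "approach A"
fork_b = dict(vol.at_version(base)); fork_b["main.py"] = "approach B"

# Each fork syncs into its own version lineage
vid_a = vol.commit(fork_a)
vid_b = vol.commit(fork_b)

# The base snapshot is untouched by either fork
print(vol.at_version(base)["main.py"])  # print('hi')
```

A production implementation would deduplicate file content between snapshots (content addressing) instead of storing full copies, but the external semantics are the same: old versions never change, and lineages diverge freely.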