Feature Types


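Features can be any primitive Python type, including dates, datetimes, and enums:

from datetime import date, datetime
from enum import Enum

from chalk.features import features

class Genre(Enum):
    # Illustrative members only; the original values are not shown here.
    FICTION = "FICTION"
    NONFICTION = "NONFICTION"

@features
class Book:
    id: int
    name: str
    publish_date: date
    copyright_ended_at: datetime | None
    genre: Genre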


Features can also be lists and sets of any primitive type, or of struct types such as dataclasses, Pydantic models, and attrs classes.

from chalk.features import features

@features
class Book:
    authors: list[str]
    categories: set[str]


You can use any dataclass as a struct feature. Struct types should be used for objects that don’t have ids. If an object has an id, consider using has-one.

from dataclasses import dataclass

from chalk.features import features

@dataclass
class JacketInfo:
    title: str
    subtitle: str
    body: str

@features
class Book:
    id: int
    jacket_info: JacketInfo
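
As a rough illustration of why structs suit objects without ids, the sketch below uses only the standard library and is independent of Chalk. The sample values are made up. Two struct instances with the same field values compare equal, so nothing distinguishes them beyond their contents.

```python
from dataclasses import asdict, dataclass


@dataclass
class JacketInfo:
    title: str
    subtitle: str
    body: str


# Hypothetical sample values, for illustration only.
first = JacketInfo(title="Dune", subtitle="A Novel", body="Set on Arrakis.")
second = JacketInfo(title="Dune", subtitle="A Novel", body="Set on Arrakis.")

# Dataclasses compare by field values, not by object identity.
same_value = first == second
same_object = first is second

# asdict() turns the struct into a plain dictionary of its fields.
as_plain_dict = asdict(first)
```

If two jacket descriptions ever need to be told apart or looked up individually, that is a sign the object wants its own id and a has-one relationship instead.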

Pydantic models

If you prefer Pydantic models to dataclasses, you can use them instead.

from typing import Optional

from chalk.features import features
from pydantic import BaseModel, constr

class TitleInfo(BaseModel):
    heading: constr(min_length=2)
    subheading: Optional[str]

@features
class Book:
    title: TitleInfo


Both dataclass and Pydantic structs are serialized using the pyarrow format, a high-performance schema for data serialization.
This data is stored “value only”, i.e. without keys, so any change to these structs over time will invalidate historical data. To support feature values whose schema changes over time, we introduced the Document struct type.
Documents are serialized as JSON and support changes to the schema over time, at the cost of a small performance penalty.

from chalk import Document
from chalk.features import features
from pydantic import BaseModel

class AuthorInfo(BaseModel):
    first_name: str
    last_name: str

@features
class Book:
    title: Document[AuthorInfo]
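
To make the trade-off between “value only” and keyed storage concrete, here is a toy model using only the standard library. It is not Chalk's actual pyarrow or Document encoding; the AuthorInfoV1 and AuthorInfoV2 classes and the sample name are hypothetical. It shows how positional values are misread after a field is added, while a JSON record with keys still lines up.

```python
import json
from dataclasses import astuple, dataclass, fields


@dataclass
class AuthorInfoV1:
    first_name: str
    last_name: str


@dataclass
class AuthorInfoV2:
    # Hypothetical later schema: a new field was inserted at the front.
    title: str = ""
    first_name: str = ""
    last_name: str = ""


old = AuthorInfoV1(first_name="Ursula", last_name="Le Guin")

# "Value only": only the values are stored, so position carries the meaning.
stored_values = list(astuple(old))

# Keyed: every value is stored next to its field name, as JSON.
stored_json = json.dumps({"first_name": old.first_name, "last_name": old.last_name})

# Reading the value-only record with the new schema assigns values
# to the wrong fields, because the positions have shifted.
new_names = [f.name for f in fields(AuthorInfoV2)]
misread = AuthorInfoV2(**dict(zip(new_names, stored_values)))

# Reading the keyed record with the new schema still matches by name.
reread = AuthorInfoV2(**json.loads(stored_json))
```

In this sketch `misread` ends up with the first name in the `title` field, whereas `reread` keeps both names in place and leaves `title` at its default.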


Alternatively, you can use attrs. Any of these struct types (dataclass, pydantic, and attrs) can be used with collections like set[...] or list[...].

import attrs

from chalk.features import features

@attrs.define
class TableOfContentsItem:
    foo: str
    bar: int

@features
class Book:
    table_of_contents: list[TableOfContentsItem]

Custom serializers

Finally, if you want to serialize an object that isn’t a dataclass, attrs class, or Pydantic model, you can write a custom codec.

Consider the custom class below:

class CustomStruct:
    def __init__(self, foo: str, bar: int) -> None:
        self.foo = foo
        self.bar = bar

    def __eq__(self, other: object) -> bool:
        return (
            isinstance(other, CustomStruct)
            and self.foo == other.foo
            and self.bar == other.bar
        )

    def __hash__(self) -> int:
        return hash((self.foo, self.bar))

Here, we use the custom class as a feature and provide an encoder and a decoder. The encoder takes an instance of the custom type and outputs a Python object, and the decoder takes the output of the encoder and creates an instance of the custom type.

from chalk.features import feature, features

@features
class Book:
    custom_field: CustomStruct = feature(
        encoder=lambda x: dict(foo=x.foo, bar=x.bar),
        decoder=lambda x: CustomStruct(**x),
    )
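
A useful sanity check for any encoder and decoder pair is that decoding an encoded value returns an equal object. The sketch below checks this in plain Python, without Chalk; the `encode` and `decode` helper names and the sample values are hypothetical and simply mirror the lambdas described above.

```python
class CustomStruct:
    def __init__(self, foo: str, bar: int) -> None:
        self.foo = foo
        self.bar = bar

    def __eq__(self, other: object) -> bool:
        return (
            isinstance(other, CustomStruct)
            and self.foo == other.foo
            and self.bar == other.bar
        )

    def __hash__(self) -> int:
        return hash((self.foo, self.bar))


def encode(value: CustomStruct) -> dict:
    # Hypothetical helper: turn the custom object into a plain dictionary.
    return dict(foo=value.foo, bar=value.bar)


def decode(payload: dict) -> CustomStruct:
    # Hypothetical helper: rebuild the custom object from the encoder output.
    return CustomStruct(**payload)


original = CustomStruct(foo="chapter", bar=3)
payload = encode(original)
restored = decode(payload)

# The pair is consistent if a round trip yields an equal object.
round_trips = restored == original
```

Defining `__eq__` on the custom class is what makes this comparison meaningful; without it, two instances with identical fields would compare unequal.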