Define features for training and inference.
Features can be any primitive Python type:
    from datetime import date, datetime
    from enum import Enum

    from chalk.features import features


    class Genre(Enum):
        FICTION = "FICTION"
        NONFICTION = "NONFICTION"
        DRAMA = "DRAMA"
        POETRY = "POETRY"


    @features
    class Book:
        id: int
        name: str
        publish_date: date
        copyright_ended_at: datetime | None
        genre: Genre
Features can also be lists and sets of any primitive type or struct, including dataclasses, Pydantic models, and attrs classes.
    @features
    class Book:
        authors: list[str]
        categories: set[str]
You can use any dataclass as a struct feature.
Struct types should be used for objects that don’t have ids.
If an object has an id, consider using has-one.
    from dataclasses import dataclass


    @dataclass
    class JacketInfo:
        title: str
        subtitle: str
        body: str


    @features
    class Book:
        id: int
        jacket_info: JacketInfo
If you prefer pydantic to dataclass, you can use that instead.
    from typing import Optional

    from pydantic import BaseModel, constr


    class TitleInfo(BaseModel):
        heading: constr(min_length=2)
        subheading: Optional[str]


    @features
    class Book:
        title: TitleInfo
        ...
Both dataclass and pydantic structs are serialized using the pyarrow format, a high-performance schema for data serialization.
This data is stored “value only”, i.e. without keys, so any change to these structs over time will invalidate historical data. To support feature values whose schema changes over time, we introduced the Document struct type.
Documents are serialized as JSON and support changes to schema over time, at the cost of a small performance penalty.
    from pydantic import BaseModel
    from chalk import Document


    class AuthorInfo(BaseModel):
        first_name: str
        last_name: str


    @features
    class Book:
        title: Document[AuthorInfo]
        ...
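The difference between value-only and keyed storage can be sketched in plain Python, without Chalk or pyarrow. The helper names (`encode_value_only`, `decode_value_only`) and the two schema versions below are hypothetical and exist only for illustration; they are not Chalk's internal storage code.

```python
import json

# Hypothetical schema versions: v2 inserts a new field before last_name.
FIELDS_V1 = ["first_name", "last_name"]
FIELDS_V2 = ["first_name", "middle_name", "last_name"]


def encode_value_only(record: dict, fields: list[str]) -> list:
    """Store values positionally, without keys."""
    return [record[name] for name in fields]


def decode_value_only(values: list, fields: list[str]) -> dict:
    """Reattach keys by position, using the schema the reader currently has."""
    return dict(zip(fields, values))


author = {"first_name": "Ursula", "last_name": "Le Guin"}

# Written under v1 but read back under v2: positions no longer line up,
# so the last name lands in the middle_name slot.
stored_values = encode_value_only(author, FIELDS_V1)
misread = decode_value_only(stored_values, FIELDS_V2)

# Keyed JSON carries the field names, so an old row still decodes by name.
stored_json = json.dumps(author)
reread = json.loads(stored_json)

print(misread)
print(reread)
```

In this sketch the positional row is silently misread after the schema change, while the JSON row keeps its meaning. That is the trade-off described above: keyed storage tolerates schema changes but repeats the keys in every value.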
Alternatively, you can use attrs. Any of these struct types (dataclass, pydantic, or attrs) can be used with collections such as list:
    import attrs


    @attrs.define
    class TableOfContentsItem:
        foo: str
        bar: int


    @features
    class Book:
        table_of_contents: list[TableOfContentsItem]
        ...
Finally, if you have an object that you want to serialize that isn’t a dataclass, attrs class, or pydantic model, you can write a custom codec.
Consider the custom class below:
    class CustomStruct:
        def __init__(self, foo: str, bar: int) -> None:
            self.foo = foo
            self.bar = bar

        def __eq__(self, other: object) -> bool:
            # Compare field by field: foo against foo, bar against bar.
            return (
                isinstance(other, CustomStruct)
                and self.foo == other.foo
                and self.bar == other.bar
            )

        def __hash__(self) -> int:
            return hash((self.foo, self.bar))
Here, we use the custom class as a feature, and provide an encoder and decoder. The encoder takes an instance of the custom type and outputs a Python object, and the decoder takes the output of the encoder and creates an instance of the custom type.
    from chalk.features import feature, features


    @features
    class Book:
        custom_field: CustomStruct = feature(
            encoder=lambda x: dict(foo=x.foo, bar=x.bar),
            decoder=lambda x: CustomStruct(**x),
        )
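A codec is only useful if decoding the encoder's output gives back an equal value. The sketch below checks that round trip in plain Python, without Chalk. The standalone `encode` and `decode` functions are hypothetical stand-ins for the two lambdas, and the class is repeated so the snippet runs on its own.

```python
class CustomStruct:
    def __init__(self, foo: str, bar: int) -> None:
        self.foo = foo
        self.bar = bar

    def __eq__(self, other: object) -> bool:
        return (
            isinstance(other, CustomStruct)
            and self.foo == other.foo
            and self.bar == other.bar
        )

    def __hash__(self) -> int:
        return hash((self.foo, self.bar))


def encode(x: CustomStruct) -> dict:
    """Hypothetical stand-in for the encoder lambda: custom type -> plain dict."""
    return dict(foo=x.foo, bar=x.bar)


def decode(x: dict) -> CustomStruct:
    """Hypothetical stand-in for the decoder lambda: plain dict -> custom type."""
    return CustomStruct(**x)


original = CustomStruct(foo="chapter", bar=3)
encoded = encode(original)
restored = decode(encoded)

# The round trip should preserve equality and hashing.
assert restored == original
assert hash(restored) == hash(original)
```

This check relies on `__eq__` comparing each field with its counterpart. Writing a similar round-trip test for any custom codec is an inexpensive way to catch an encoder and decoder that have drifted apart.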