Features
Define one-to-many and many-to-many relationships between feature sets.
Has-many relationships link a feature to many instances of another feature.
The recommended way to specify a join for a has-many relationship is implicitly. In the
example below, a User
is linked to potentially multiple Transfer
s.
from chalk.features import features, DataFrame, ...
@features
class Transfer:
id: str
# note, the annotation must be a string reference because User is
# defined after Transfer.
user_id: "User.id"
amount: float
@features
class User:
id: str
transfers: DataFrame[Transfer]
The following example, which explicitly sets the join, is equivalent to the above:
from chalk.features import has_many, DataFrame
@features
class Transfer:
id: str
user_id: str
amount: float
@features
class User:
id: str
transfers: DataFrame[Transfer] = has_many(lambda: Transfer.user_id == User.id)
Having established a has-many relationship, you can now reference the transfers for
a user through the user namespace. The has_many
feature returns a chalk.DataFrame,
which supports many helpful aggregation operations:
# Number of transfers made by a user
User.transfers.count()
# Total amount of transfers made by the user
User.transfers[Transfer.amount].sum()
# Total amount of the transfers made by the user that were returned
User.transfers[
Transfer.status == "returned",
Transfer.amount
].sum()
In the reverse direction, a one-to-many relation is defined by a has_one
relation (following the above example, a user has many transfers but a transfer has a
single user). However, you don’t have to explicitly set the join a second time. Instead,
the join condition is assumed to be symmetric and copied over. To complete the one-to-many
relationship from our example, add a User
to the Transfer
class:
@features
class Transfer:
...
user_id: str
amount: float
user: "User"
@features
class User:
...
uid: Transfer.user_id
transfers: DataFrame[Transfer]
Here, you need to use quotes around `User` to use a forward reference.
Many-to-many relations are defined by has-many relationship on both sides of the relation. As before, you don’t need to specify the join condition a second time if the join condition is symmetric.
@features
class Book:
...
author_id: str
uuid: str
authors: "DataFrame[Author]"
@features
class Author:
...
uuid: Book.author_id
books: DataFrame[Book]
Here you need to use quotes around `DataFrame[Author]` to use a forward reference.