Features
Define one-to-many and many-to-many relationships between feature classes.
Has-many relationships link a feature to many instances of another feature.
The recommended way to specify a join for a has-many relationship is implicitly. In the
example below, a `User` is linked to potentially multiple `Transfer`s.
from chalk.features import features, DataFrame

@features
class Transfer:
    id: str
    # note, the annotation must be a string reference because User is
    # defined after Transfer.
    user_id: "User.id"
    amount: float

@features
class User:
    id: str
    transfers: DataFrame[Transfer]
The following example, which explicitly sets the join, is equivalent to the above:
from chalk.features import features, has_many, DataFrame

@features
class Transfer:
    id: str
    user_id: str
    amount: float

@features
class User:
    id: str
    transfers: DataFrame[Transfer] = has_many(lambda: Transfer.user_id == User.id)
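To make the join condition concrete, the following plain-Python sketch mimics what `Transfer.user_id == User.id` expresses: each user is paired with the transfer rows whose `user_id` equals that user's `id`. It does not use Chalk and is not how Chalk implements the join; the `TransferRow` dataclass, the sample rows, and the `transfers_by_user` helper are hypothetical names introduced only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TransferRow:
    # Hypothetical stand-in for one row of the Transfer feature class.
    id: str
    user_id: str
    amount: float


def transfers_by_user(transfers):
    """Group transfers by user_id, mirroring the equality join condition."""
    grouped = defaultdict(list)
    for t in transfers:
        grouped[t.user_id].append(t)
    return dict(grouped)


# Made-up sample data.
rows = [
    TransferRow(id="t1", user_id="u1", amount=10.0),
    TransferRow(id="t2", user_id="u1", amount=5.0),
    TransferRow(id="t3", user_id="u2", amount=7.5),
]
grouped = transfers_by_user(rows)
```

In this picture, `grouped["u1"]` plays the role of `User.transfers` for the user whose id is `"u1"`.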
Having established a has-many relationship, you can now reference the transfers for
a user through the user namespace. The `has_many` feature returns a `chalk.DataFrame`,
which supports many helpful aggregation operations:
# Number of transfers made by a user
User.transfers.count()
# Total amount of transfers made by the user
User.transfers[Transfer.amount].sum()
# Total amount of the transfers made by the user that were returned
User.transfers[
    Transfer.status == "returned",
    Transfer.amount
].sum()
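As a rough mental model, assuming the transfers for one user have already been collected into a list, the three aggregations above correspond to the plain-Python computations below. This is a sketch of the semantics only, not the `chalk.DataFrame` API. The `TransferRow` type and the sample values are hypothetical, and the `status` field is an assumption: the filter references `Transfer.status`, which the class definitions shown earlier do not declare.

```python
from dataclasses import dataclass


@dataclass
class TransferRow:
    # Hypothetical row type; `status` is assumed because the filter references it.
    id: str
    amount: float
    status: str


# Made-up transfers belonging to a single user.
user_transfers = [
    TransferRow("t1", 10.0, "settled"),
    TransferRow("t2", 5.0, "returned"),
    TransferRow("t3", 2.5, "returned"),
]

# Analogue of User.transfers.count()
transfer_count = len(user_transfers)

# Analogue of User.transfers[Transfer.amount].sum()
total_amount = sum(t.amount for t in user_transfers)

# Analogue of User.transfers[Transfer.status == "returned", Transfer.amount].sum()
returned_amount = sum(t.amount for t in user_transfers if t.status == "returned")
```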
In the reverse direction, the relationship is a has_one relation (following the above
example, a user has many transfers but a transfer has a single user). However, you don't
have to explicitly set the join a second time. Instead, the join condition is assumed to
be symmetric and copied over. To complete the one-to-many relationship from our example,
add a `User` to the `Transfer` class:
@features
class Transfer:
    ...
    user_id: str
    amount: float
    user: "User"

@features
class User:
    ...
    uid: Transfer.user_id
    transfers: DataFrame[Transfer]
Here, you need to use quotes around `User` to use a forward reference.
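The symmetry of the join can be pictured with a small plain-Python sketch: the same key equality that collects transfers under a user also identifies the single user for each transfer. This is an illustration under assumed data, not Chalk code; `UserRow`, `TransferRow`, `user_for`, and `transfers_for` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class UserRow:
    id: str


@dataclass
class TransferRow:
    id: str
    user_id: str
    amount: float


# Made-up sample data.
users = {u.id: u for u in [UserRow("u1"), UserRow("u2")]}
transfers = [
    TransferRow("t1", "u1", 10.0),
    TransferRow("t2", "u1", 5.0),
    TransferRow("t3", "u2", 7.5),
]


def user_for(transfer):
    """Has-one direction: the single user whose id matches transfer.user_id."""
    return users.get(transfer.user_id)


def transfers_for(user):
    """Has-many direction: every transfer whose user_id matches user.id."""
    return [t for t in transfers if t.user_id == user.id]
```

Both helpers test the same equality, which is why the condition only needs to be stated once.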
The recommended way to define a many-to-many relationship is through a joining feature class.
For instance, to define a many-to-many relationship between Actors and Movies, you
could write the following feature classes:
from chalk.features import features, DataFrame

@features
class Actor:
    id: int
    appearances: "DataFrame[MovieRole]"
    full_name: str
    # this will be used to demonstrate one of the ways the joining feature can be populated
    movie_ids: list[int]

@features
class Movie:
    id: int
    title: str

@features
class MovieRole:
    id: str
    actor_id: Actor.id
    movie_id: Movie.id
    movie: Movie
Here you need to use quotes around `DataFrame[MovieRole]` to use a forward reference.
This joining feature class can be populated by a SQL file resolver:
-- resolves: MovieRole
-- source: PG
SELECT id, actor_id, movie_id FROM movie_roles;
Alternatively, it can be populated by a DataFrame-returning Python resolver (namespaced to one of the joined feature sets):
from chalk import online

@online
def get_actor_in_movie(
    a_id: Actor.id,
    movie_ids: Actor.movie_ids,
) -> Actor.appearances:
    # Build one MovieRole row per movie the actor appeared in.
    return DataFrame([
        MovieRole(
            id=f"{a_id}_{m_id}",
            actor_id=a_id,
            movie_id=m_id
        )
        for m_id in movie_ids
    ])
The joining feature class lets you reference Movie features from the Actor namespace and in Actor resolvers.
For example, to get the titles for all the movies that an actor has appeared in, you can run the following query:
$ chalk query --in actor.id=1 --out actor.appearances.movie.title
Results
Name Hit? Value
───────────────────────────────────────────────────────────────────────────────
actor.appearances.movie.title ["The Bad Sleep Well","High and Low",...]
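The path `actor.appearances.movie.title` can be read as two hops: from the actor to its joining rows, then from each joining row to its movie. The plain-Python sketch below walks the same path without Chalk. The row types, the `build_roles` and `movie_titles` helpers, and the movie ids are hypothetical; only the two titles are taken from the sample output above.

```python
from dataclasses import dataclass


@dataclass
class MovieRow:
    id: int
    title: str


@dataclass
class MovieRoleRow:
    id: str
    actor_id: int
    movie_id: int


# Titles come from the sample output above; the ids are made up.
movies = {
    10: MovieRow(10, "The Bad Sleep Well"),
    11: MovieRow(11, "High and Low"),
}


def build_roles(actor_id, movie_ids):
    """One joining row per (actor, movie) pair, keyed like the resolver above."""
    return [
        MovieRoleRow(id=f"{actor_id}_{m_id}", actor_id=actor_id, movie_id=m_id)
        for m_id in movie_ids
    ]


def movie_titles(roles):
    """Follow each joining row to its movie and collect the titles."""
    return [movies[r.movie_id].title for r in roles]


appearances = build_roles(1, [10, 11])
titles = movie_titles(appearances)
```

Keeping the pairing in its own row type means neither `Actor` nor `Movie` has to store a list of the other; the joining rows carry the relationship.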