Define one-to-many and many-to-many relationships between feature classes.
Has-many relationships link a feature to many instances of another feature.
The recommended way to specify the join for a has-many relationship is implicitly, by annotating the foreign key with the feature it references. In the example below, a `User` is linked to potentially multiple `Transfer`s:
from chalk.features import features, DataFrame

@features
class Transfer:
    id: str
    # note, the annotation must be a string reference because User is
    # defined after Transfer.
    user_id: "User.id"
    amount: float

@features
class User:
    id: str
    transfers: DataFrame[Transfer]
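To make the implicit join concrete, here is a plain-Python sketch (not Chalk code) of the rows a `User.transfers` lookup conceptually collects: every `Transfer` whose `user_id` equals the user's `id`. The dataclass and the `transfers_by_user` helper are hypothetical stand-ins for the feature classes above.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical stand-in for the Transfer feature class (not Chalk code).
@dataclass
class Transfer:
    id: str
    user_id: str  # plays the role of the "User.id" foreign-key annotation
    amount: float


def transfers_by_user(transfers: list[Transfer]) -> dict[str, list[Transfer]]:
    """Group transfers under the user id they point to (one user -> many transfers)."""
    grouped: dict[str, list[Transfer]] = defaultdict(list)
    for t in transfers:
        grouped[t.user_id].append(t)
    return dict(grouped)


transfers = [
    Transfer("t1", "u1", 10.0),
    Transfer("t2", "u1", 25.5),
    Transfer("t3", "u2", 7.0),
]
by_user = transfers_by_user(transfers)
print([t.id for t in by_user["u1"]])  # ['t1', 't2']
```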
The following example, which sets the join explicitly, is equivalent to the implicit `User`/`Transfer` example above:
from chalk.features import features, has_many, DataFrame

@features
class Transfer:
    id: str
    user_id: str
    amount: float

@features
class User:
    id: str
    transfers: DataFrame[Transfer] = has_many(lambda: Transfer.user_id == User.id)
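The lambda passed to `has_many` describes a join condition between a parent and each candidate child. As a rough mental model only (Chalk works from the expression rather than calling a Python predicate per row), the sketch below applies an equivalent predicate in plain Python; `join_rows`, `same_user`, and the dict-based rows are hypothetical.

```python
from typing import Callable

# Hypothetical rows standing in for User and Transfer feature values.
users = [{"id": "u1"}, {"id": "u2"}]
transfers = [
    {"id": "t1", "user_id": "u1", "amount": 10.0},
    {"id": "t2", "user_id": "u2", "amount": 4.0},
    {"id": "t3", "user_id": "u1", "amount": 1.5},
]


def join_rows(parent: dict, children: list[dict],
              condition: Callable[[dict, dict], bool]) -> list[dict]:
    """Return every child row for which condition(parent, child) holds."""
    return [child for child in children if condition(parent, child)]


def same_user(user: dict, transfer: dict) -> bool:
    # Mirrors has_many(lambda: Transfer.user_id == User.id)
    return transfer["user_id"] == user["id"]


for user in users:
    matched = join_rows(user, transfers, same_user)
    print(user["id"], [t["id"] for t in matched])
# u1 ['t1', 't3']
# u2 ['t2']
```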
You can also specify multiple join keys for a has-many relationship. For example, suppose a hospital wants to compute aggregations over the visits to each of its departments. We could write the following feature classes:
from datetime import datetime

from chalk.features import DataFrame, _, features, has_many

@features
class HospitalVisit:
    id: str = _.user_id + _.hospital + _.department + _.date
    composite_key: str = _.hospital + "-" + _.department
    department: str
    hospital: str
    user_id: str
    date: datetime

@features
class HospitalDepartment:
    id: int
    name: str
    hospital_name: str
    composite_key_match: str = _.hospital_name + "-" + _.name
    # multi-feature join
    visits: DataFrame[HospitalVisit] = has_many(
        lambda: (HospitalDepartment.hospital_name == HospitalVisit.hospital)
        & (HospitalDepartment.name == HospitalVisit.department)
    )
    # composite key join
    visits_with_composite_key: DataFrame[HospitalVisit] = has_many(
        lambda: HospitalDepartment.composite_key_match == HospitalVisit.composite_key
    )
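The two joins above usually select the same visits, but they are not strictly equivalent: a concatenated string key can collide when one of its components itself contains the separator. The plain-Python sketch below (hypothetical data, not Chalk code) compares the two conditions and shows such a collision.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for HospitalVisit and HospitalDepartment.
@dataclass
class Visit:
    hospital: str
    department: str

    @property
    def composite_key(self) -> str:
        return self.hospital + "-" + self.department


@dataclass
class Department:
    hospital_name: str
    name: str

    @property
    def composite_key_match(self) -> str:
        return self.hospital_name + "-" + self.name


def multi_feature_match(d: Department, v: Visit) -> bool:
    # Compare each key component separately.
    return d.hospital_name == v.hospital and d.name == v.department


def composite_match(d: Department, v: Visit) -> bool:
    # Compare one concatenated string.
    return d.composite_key_match == v.composite_key


dept = Department(hospital_name="St-Mary", name="ER")
visits = [
    Visit(hospital="St-Mary", department="ER"),  # genuine match
    Visit(hospital="St", department="Mary-ER"),  # different hospital and department
]
print([multi_feature_match(dept, v) for v in visits])  # [True, False]
print([composite_match(dept, v) for v in visits])      # [True, True]
```

Comparing components separately avoids the ambiguity; with a composite string key, choose a separator that cannot appear in the component values.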
Has-many relationships also work when the joined feature class has a composite primary key. Here, each `SoftwareEngineer`'s `id` is derived from two other features:
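from chalk.features import DataFrame, _, features, has_many

@features
class SoftwareEngineer:
    # composite primary key built from two string features, so it is a str
    id: str = _.first_name + " " + _.last_name
    first_name: str
    last_name: str
    manager_id: int

@features
class Manager:
    id: int
    direct_reports: DataFrame[SoftwareEngineer] = has_many(
        lambda: Manager.id == SoftwareEngineer.manager_id
    )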
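The derived key and the has-many join are independent: the key identifies each engineer, while `manager_id` decides which manager's `direct_reports` they land in. A plain-Python sketch of both (hypothetical data, not Chalk code); note that two engineers with the same name would share one key.

```python
from dataclasses import dataclass


# Hypothetical stand-in for the SoftwareEngineer feature class.
@dataclass
class Engineer:
    first_name: str
    last_name: str
    manager_id: int

    @property
    def id(self) -> str:
        # Mirrors id = _.first_name + " " + _.last_name
        return self.first_name + " " + self.last_name


def direct_reports(manager_id: int, engineers: list[Engineer]) -> list[str]:
    """Ids of engineers whose manager_id matches the manager's id."""
    return [e.id for e in engineers if e.manager_id == manager_id]


engineers = [
    Engineer("Ada", "Lovelace", 1),
    Engineer("Alan", "Turing", 2),
    Engineer("Grace", "Hopper", 1),
]
print(direct_reports(1, engineers))  # ['Ada Lovelace', 'Grace Hopper']
```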
Having established a has-many relationship, you can now reference the transfers for a user through the user namespace. The `has_many` feature returns a `chalk.DataFrame`, which supports many helpful aggregation operations:
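# Number of transfers made by a user
User.transfers.count()

# Total amount of transfers made by the user
User.transfers[Transfer.amount].sum()

# Total amount of the transfers made by the user that were returned
# (assumes Transfer also defines a `status: str` feature)
User.transfers[
    Transfer.status == "returned",
    Transfer.amount
].sum()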
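For intuition, the three aggregations compute the following over a single user's transfers. This plain-Python sketch uses hypothetical rows and assumes each transfer carries a `status` value, which the `Transfer` class defined earlier does not declare.

```python
# Hypothetical transfers belonging to one user; "status" is assumed for illustration.
user_transfers = [
    {"amount": 100.0, "status": "completed"},
    {"amount": 40.0, "status": "returned"},
    {"amount": 15.0, "status": "returned"},
]

# Number of transfers made by the user
count = len(user_transfers)

# Total amount of transfers made by the user
total = sum(t["amount"] for t in user_transfers)

# Total amount of the returned transfers: filter rows, project amount, then sum
returned_total = sum(t["amount"] for t in user_transfers if t["status"] == "returned")

print(count, total, returned_total)  # 3 155.0 55.0
```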
In the reverse direction, a one-to-many relationship is completed by a `has_one` relationship: following the example above, a user has many transfers, but each transfer has a single user. You don't have to set the join a second time; the join condition is assumed to be symmetric and is copied over. To complete the one-to-many relationship from our example, add a `User` feature to the `Transfer` class:
from chalk.features import DataFrame, features

@features
class Transfer:
    id: str
    user_id: str
    amount: float
    user: "User"

@features
class User:
    uid: Transfer.user_id
    transfers: DataFrame[Transfer]
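Because the join condition is symmetric, the same equality that collects many transfers for a user resolves exactly one user for each transfer. A plain-Python sketch of both directions (hypothetical data and helpers, not Chalk's implementation); the has-one lookup is well-defined only because user ids are unique.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for the User and Transfer feature classes.
@dataclass(frozen=True)
class User:
    id: str
    name: str


@dataclass(frozen=True)
class Transfer:
    id: str
    user_id: str
    amount: float


users = [User("u1", "Ann"), User("u2", "Bo")]
transfers = [Transfer("t1", "u1", 5.0), Transfer("t2", "u2", 8.0), Transfer("t3", "u1", 2.0)]

# Index users by primary key: the has-one side needs at most one match per id.
users_by_id = {u.id: u for u in users}


def user_of(transfer: Transfer) -> User:
    """Has-one direction: the single User whose id equals transfer.user_id."""
    return users_by_id[transfer.user_id]


def transfers_of(user: User) -> list[Transfer]:
    """Has-many direction: every Transfer whose user_id equals user.id."""
    return [t for t in transfers if t.user_id == user.id]


print(user_of(transfers[2]).name)              # Ann
print([t.id for t in transfers_of(users[0])])  # ['t1', 't3']
```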
In the `Transfer` class, `"User"` must be quoted because `User` is defined after `Transfer`, so the annotation is a forward reference.
The recommended way to define a many-to-many relationship is through a joining feature class.
For instance, to define a many-to-many relationship between `Actor`s and `Movie`s, you could write the following feature classes:
from chalk.features import features, DataFrame

@features
class Actor:
    id: int
    appearances: "DataFrame[MovieRole]"
    full_name: str
    # this will be used to demonstrate one of the ways the joining feature can be populated
    movie_ids: list[int]

@features
class Movie:
    id: int
    title: str

@features
class MovieRole:
    id: str
    actor_id: Actor.id
    movie_id: Movie.id
    movie: Movie
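Each `MovieRole` row pairs one actor with one movie, so the joining class turns one many-to-many relationship into two one-to-many relationships. The plain-Python sketch below (hypothetical rows, not Chalk code) answers both directions from the same joining rows.

```python
from dataclasses import dataclass


# Hypothetical stand-in for the MovieRole joining feature class.
@dataclass(frozen=True)
class MovieRole:
    id: str
    actor_id: int
    movie_id: int


roles = [
    MovieRole("r1", actor_id=1, movie_id=10),
    MovieRole("r2", actor_id=1, movie_id=11),
    MovieRole("r3", actor_id=2, movie_id=10),
]


def movies_for_actor(actor_id: int) -> list[int]:
    # Actor -> many roles -> one movie per role
    return [r.movie_id for r in roles if r.actor_id == actor_id]


def actors_for_movie(movie_id: int) -> list[int]:
    # Movie -> many roles -> one actor per role
    return [r.actor_id for r in roles if r.movie_id == movie_id]


print(movies_for_actor(1))   # [10, 11]
print(actors_for_movie(10))  # [1, 2]
```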
In the `Actor` class, `"DataFrame[MovieRole]"` must be quoted because `MovieRole` is defined after `Actor`, so the annotation is a forward reference.
This joining feature class can be populated by a SQL file resolver:
-- resolves: MovieRole
-- source: PG
SELECT id, actor_id, movie_id FROM movie_roles;
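The resolver's query returns one row per actor/movie pairing, with columns named after the `MovieRole` features. To see that shape locally, the sketch below runs the same `SELECT` against an in-memory SQLite table; the table contents are hypothetical, and SQLite stands in for the `PG` source only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical movie_roles table mirroring the MovieRole features.
conn.execute("CREATE TABLE movie_roles (id TEXT PRIMARY KEY, actor_id INTEGER, movie_id INTEGER)")
conn.executemany(
    "INSERT INTO movie_roles VALUES (?, ?, ?)",
    [("r1", 1, 10), ("r2", 1, 11), ("r3", 2, 10)],
)

# Same query as the SQL file resolver; each row corresponds to one MovieRole.
rows = conn.execute("SELECT id, actor_id, movie_id FROM movie_roles;").fetchall()
print(rows)
conn.close()
```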
Alternatively, the joining feature class can be populated by a `DataFrame`-returning Python resolver (namespaced to one of the joined feature sets):
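from chalk import online
from chalk.features import DataFrame

@online
def get_actor_in_movie(
    a_id: Actor.id,
    movie_ids: Actor.movie_ids,
) -> Actor.appearances:
    # one MovieRole per movie; the id format is an illustrative choice
    return DataFrame([
        MovieRole(id=f"{a_id}_{m_id}", actor_id=a_id, movie_id=m_id)
        for m_id in movie_ids
    ])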
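The Python resolver fans a single actor's `movie_ids` list out into one joining row per movie. A plain-Python sketch of that fan-out, with a hypothetical `MovieRole` dataclass and a hypothetical `f"{actor_id}_{movie_id}"` id scheme (any scheme works as long as each pair gets a distinct id):

```python
from dataclasses import dataclass


# Hypothetical stand-in for the MovieRole joining feature class.
@dataclass(frozen=True)
class MovieRole:
    id: str
    actor_id: int
    movie_id: int


def roles_for_actor(actor_id: int, movie_ids: list[int]) -> list[MovieRole]:
    """One MovieRole per movie the actor appeared in."""
    # Hypothetical id scheme: deterministic and distinct per (actor, movie) pair of ints.
    return [
        MovieRole(id=f"{actor_id}_{m_id}", actor_id=actor_id, movie_id=m_id)
        for m_id in movie_ids
    ]


roles = roles_for_actor(1, [10, 11])
print([r.id for r in roles])  # ['1_10', '1_11']
```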
The joining feature class lets you traverse the relationship through the `Actor` namespace and populate it with either SQL or Python resolvers. For example, to get the titles of all the movies that an actor has appeared in, you can run the following query:
$ chalk query --in actor.id=1 --out actor.appearances.movie.title

Name                           Hit?  Value
actor.appearances.movie.title        ["The Bad Sleep Well","High and Low",...]
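The output path `actor.appearances.movie.title` walks from the actor to its joining rows, from each row to its movie, and from each movie to its title. A plain-Python sketch of that traversal (hypothetical ids and titles, not Chalk's query engine):

```python
# Hypothetical data: movie id -> title, plus MovieRole joining rows.
movies = {10: "Movie A", 11: "Movie B", 12: "Movie C"}
roles = [
    {"actor_id": 1, "movie_id": 10},
    {"actor_id": 1, "movie_id": 12},
    {"actor_id": 2, "movie_id": 11},
]


def appearance_titles(actor_id: int) -> list[str]:
    appearances = [r for r in roles if r["actor_id"] == actor_id]  # actor.appearances
    return [movies[r["movie_id"]] for r in appearances]            # .movie.title


print(appearance_titles(1))  # ['Movie A', 'Movie C']
```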