Has Many - Chalk

Has-many relationships link one feature class to many instances of another feature class.

Foreign Keys

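The recommended way to specify the join for a has-many relationship is implicitly, by annotating a foreign key with a reference to the other feature class. In the example below, Transfer.user_id refers to User.id, so each User is linked to potentially many Transfers.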

from chalk.features import features, DataFrame

@features
class Transfer:
    id: str
    # Note: the annotation must be a string reference because User is
    # defined after Transfer.
    user_id: "User.id"
    amount: float

@features
class User:
    id: str
    transfers: DataFrame[Transfer]
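Conceptually, the foreign-key annotation `user_id: "User.id"` says that a transfer belongs to the user whose id equals its user_id. The plain-Python sketch below makes that grouping concrete. It does not use Chalk; the `TransferRow` dataclass and the `group_by_user` helper are hypothetical stand-ins, not part of Chalk's API.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical plain-Python stand-in for the Transfer feature class (not Chalk).
@dataclass
class TransferRow:
    id: str
    user_id: str  # plays the role of the "User.id" foreign key
    amount: float


def group_by_user(transfers: list[TransferRow]) -> dict[str, list[TransferRow]]:
    """Bucket transfers under the user id their foreign key points at.

    Each bucket corresponds to what User.transfers would hold for that user.
    """
    groups: dict[str, list[TransferRow]] = defaultdict(list)
    for t in transfers:
        groups[t.user_id].append(t)
    return dict(groups)


transfers = [
    TransferRow("t1", "u1", 10.0),
    TransferRow("t2", "u2", 5.0),
    TransferRow("t3", "u1", 2.5),
]
by_user = group_by_user(transfers)
print([t.id for t in by_user["u1"]])  # ['t1', 't3']
```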

Explicit Join

The following example, which explicitly sets the join, is equivalent to the above:

from chalk.features import features, has_many, DataFrame

@features
class Transfer:
    id: str
    user_id: str
    amount: float

@features
class User:
    id: str
    transfers: DataFrame[Transfer] = has_many(lambda: Transfer.user_id == User.id)
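The lambda passed to has_many is a join condition. As a rough mental model only (this is not how Chalk evaluates it), the sketch below treats the condition as a predicate over a (transfer, user) pair and keeps the transfers for which it holds. The `join_many` helper and the dict-based rows are hypothetical.

```python
from typing import Callable

# Hypothetical rows standing in for Transfer and User feature values.
Row = dict[str, object]


def join_many(parent: Row, children: list[Row],
              condition: Callable[[Row, Row], bool]) -> list[Row]:
    """Return every child row for which condition(child, parent) is true."""
    return [child for child in children if condition(child, parent)]


user = {"id": "u1"}
transfers = [
    {"id": "t1", "user_id": "u1", "amount": 10.0},
    {"id": "t2", "user_id": "u2", "amount": 5.0},
    {"id": "t3", "user_id": "u1", "amount": 2.5},
]

# Mirrors has_many(lambda: Transfer.user_id == User.id).
matched = join_many(user, transfers, lambda t, u: t["user_id"] == u["id"])
print([t["id"] for t in matched])  # ['t1', 't3']
```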

Aggregations on References

With the has-many relationship in place, you can reference a user's transfers through the User namespace. The transfers feature is a chalk.DataFrame, which supports aggregation operations such as count and sum:

# Number of transfers made by a user
User.transfers.count()

# Total amount of transfers made by the user
User.transfers[Transfer.amount].sum()

# Total amount of the user's transfers that were returned
# (assumes Transfer also declares a `status: str` feature)
User.transfers[
    Transfer.status == "returned",
    Transfer.amount
].sum()
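For intuition, here is what those three expressions compute for a single user, written as ordinary Python over a list of that user's transfers. The `status` field is an assumption (the Transfer class above does not declare it, but the filtered example relies on it), and the records are made-up sample data.

```python
from dataclasses import dataclass


@dataclass
class TransferRow:
    amount: float
    status: str  # assumed feature; the filtered aggregation above relies on it


# Made-up transfers belonging to one user.
user_transfers = [
    TransferRow(100.0, "completed"),
    TransferRow(40.0, "returned"),
    TransferRow(15.5, "returned"),
]

# User.transfers.count()
count = len(user_transfers)

# User.transfers[Transfer.amount].sum()
total = sum(t.amount for t in user_transfers)

# User.transfers[Transfer.status == "returned", Transfer.amount].sum()
returned_total = sum(t.amount for t in user_transfers if t.status == "returned")

print(count, total, returned_total)  # 3 155.5 55.5
```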

Back-references

One-to-many

In the reverse direction, a one-to-many relationship is expressed with a has-one relation: a user has many transfers, but each transfer has a single user. You don't have to set the join a second time. The join condition is assumed to be symmetric and is copied over. To complete the one-to-many relationship from our example, add a `user` feature of type User to the Transfer class:

@features
class Transfer:
  ...
  user_id: str
  amount: float
  user: "User"

@features
class User:
  ...
  uid: Transfer.user_id
  transfers: DataFrame[Transfer]

The quotes around `"User"` are required because User is defined after Transfer, so the annotation must be a forward reference.
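Because the join condition is copied over symmetrically, the same equality that gathers a user's transfers also picks out the single user for a transfer. A plain-Python sketch of that has-one direction follows; the dict rows, the sample names, and the `user_for` helper are hypothetical, and `uid` mirrors the `uid: Transfer.user_id` key in the snippet above.

```python
from typing import Optional

# Hypothetical user rows; "uid" mirrors the `uid: Transfer.user_id` join key.
users = [{"uid": "u1", "name": "Ada"}, {"uid": "u2", "name": "Grace"}]


def user_for(transfer: dict, users: list[dict]) -> Optional[dict]:
    """Return the single user whose uid equals transfer["user_id"], or None."""
    # Same equality as the has-many side, evaluated from the Transfer end.
    matches = [u for u in users if u["uid"] == transfer["user_id"]]
    if len(matches) > 1:
        raise ValueError("has-one join matched more than one user")
    return matches[0] if matches else None


transfer = {"id": "t1", "user_id": "u2", "amount": 10.0}
print(user_for(transfer, users)["name"])  # Grace
```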

Many-to-many

The recommended way to define a many-to-many relationship is through a joining feature class. For instance, to define a many-to-many relationship between Actors and Movies, you could write the following feature classes:

from chalk.features import features, DataFrame

@features
class Actor:
  id: int
  appearances: "DataFrame[MovieRole]"
  full_name: str

  # this will be used to demonstrate one of the ways the joining feature can be populated
  movie_ids: list[int]

@features
class Movie:
  id: int
  title: str

@features
class MovieRole:
  id: str
  actor_id: Actor.id
  movie_id: Movie.id
  movie: Movie

The quotes around `"DataFrame[MovieRole]"` are required because MovieRole is defined after Actor, so the annotation must be a forward reference.

This joining feature class can be populated by a SQL file resolver:

-- resolves: MovieRole
-- source: PG
SELECT id, actor_id, movie_id FROM movie_roles;

Alternatively, it can be populated by a Python resolver that returns a DataFrame and is namespaced to one of the joined feature classes (here, Actor):

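from chalk import online
from chalk.features import DataFrame

@online
def get_actor_in_movie(
  a_id: Actor.id,
  movie_ids: Actor.movie_ids,
) -> Actor.appearances:
  # Emit one MovieRole row per movie the actor appeared in.
  return DataFrame([
    MovieRole(
      id=f"{a_id}_{m_id}",
      actor_id=a_id,
      movie_id=m_id
    )
    for m_id in movie_ids
  ])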
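The resolver emits one MovieRole row per movie id, with a synthetic id built from the actor and movie ids. Stripped of Chalk types, the row construction looks like the sketch below; plain dicts stand in for MovieRole instances, and `movie_role_rows` is a hypothetical name.

```python
def movie_role_rows(a_id: int, movie_ids: list[int]) -> list[dict]:
    """Build one joining row per movie the actor appeared in.

    The id format f"{a_id}_{m_id}" mirrors the resolver above, so each
    (actor, movie) pair gets a distinct id as long as movie_ids has no duplicates.
    """
    return [
        {"id": f"{a_id}_{m_id}", "actor_id": a_id, "movie_id": m_id}
        for m_id in movie_ids
    ]


print(movie_role_rows(1, [10, 20]))
# [{'id': '1_10', 'actor_id': 1, 'movie_id': 10}, {'id': '1_20', 'actor_id': 1, 'movie_id': 20}]
```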

The joining feature class lets you:

- query for movie features from the Actor namespace, and
- use movie features in downstream Actor resolvers.

For example, to get the titles for all the movies that an actor has appeared in, you can run the following query:

$ chalk query --in actor.id=1 --out actor.appearances.movie.title
Results

 Name                           Hit?  Value
───────────────────────────────────────────────────────────────────────────────
 actor.appearances.movie.title        ["The Bad Sleep Well","High and Low",...]
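The output path actor.appearances.movie.title walks two joins: from the actor to its MovieRole rows (has-many), then from each role to its Movie (has-one). A plain-Python sketch of that traversal follows, using hypothetical in-memory tables with placeholder titles and a hypothetical `titles_for_actor` helper.

```python
# Hypothetical in-memory tables standing in for the Movie and MovieRole features.
movies = {10: {"id": 10, "title": "Film A"}, 20: {"id": 20, "title": "Film B"}}
movie_roles = [
    {"id": "1_10", "actor_id": 1, "movie_id": 10},
    {"id": "1_20", "actor_id": 1, "movie_id": 20},
    {"id": "2_10", "actor_id": 2, "movie_id": 10},
]


def titles_for_actor(actor_id: int) -> list[str]:
    # Step 1: actor.appearances -> the MovieRole rows whose actor_id matches.
    appearances = [r for r in movie_roles if r["actor_id"] == actor_id]
    # Step 2: appearance.movie -> the single Movie whose id equals movie_id.
    return [movies[r["movie_id"]]["title"] for r in appearances]


print(titles_for_actor(1))  # ['Film A', 'Film B']
```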
