Features
Using underscore expressions to define features
Underscore expressions can be used to define and resolve features from operations on other features.
Consider the feature class RocketEngine defined below:
from chalk.features import features

@features
class RocketEngine:
    id: int
    mass: float
    volume: float
Suppose we wanted to define a feature called density and resolve it as mass / volume.
We can use the underscore feature definition syntax as follows:
from chalk.features import _, features

@features
class RocketEngine:
    id: int
    mass: float
    volume: float

    # Defining density using an underscore expression
    density: float = _.mass / _.volume
Note that _ is a reference to the containing feature class, here RocketEngine, and should be used to reference features defined in that same feature class.
Underscore can also be used to define expressions with DataFrame features.
In the example below, we will define two new feature classes, Astronaut and RocketShip:
from chalk.features import features, has_one, has_many, DataFrame

@features
class Astronaut:
    id: int
    name: str
    height: float
    mass: float
    ship_id: int

@features
class RocketShip:
    id: int
    engine_id: int

    # `has_one` relationship with `RocketEngine`
    engine: RocketEngine = has_one(lambda: RocketShip.engine_id == RocketEngine.id)

    # `has_many` relationship with `Astronaut`
    astronauts: DataFrame[Astronaut] = has_many(lambda: Astronaut.ship_id == RocketShip.id)
Suppose we wanted to define a feature called total_mass on RocketShip and resolve it as the sum of the masses of the astronauts on the ship and the mass of the engine.
Using the underscore syntax, we can define total_mass as follows:
from chalk.features import _, features, has_one, has_many, DataFrame

@features
class Astronaut:
    id: int
    name: str
    height: float
    mass: float
    ship_id: int

@features
class RocketShip:
    id: int
    engine_id: int

    # `has_one` relationship with `RocketEngine`
    engine: RocketEngine = has_one(lambda: RocketShip.engine_id == RocketEngine.id)

    # `has_many` relationship with `Astronaut`
    astronauts: DataFrame[Astronaut] = has_many(lambda: Astronaut.ship_id == RocketShip.id)

    # Defining total_mass using an underscore expression
    total_mass: float = _.engine.mass + _.astronauts[_.mass].sum()
Here, _.astronauts[_.mass] uses the underscore syntax to project the mass column of the astronauts DataFrame, and .sum() aggregates the projected column.
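As a rough mental model, the total_mass expression corresponds to the plain-Python computation sketched below. This is not Chalk code: it uses ordinary dataclasses, the sample values are made up, and the helper function total_mass is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Astronaut:
    id: int
    name: str
    height: float
    mass: float
    ship_id: int


@dataclass
class RocketEngine:
    id: int
    mass: float
    volume: float


@dataclass
class RocketShip:
    id: int
    engine: RocketEngine
    astronauts: List[Astronaut] = field(default_factory=list)


def total_mass(ship: RocketShip) -> float:
    # Projection: keep only the `mass` column of the astronauts
    masses = [a.mass for a in ship.astronauts]
    # Aggregation: sum the projected column, then add the engine mass
    return ship.engine.mass + sum(masses)


# Made-up sample data for illustration
ship = RocketShip(
    id=1,
    engine=RocketEngine(id=10, mass=1000.0, volume=4.0),
    astronauts=[
        Astronaut(id=1, name="A", height=1.8, mass=80.0, ship_id=1),
        Astronaut(id=2, name="B", height=2.1, mass=70.0, ship_id=1),
    ],
)
ship_total = total_mass(ship)
print(ship_total)
```

The underscore version expresses the same projection and aggregation declaratively on the feature class, rather than in a separate function.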
Underscores can also be used to define filters on DataFrame features.
For example, suppose we wanted to define a feature called count_tall_astronauts on RocketShip that counts the astronauts whose height exceeds 2.0.
We could define the feature as follows:
@features
class RocketShip:
    id: int
    engine_id: int

    # `has_one` relationship with `RocketEngine`
    engine: RocketEngine = has_one(lambda: RocketShip.engine_id == RocketEngine.id)

    # `has_many` relationship with `Astronaut`
    astronauts: DataFrame[Astronaut] = has_many(lambda: Astronaut.ship_id == RocketShip.id)

    # Defining total_mass using an underscore expression
    total_mass: float = _.engine.mass + _.astronauts[_.mass].sum()

    # Filtering on the `height` column, then counting the remaining rows
    count_tall_astronauts: int = _.astronauts[_.height > 2.0].count()
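For comparison, the filter-then-count pattern can be sketched in plain Python as shown below. This is an illustration only, not Chalk code: the rows are made-up sample data, and the helper function and its min_height parameter are hypothetical.

```python
# Made-up sample rows standing in for the astronauts DataFrame
astronauts = [
    {"id": 1, "height": 1.8},
    {"id": 2, "height": 2.1},
    {"id": 3, "height": 2.3},
]


def count_tall_astronauts(rows, min_height=2.0):
    # Filter: keep rows whose height is strictly greater than the threshold
    tall = [r for r in rows if r["height"] > min_height]
    # Aggregation: count the rows that survived the filter
    return len(tall)


tall_count = count_tall_astronauts(astronauts)
print(tall_count)
```

Note that the comparison is strict, matching _.height > 2.0, so an astronaut with a height of exactly 2.0 is not counted.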