Notebook Development

Chalk makes it really easy to write and test out new features from a notebook.

To get started, you’ll want to install and connect Chalk to your notebook environment. Generally this involves two steps:

Installing the chalkpy[runtime] package,
Authenticating to Chalk (using the chalk cli or by setting environment variables)

We provide specific setup instructions for Deepnote, Hex and Vertex, but the process should be similar for other notebook environments.

Installing the Chalk Python Dependency

To install dependencies, run the !pip install "chalkpy[runtime]" bash command directly in a cell. Alternatively, you can add and install the chalkpy[runtime] dependency in your kernel’s virtual environment.

Connecting Your Notebook to Chalk

To authenticate, we recommend generating a client ID and client secret pair.

To do this, open the Chalk Dashboard. Go to Settings > Access Tokens and click on New Token, selecting all necessary permissions (generally, you’ll want: online query, offline query, and preview deployment permissions).

You can now use these access credentials to connect Chalk to any Jupyter notebook environment—we walk through this process for a few different notebook environments in our notebook setup docs.

notebook.ipynb

from chalk.client import ChalkClient
import os

CHALK_CLIENT_ID = os.environ.get("CHALK_CLIENT_ID")
CHALK_CLIENT_SECRET = os.environ.get("CHALK_CLIENT_SECRET")

client = ChalkClient(
  client_id=CHALK_CLIENT_ID,
  client_secret=CHALK_CLIENT_SECRET,
  branch='notebook',
)

Verifying the Setup

To verify that everything is properly set up, you can run the following code in a cell:

notebook.ipynb

client.whoami()

If everything is properly configured, you should see your user_id, team_id and environment_id come back:

WhoAmIResponse(user='<user_id>', environment_id=<environment_id>, team_id='<team_id>')

Loading Features

Now we can start building some new features!

To load your Chalk features into your notebook, you’ll want to run:

notebook.ipynb

from chalk.client import ChalkClient

client = ChalkClient()

client.load_features()

This will print out your available features and load them into your notebook:

Director
┏━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Feature            ┃ Type                  ┃
┡━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━┩
│ id                 │ str                   │
│ name               │ str                   │
│ birth_year         │ int                   │
│ active             │ bool                  │
│ favorite_numbers   │ list[int]             │
│ death_year         │ int | None            │
│ known_for_titles   │ list[str]             │
│ known_for_genres   │ list[str]             │
│ movies             │ DataFrame[Movie]      │
│ num_movies         │ int                   │
└────────────────────┴───────────────────────┘
Movie
┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ Feature                ┃ Type              ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ id                     │ str               │
│ director_id            │ str               │
│ release_number         │ int               │
│ type_                  │ str               │
│ title                  │ str               │
│ original_title         │ str               │
│ release_date           │ dt.datetime       │
│ runtime_m              │ float             │
│ primary_genre          │ str               │
│ secondary_genre        │ str               │
└────────────────────────┴───────────────────┘

We’ll use the data schema above for the rest of this tutorial.

Before we get started writing features, you’ll want to import _ from Chalk _ (underscore) allows you to build out features on your feature classes.

notebook.ipynb

from chalk.features import _

Additionally, if you want to create and work from a new branch (which is recommended), run the following command:

notebook.ipynb

client.get_or_create_branch(branch_name="<your-branch-name>")

This command will:

create a new Chalk branch deployment that has the same features defined as your main chalk deployment,
point your client to that new branch.

If the branch already exists, your client will just be updated to point to that branch (no new branch deployment will be created).

Building New Features

Lets say we want to write a new “exclaimed name” feature, which is a director’s name with an exclamation mark tacked on. To do this, run the following code in a cell:

notebook.ipynb

Director.name_exclaimed = _.name + "!"

In this example, the underscore acts as a reference to the Director feature class`.

We have now created a new feature in our notebook and can query for it in either an online or offline context.

Querying for features

To query for your new features, pass them as outputs to the ChalkClient query methods. For instance, to run an online query for your new Director.name_exclaimed feature, you can run the following:

notebook.ipynb

new_features = client.query(
    input={
        Director.id: 113,
    },
    output={
        Director.name,
        Director.name_exclaimed,
    }
)

new_features

Running an offline query is equally straight forward:

notebook.ipynb

chalk_dataset = client.offline_query(
    input={
        Director.id: list(range(1000)),
    },
    output={
        Director.id,
        Director.name,
        Director.name_exclaimed,
    },
    recompute_features=True,
    run_asynchronously=True,
    dataset_name="<dataset_name>",
)

df = chalk_dataset.to_pandas()

Running the offline query above creates a Pandas DataFrame with the columns Director.id, Director.name, and Director.name_exclaimed (alongside some metadata columns). This DataFrame will have one row for every Directors with an id between 0 and 999.

By providing a dataset_name to the offline_query, we’ve created a named dataset. Named datasets can easily be shared with other team members.

Any historical dataset can be accessed using the ChalkClient. Named datasets can be accessed as follows:

notebook.ipynb

client = ChalkClient()

dataset = client.get_dataset(name="<datset_name>")

Building Out More Complex Features

You can use expressions to build out more complex features. Generally these more complex features are composed of: has-one relationships, has-many relationships, and chalk functions.

Using Has-one Relationships

In the schema we’ve defined above, a Movie is in a has-one relationship with its Director (even if this is not necessarily true in practice).

When building new features on the Movie class, features from joined classes can be directly accessed. For instance, lets say we want to define Movie.is_latest_release. We can accomplish this by defining a feature that reaches through the has_one join:

notebook.ipynb

Movie.is_latest_release= _.director.num_movies == _.release_number

The expression above defines a boolean feature which tests whether the release number for a movie (_.release_number) is equal to the number of movies a director has released (_.director.num_movies).

Using Has-Many Relationships

In the schema we’ve defined above, a Director is in a has-many relationship with their Movies.

When building new features on the Director class, features from joined classes can be directly accessed. For instance, lets say we want to define Director.average_movie_runtime. We can accomplish this by defining an aggregation feature:

notebook.ipynb

Director.average_movie_runtime = _.movies[_.runtime].mean()

In the above example, the two different _’s refer to two different feature class namespaces:

_.movies can be thought of as Director.movies,
_.runtime can be thought of as Movie.runtime.

The expression above gets a directors movies (_.movies), specifies that the aggregation is happening on the runtime field of those movies (_.movies[_.runtime]) and then calculates the mean (.mean()).

Filters can also be passed to these expressions. For instance, say we want to filter out any movie that has a primary or secondary genre of short from the average runtime. This can be done as follows:

notebook.ipynb

Director.average_movie_runtime_filtered = _.movies[
    _.runtime,                      # aggregation column
    _.primary_genre != "short",     # filter
    _.secondary_genre != "short"    # additional filter
].mean()

Using Chalk Functions

Chalk exposes a number of functions that can be called and composed as part of expressions—a full list of these functions can be found in our expressions docs.

Below, we provide a few examples of how these functions can be used to create new features:

Identify whether a movie belongs to a Director’s typical genres:

notebook.ipynb

Movie.is_directors_usual_genre = F.contains(_.director.genres, _.primary_genre)

Identify whether a movie was released on a US federal holiday:

notebook.ipynb

Movie.released_on_holiday = F.is_us_federal_holiday(_.release_daate)

Get release_date of Director’s last movie:

notebook.ipynb

Director.latest_movie_release = F.max_by(_.movies[_.release_date], _.release_date)

Composing New Features

The features you’ve defined in your notebook can also be composed—they can be used to define new features. For instance, you can define the Director.makes_movies_that_are_too_long feature:

notebook.ipynb

Director.makes_movies_that_are_too_long = _.average_movie_runtime > 150

Translating this Code into Production Code

Defining features as these expressions provides a couple really significant advantages:

You build out really modular and granular features that can be composed.
Your features translate directly into performant code—everything we’ve written in this tutorial can be directly translated to your Chalk feature store and run efficiently.

​Installing the Chalk Python Dependency

​Connecting Your Notebook to Chalk

​Verifying the Setup

​Loading Features

​Building New Features

​Querying for features

​Building Out More Complex Features

​Using Has-one Relationships

​Using Has-Many Relationships

​Using Chalk Functions

​Composing New Features

​Translating this Code into Production Code

On this page

Installing the Chalk Python Dependency

Connecting Your Notebook to Chalk

Verifying the Setup

Loading Features

Building New Features

Querying for features

Building Out More Complex Features

Using Has-one Relationships

Using Has-Many Relationships

Using Chalk Functions

Composing New Features

Translating this Code into Production Code