Feature Development Lifecycle

Chalk is the fastest way to deploy new features and feature pipelines to production. This demo covers creating a new feature. You’ll develop, unit test, integration test, and deploy a resolver for this feature. Then you’ll see how to backfill that feature to Chalk’s offline store so that data scientists can use it in historical training sets.

Develop

The first step is to write a new feature and resolver for that feature. Imagine you wanted to create a new feature called user.name_email_match_score. This feature should capture the similarity between a user’s name and email. Put another way: andy and andy@gmail.com should produce a high score, and emily and andy@gmail.com should produce a low score.

The first step is to add the new feature, and a resolver for that feature.

example.py

@features
class User:
  ...
  fullname: str
  email: str
  name_email_match_score: float

@online
def name_email_match_scorer(
    email: User.email,
    fullname: User.fullname
) -> User.name_email_match_score:
    email = email.split("@")[0]
    intersection = set(fullname) & set(email)
    union = set(fullname) | set(email)
    jaccard_similarity = len(intersection) / len(union)
    return jaccard_similarity

Our new resolver name_email_match_scorer takes a dependency on the user’s email and fullname in the argument list. Then, it declares that it returns the User.name_email_match_score in the return type signature.

In the body of the function, we compute the Jaccard index between the email without the domain and the full name.

Test

Now that you’ve written the new feature and resolver, it’s time to validate your change. Chalk supports both unit testing and integration testing of your feature pipelines.

Unit test

You can unit test resolvers just like normal Python functions using any unit testing framework:

from pytest import approx

def test_name_email_match_scorer():
    assert approx(0.60) == name_email_match_scorer(
        "katherine.johnson@nasa.gov",
        "Katherine Johnson",
    )
    assert approx(0.39) == name_email_match_scorer(
        "katherine.johnson@nasa.gov",
        "Eleanor Roosevelt",
    )

Here, the Jaccard index for katherine.johnson@nasa.gov is higher with the name Katherine Johnson than with the name Eleanor Roosevelt, as expected.

Deploy branch

Now that the unit tests have passed, you can create a Branch Deploy with the new changes.

Chalk allows you to create an unlimited number of branch deployments. Branch deployments run all of your resolvers in the same way that they run in production. However, branch deployments don’t impact the offline store. You can create a branch deployment with the following command:

chalk apply --branch test

Integration test

With the branch deployment created, you can run integration tests on your changes.

integration_test.py

import pytest
from chalk.client import ChalkClient

@pytest.fixture(scope="module")
def chalk_client():
    client = ChalkClient(local=True)
    return client

def test_get_features(chalk_client):
    resp = chalk_client.query(
        input={User.id: 1},
        output=[
            User.credit_report.fico_score,
            User.email,
            User.name,
            User.dob,
        ],
    )
    assert resp.get_feature_value(User.credit_report.fico_score) == 700
    assert resp.get_feature_value(User.email) == "katherine.johnson@nasa.gov"

Here, the Chalk API client is configured to use the preview deployment id returned in the previous step.

Deploy

Once you’ve tested your changes, it’s time to deploy! This step looks much like the preview deployment, but this time without the --branch flag:

chalk apply

Now, your production environments can request the new user.name_email_match_score feature.

chalk query --in user.id=1 --out user.name_email_match_score

Backfill

The user.name_email_match_score feature is live! But historically, this feature did not exist, and you won’t be able to sample its values at previous times until you backfill the values, which you can do with the ‘chalk trigger’ command.

chalk trigger --resolver example.name_email_match_scorer --lower-bound 2020-05-05T12:00:00+00:00 --persist-offline=True

This command will backfill historical values for the user.name_email_match_score feature.

Offline training

With the feature backfilled, you can query the historical value:

localhost:3000

Chalk AI - Documentation Reference

Jupyter Notebook

Chalk AI - Alerts

That’s how you deploy a new feature with Chalk! Let a member of our technical team know if we can be helpful.

Delete

Mistakes are inevitable, but Chalk provides the tools you need to quickly and easily recover and keep going.

Drop a feature

If you need to change a feature’s type (for example from string to int), or if you want to drop all the data for a feature value, the Chalk CLI has you covered with chalk drop. Simply execute the command from the CLI and you’ll have a fresh start to recreate the feature and its data as necessary.

Delete a row

Sometimes you just need to fully remove a record from your systems, whether because of a GDPR mandated “right to forget” request or due to a business requirement. Chalk provides the chalk delete command to meet this need.

​Develop

​Test

​Unit test

​Deploy branch

​Integration test

​Deploy

​Backfill

​Offline training

​Delete

​Drop a feature

​Delete a row

On this page