Getting Started
Developing, testing, deploying, and backfilling a new feature
Chalk is the fastest way to deploy new features and feature pipelines to production. This demo covers creating a new feature. You’ll develop, unit test, integration test, and deploy a resolver for this feature. Then you’ll see how to backfill that feature to Chalk’s offline store so that data scientists can use it in historical training sets.
The first step is to write a new feature and resolver for that feature.
Imagine you wanted to create a new feature called user.name_email_match_score
.
This feature should capture the similarity between a user’s name and email.
Put another way:
andy
and andy@gmail.com
should produce a high score, and
emily
and andy@gmail.com
should produce a low score.
The first step is to add the new feature, and a resolver for that feature.
@features
class User:
...
fullname: str
email: str
name_email_match_score: float
@online
def name_email_match_scorer(
email: User.email,
fullname: User.fullname
) -> User.name_email_match_score:
email = email.split("@")[0]
intersection = set(fullname) & set(email)
union = set(fullname) | set(email)
jaccard_similarity = len(intersection) / len(union)
return jaccard_similarity
Our new resolver name_email_match_scorer
takes a dependency on
the user’s email and fullname in the argument list.
Then, it declares that it returns the User.name_email_match_score
in the return type signature.
In the body of the function, we compute the Jaccard index between the email without the domain and the full name.
Now that you’ve written the new feature and resolver, it’s time to validate your change. Chalk supports both unit testing and integration testing of your feature pipelines.
You can unit test resolvers just like normal Python functions using any unit testing framework:
from pytest import approx
def test_name_email_match_scorer():
assert approx(0.60) == name_email_match_scorer(
"katherine.johnson@nasa.gov",
"Katherine Johnson",
)
assert approx(0.39) == name_email_match_scorer(
"katherine.johnson@nasa.gov",
"Eleanor Roosevelt",
)
Here, the
Jaccard index
for katherine.johnson@nasa.gov
is higher with the name
Katherine Johnson
than with the name Eleanor Roosevelt
,
as expected.
Now that the unit tests have passed, you can create a Branch Deploy with the new changes.
Chalk allows you to create an unlimited number of branch deployments. Branch deployments run all of your resolvers in the same way that they run in production. However, branch deployments don’t impact the offline store. You can create a branch deployment with the following command:
chalk apply --branch test
With the branch deployment created, you can run integration tests on your changes.
import pytest
from chalk.client import ChalkClient
@pytest.fixture(scope="module")
def chalk_client():
client = ChalkClient(local=True)
return client
def test_get_features(chalk_client):
resp = chalk_client.query(
input={User.id: 1},
output=[
User.credit_report.fico_score,
User.email,
User.name,
User.dob,
],
)
assert resp.get_feature_value(User.credit_report.fico_score) == 700
assert resp.get_feature_value(User.email) == "katherine.johnson@nasa.gov"
Here, the Chalk API client is configured to use the preview deployment id returned in the previous step.
Once you’ve tested your changes, it’s time to deploy!
This step looks much like the preview deployment,
but this time without the --branch
flag:
chalk apply
Now, your production environments can request the new
user.name_email_match_score
feature.
chalk query --in user.id=1 --out user.name_email_match_score
The user.name_email_match_score
feature is live!
But historically, this feature did not exist,
and you won’t be able to sample its values at previous times
until you backfill the values, which you can do with the ‘chalk trigger’ command.
chalk trigger --resolver example.name_email_match_scorer --lower-bound 2020-05-05T12:00:00+00:00 --persist-offline=True
This command will backfill historical values for the user.name_email_match_score
feature.
With the feature backfilled, you can query the historical value:
That’s how you deploy a new feature with Chalk! Let a member of our technical team know if we can be helpful.
Mistakes are inevitable, but Chalk provides the tools you need to quickly and easily recover and keep going.
If you need to change a feature’s type (for example from string
to int
), or if you want to drop all the data for a feature value, the Chalk CLI has you covered with chalk drop
. Simply execute the command from the CLI and you’ll have a fresh start to recreate the feature and its data as necessary.
Sometimes you just need to fully remove a record from your systems, whether because of a GDPR mandated “right to forget” request or due to a business requirement. Chalk provides the chalk delete
command to meet this need.