Changelog

Improvements to Chalk are published here! See our public roadmap for upcoming changes.


May 10, 2023

Enhancements to Offline Query

Offline Query has been enhanced with a new recompute_features parameter, which controls which features are sampled from the offline store and which features are recomputed (see the sketch after the list below).

  • The default value False will maintain current behavior, returning only samples from the offline store.
  • True will ignore the offline store, and execute @online and @offline resolvers to produce the requested output.
  • If, instead, the user passes in a list of features to recompute_features, those features will be recomputed by running @online and @offline resolvers, and all other feature values, including those needed to recompute the requested features, will be sampled from the offline store.
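
For example, a minimal sketch of the new parameter, reusing the illustrative User feature class that appears in other examples in these notes:

from datetime import datetime
from chalk.client import ChalkClient

uids = [1, 2, 3]
at = datetime.now()

# Recompute only User.fraud_score by running its @online/@offline resolvers;
# every other requested value is sampled from the offline store.
dataset = ChalkClient().offline_query(
    input={User.id: uids, User.ts: [at] * len(uids)},
    output=[User.id, User.email, User.fraud_score],
    recompute_features=[User.fraud_score],
    dataset_name="recompute_example",
)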

Recompute Dataset

The ‘recompute’ capability is also exposed on Dataset (sketched below). When a list of features to recompute is passed, a new Dataset Revision is generated, and the existing dataset is used as input to recompute the requested features.
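
Continuing the sketch above, and assuming the capability is exposed as a recompute method on Dataset (the exact method name and signature are assumptions):

dataset = ChalkClient().offline_query(
    input={User.id: uids, User.ts: [at] * len(uids)},
    output=[User.id, User.fraud_score],
    dataset_name="my_dataset",
)
# Assumed method name: generates a new Dataset Revision, running resolvers
# for User.fraud_score while reading other values from the existing dataset.
dataset.recompute(features=[User.fraud_score])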

Developing in Jupyter

Chalk has introduced a new workflow when [working with branches](/docs/branches), allowing full iterations to take place directly in any IPython notebook. When a user creates a ChalkClient with a branch in a notebook, subsequent features and resolvers in the notebook will be deployed to that branch. When combined with Recompute Dataset and the enhancements to Offline Query, users have a new development loop available for feature exploration and development (sketched in code after the list below):

  1. Take advantage of existing data in Chalk
  2. Explore that data using familiar tools in a notebook
  3. Enrich the data by developing new features and resolvers
  4. Immediately view the results of adjusting features in the dataset
  5. When exploration is complete, features and resolvers can be directly added back to the Chalk project
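
For instance, inside a notebook (the branch name, resolver, and User.name_length feature below are illustrative, with uids and at reused from the sketches above):

from chalk import online
from chalk.client import ChalkClient

# Features and resolvers defined in cells after this point are deployed
# to the "feature-exploration" branch.
client = ChalkClient(branch="feature-exploration")

# An illustrative resolver added in a notebook cell.
@online
def name_length(name: User.fullname) -> User.name_length:
    return len(name)

# Recompute the new feature against existing data and inspect the result.
dataset = client.offline_query(
    input={User.id: uids, User.ts: [at] * len(uids)},
    output=[User.id, User.name_length],
    recompute_features=[User.name_length],
    dataset_name="exploration",
)
dataset.data_as_pandas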

May 5, 2023

View Deployment Source Code

Deployments now offer the ability to view their source code. By clicking the “View Source” button on the Deployment Detail page, users can view all files included in the deployed code.

April 21, 2023

Improved Deployment Utilities

Users can now “redeploy” any historical deployment with a UI button on the deployment details page. This enables useful workflows, including rollbacks. The “download source” button downloads a tarball containing the deployed source to your local machine.

Deploy UI Enhancements

April 18, 2023

Resolver error messages for incorrect types include primary keys

When writing resolvers, incorrect typing can be difficult to track down. Now, if a resolver instantiates a feature of an incorrect type, the resolver error message will include the primary key value(s) of the query itself.

April 11, 2023

Online query improvements

The Online Query API can now be used to query DataFrame-typed features. For instance, you can query all of a user’s transaction-level features in a single query:

chalk query --in user.id --out user.transactions

{
  "columns": ["transaction.id", "transaction.user_id", ...],
  "values": [[1, 2, 3, ...], ["user_1", "user_2", "user_3", ...]
}
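
The same kind of query can be issued from the Python client; a minimal sketch (the exact shape of the returned data may differ):

from chalk.client import ChalkClient

# Fetch a DataFrame-typed feature (all of a user's transactions) in one online query.
result = ChalkClient().query(
    input={User.id: "user_1"},
    output=[User.transactions],
)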

More functionality will be added to the Online and Offline Query APIs to support more advanced query patterns.

April 6, 2023

Branch deployments

When deploying with chalk apply, a new --branch <branch_name> flag creates a branch deployment. Users can interact with their branch deployment using a consistent name by passing the branch name to query, upload_features, etc. Chalk clients can also be scoped to a branch by passing the branch in the constructor. Branch deployments are many times faster than other flavors of chalk apply, frequently taking only a few seconds from beginning to end. Branch deployments replace preview deploys, which have been deprecated.
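
A sketch of the workflow (the branch name is illustrative, and the parameter name branch on query is assumed from the description above):

# Create a branch deployment from the CLI:
#   chalk apply --branch my-branch
from chalk.client import ChalkClient

# Scope a client to the branch in its constructor...
client = ChalkClient(branch="my-branch")

# ...or pass the branch name on individual calls.
result = ChalkClient().query(
    input={User.id: "user_1"},
    output=[User.email],
    branch="my-branch",
)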

March 31, 2023

Speed improvements for deployments

Deployments via chalk apply are now up to 50% faster in certain cases. If your project’s PIP dependencies haven’t changed, new deployments will build & become active significantly faster than before.

Deploy Time Comparison

March 17, 2023

Offline TTL

Introduces a new “offline_ttl” property for features. Now you can control how long data is valid in the offline store. Any feature older than the TTL value will not be returned in an offline query.

@features
class MaxOfflineTTLFeatures:
    id: int
    ts: datetime = feature_time()

    no_offline_ttl_feature: int = feature(offline_ttl=timedelta(0))
    one_day_offline_ttl_feature: int = feature(offline_ttl=timedelta(days=1))
    infinite_ttl_feature: int

Strict Feature Validation

Adds the strict property to the feature definition, indicating that any failed validation will throw an error. Invalid features will never be written to the online or offline store if strict is True. Also introduces the validations array to allow differentiated strict and soft validations on the same feature.

@features
class ClassWithValidations:
    id: int
    name: int = feature(max=100, min=0, strict=True)
    feature_with_two_validations: int = feature(
        validations=[
            Validation(min=70, max=100),
            Validation(min=0, max=100, strict=True),
        ]
    )

March 7, 2023

Datasets in Offline Query

The Dataset class is now live! Using the new ChalkClient.offline_query method, we can inspect important metadata about the query and retrieve its output data in a variety of ways.

Simply attach a dataset_name to the query to persist the results.

from datetime import datetime
import pandas as pd
from chalk.client import ChalkClient, Dataset

uids = [1, 2, 3, 4]
at = datetime.now()
dataset: Dataset = ChalkClient().offline_query(
     input={
         User.id: uids,
         User.ts: [at] * len(uids),
     },
     output=[
         User.id,
         User.fullname,
         User.email,
         User.name_email_match_score,
     ],
     dataset_name='my_dataset'
)
pandas_df: pd.DataFrame = dataset.data_as_pandas

Check out the documentation here.

February 28, 2023

Deployment Build Logs

Chalk now provides access to build and boot logs through the Deployments page in the dashboard.

Build Logs

February 16, 2023

Resolver timeouts

Computing features associated with third-party services can be unpredictably slow. Chalk helps you manage this uncertainty by letting you specify a resolver timeout duration.

Now you can set timeouts for resolvers!

@online(timeout="200ms")
def resolve_australian_credit_score(driver_id: User.driver_id_aus) -> User.credit_score_aus:
    return experian_client.get_score(driver_id)

January 26, 2023

SQL File Resolvers

SQL-integrated resolvers can be written entirely in SQL files: no Python required! If you have a SQL source defined as follows:

from chalk.sql import PostgreSQLSource
pg = PostgreSQLSource(name='PG')

You can define a resolver in a .chalk.sql file, with comments that detail important metadata. Chalk will process it upon chalk apply as it would any other Python resolver.

get_user.chalk.sql
-- type: online
-- resolves: user
-- source: PG
-- count: 1
select email, full_name from user_table where id=${user.id}

Check out the documentation here.

January 12, 2023

Improved Logging

Logging on your dashboard has been improved. You can now scroll through more logs, and the formatting is cleaner and easier to use. This view is available for resolvers and resolver runs.

Logs Viewer

January 9, 2023

Pretty Print Online Query Results

Online Query Response objects now support pretty-printing in any IPython environment.

Pretty Print Query Response

January 8, 2023

Linux docker containers on M1 Macs

chalkpy has always supported running in Docker images using the M1’s native arm64 architecture, and now chalkpy==1.12.0 supports most functionality on M1 Macs when run with AMD64 (64-bit Linux) architecture Docker images. This is helpful when testing images built for Linux servers that include chalkpy.

January 6, 2023

Chalk has lots of documentation, and finding content is now difficult.

We’ve added docs search!

Documentation search

Try it out by typing cmd-K, or clicking the search button at the top of the table of contents.

September 27, 2022

Tags & Owners as Comments

This update makes several improvements to feature discovery.

Tags and owners are now parsed from the comments preceding the feature definition.

@features
class RocketShip:
    # :tags: team:identity, priority:high
    # :owner: katherine.johnson@nasa.gov
    velocity: float
    ...

Prior to this update, owners and tags needed to be set in the feature(...) function:

@features
class RocketShip:
    velocity: float = feature(
        tags=["team:identity", "priority:high"],
        owner="katherine.johnson@nasa.gov"
    )
    ...

Feel free to choose either mechanism!

July 28, 2022

Auto Id Features

It’s natural to name the primary feature of a feature set id. So why do you always have to specify it? Until now, you needed to write:

@features
class User:
    id: str = feature(primary=True)
    ...

Now you don’t have to! If a feature class does not have a feature with the primary field set but does have a feature called id, the id feature will be marked as primary automatically:

@features
class User:
    id: str
    ...

The functionality from before sticks around: if you use a field as a primary key with a name other than id, you can keep using it as your primary feature:

@features
class User:
    user_id: str = feature(primary=True)
    # Not really the primary key!
    id: str

July 25, 2022

DataFrame Expressions

The Chalk DataFrame now supports boolean expressions! The Chalk team has worked hard to let you express your DataFrame transformations in natural, idiomatic Python:

DataFrame[
  User.first_name == "Eleanor" or (
    User.email == "eleanor@whitehouse.gov" and
    User.email_status not in {"deactivated", "unverified"}
  ) and User.birthdate is not None
]

Python experts will note that or, and, is, is not, not in, and not aren’t overloadable. So how did we do this? The answer is AST parsing! A more detailed blog post will follow.

July 22, 2022

Descriptions as Comments

This update makes several improvements to feature discovery.

Descriptions are now parsed from the comments preceding the feature definition. For example, we can document the feature User.fraud_score with a comment above the attribute definition:

@features
class User:
    # 0 to 100 score indicating an identity match.
    # Low scores indicate safer users
    fraud_score: float
    ...

Prior to this update, descriptions needed to be set in the feature(...) function:

@features
class UserFeatures:
    fraud_score: float = feature(description="""
           0 to 100 score indicating an identity match.
           Low scores indicate safer users
        """)
    ...

The description passed to feature(...) takes precedence over the implicit comment description.

Namespace Metadata

You can now set attributes for all features in a namespace!

Here, we assign the tag group:risk and the owner ravi@chalk.ai to all features on the feature class. Owners specified at the feature level take precedence (so the owner of User.email is the default ravi@chalk.ai whereas the owner of User.flaky_api_result is devops@chalk.ai). Tags aggregate, so email has the tags pii and group:risk.

@features(tags="group:risk", owner="ravi@chalk.ai")
class User:
    email: str = feature(tags="pii")
    flaky_api_result: str = feature(owner="devops@chalk.ai")

July 14, 2022

Self-Serve Slack Integration

You can configure Chalk to post messages to your Slack workspace! You can find the Slack integration tab on the settings page of your dashboard.

Slack integration

Slack can be used as an alert channel or for build notifications.

July 13, 2022

Python 3.8 Support

Chalk’s pip package now supports Python 3.8! With this change, you can use the Chalk package to run online and offline queries in a Python environment with version >= 3.8. Note that your features will still be computed on a runtime with Python version 3.10.

July 8, 2022

Named Integrations

Chalk injects environment variables to support data integrations. But what happens when you have two data sources of the same kind? Historically, our recommendation was to create one set of environment variables through an official data source integration, and to create a second set of prefixed environment variables yourself using the generic environment variable support.

With the release of named integrations, you can connect to as many data sources of the same kind as you need! Provide a name when configuring your data source, and reference it directly in code. Named integrations inject environment variables with the standard names prefixed by the integration name (e.g. RISK_PGPORT). The first integration of a given kind also creates the un-prefixed environment variables (e.g. both PGPORT and RISK_PGPORT).
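
For example, with a PostgreSQL integration named RISK (the name is illustrative):

from chalk.sql import PostgreSQLSource

# Resolves its connection details from the RISK_-prefixed environment
# variables (RISK_PGHOST, RISK_PGPORT, and so on) injected by the integration.
risk_pg = PostgreSQLSource(name="RISK")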

June 29, 2022

SOC 2 Report

Chalk is excited to announce the availability of our SOC 2 Type 1 report from Prescient Assurance. Chalk has instituted rigorous controls to ensure the security of customer data and earn the trust of our customers, but we’re always looking for more ways to improve our security posture, and to communicate these steps to our customers. This report is one step along our ongoing path of trust and security.

If you’re interested in reviewing this report, please contact support@chalk.ai to request a copy.

June 3, 2022

Pandas Integration

You can now convert Chalk’s DataFrame to a pandas.DataFrame and back! Use the methods chalk_df.to_pandas() and .from_pandas(pandas_df).
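
A minimal sketch, assuming an existing Chalk DataFrame named chalk_df and that from_pandas is exposed on the DataFrame class:

import pandas as pd
from chalk.features import DataFrame

# Chalk DataFrame -> pandas DataFrame
pandas_df: pd.DataFrame = chalk_df.to_pandas()

# pandas DataFrame -> Chalk DataFrame
chalk_df_round_trip = DataFrame.from_pandas(pandas_df)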

Migration Sampling

The 1.4.1 release of the CLI added a parameter --sample to chalk migrate. This flag allows migrations to be run targeting specific sample sets.

Feature/Resolver Health

Added sparklines to the feature and resolver tables, showing a quick summary of request counts over the past 24 hours. Added status to the feature and resolver tables, showing any failing checks related to a feature or resolver.