Your first time with Chalk
Chalk’s platform is a powerful tool that allows data engineers and data scientists to collaborate efficiently and effectively.
In this project we’ll create features related to Users
and their credit scores. We’ll combine data from an
API and a database to help us decide whether we should issue
new loans.
You will learn how to create a new Chalk project, create features and resolvers, deploy them to the Chalk environment, and query the environment using the Chalk CLI and a Jupyter Notebook.
This tutorial will take about 45 minutes to complete end-to-end.
# Install Homebrew
$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# Use Homebrew to install python3
$ brew install python
If you’re having trouble installing Python, follow the instructions in Python’s docs.
$ curl -s -L https://api.chalk.ai/install.sh | sh
Now that all of your software is installed, let’s create a Chalk project where you will add your features and resolvers.
Create a directory called chalk-tutorial, then run chalk init from inside the chalk-tutorial directory to initialize your project files.
This will create two files for you: chalk.yaml and .chalkignore.
The first file, chalk.yaml, contains configuration information about your project. The second file, .chalkignore, tells your Chalk CLI tool which files to ignore when deploying to your Chalk environment. You can edit this file and use it just like a .gitignore file.
$ mkdir chalk-tutorial
$ cd chalk-tutorial
$ chalk init
Created project config file chalk.yaml
Created .chalkignore file
Open the chalk.yaml file and set the project field to the name of your Chalk project. This might be chalk-tutorial, but check by visiting the Projects page.
Log in by running chalk login. If you’re using a dedicated environment, make sure you use the --api-host flag. Type y when prompted and log in through the browser.
Now you’re ready to deploy code and query the environment!
We’re going to create a virtual environment so that you can download and use Python libraries in your Chalk project.
When you’re done working, you can leave the virtual environment by running deactivate.
$ python3 -m venv chalk-tutorial-venv
$ source chalk-tutorial-venv/bin/activate
The requirements.txt file tells your virtual environment (and Chalk) which libraries to install. We’ll use a few libraries for this tutorial, so add the following to requirements.txt. Feel free to add more libraries later.
chalkpy
pydantic
requests
$ pip3 install -r requirements.txt
Check that the installation worked by importing requests in Python. If there’s no error message, it worked. Exit by typing exit() and hitting enter.
$ python3
>>> import requests
>>> exit()
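If you would rather check every library from requirements.txt in one go, a short script can do it. This is a minimal sketch: missing_modules is a hypothetical helper name, and it assumes that chalkpy is imported as chalk, as in the imports used later in this tutorial.

```python
from importlib.util import find_spec


def missing_modules(module_names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in module_names if find_spec(name) is None]


if __name__ == "__main__":
    # Import names for the packages listed in requirements.txt
    # (assumption: the chalkpy package is imported as `chalk`).
    missing = missing_modules(["chalk", "pydantic", "requests"])
    print("All libraries found." if not missing else f"Missing: {missing}")
```

Run it with the virtual environment activated; an empty result means every library can be imported.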
Now that all the software is installed and ready to use, you’re ready to start integrating an API.
$ curl https://credit-report.chalk.dev/rutter_score/12
{"score":98}
Now you should try to hit this API from Python using a Jupyter Notebook.
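Create a new notebook by choosing File -> New File and selecting Jupyter Notebook. Select your virtual environment as the notebook’s kernel (chalk-tutorial-venv). Paste the code below into a cell and run it with shift+enter.
import requests
requests.get(
    "https://credit-report.chalk.dev/rutter_score/12"
).json()["score"]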
If you see 98, congratulations! You’re all set up and ready to start developing your Chalk project.
With all the prerequisites completed, let’s jump right into building your Chalk project.
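The one-line request works for a quick check, but it has no timeout and does not look at the HTTP status. The sketch below wraps the same call in a function that handles both. The names fetch_rutter_score and http_get, and the 5-second timeout, are illustrative assumptions and not part of Chalk or the credit report API.

```python
BASE_URL = "https://credit-report.chalk.dev/rutter_score"


def fetch_rutter_score(user_id, http_get=None, timeout=5.0):
    """Fetch the score for one user and validate the shape of the response."""
    if http_get is None:
        # Imported lazily so the function can be exercised with a fake getter.
        import requests

        http_get = requests.get
    response = http_get(f"{BASE_URL}/{user_id}", timeout=timeout)
    # Raise on 4xx/5xx responses instead of trying to parse an error body.
    response.raise_for_status()
    score = response.json()["score"]
    if isinstance(score, bool) or not isinstance(score, int):
        raise ValueError(f"unexpected score value: {score!r}")
    return score
```

Calling fetch_rutter_score(12) in the notebook should give the same value as the cell above. The http_get parameter exists only so the function can be tested without a network connection.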
We’ll create a feature class called User with a credit_score feature that holds the user’s “Rutter score”. The resolver will call the “Rutter” API we tested earlier, passing the User ID, and return the score.
After you deploy the feature, you will test it in Jupyter.
Create a credit.py file and copy in the feature and resolver definitions below:
import requests
from chalk.features import online, features, Features

# Feature class: each annotated attribute is a feature of User.
@features
class User:
    id: str
    credit_score: int

# Online resolver: takes a user's id and returns their credit score.
@online
def get_credit_score(
    id: User.id,
) -> Features[User.credit_score]:
    return requests.get(
        f"https://credit-report.chalk.dev/rutter_score/{id}"
    ).json()["score"]
Deploy your code with chalk apply. Use the --branch flag with a name so that you don’t affect the “actual” environment.
$ chalk apply --branch test
Back in your Jupyter Notebook, query the feature you just deployed:
from credit import User
from chalk.client import ChalkClient
ChalkClient().query(
    input={User.id: 'u_F6zY0tE4w8'},
    output=[User.credit_score],
    branch="test"
)
You can also query for this feature in your terminal:
$ chalk query --in user.id=u_F6zY0tE4w8 \
--out user.credit_score \
--branch test
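If you want to run the same terminal query from a script, you can build the command as a list of arguments. This is only a sketch: build_query_command is a hypothetical helper, and repeating --out once per requested feature is an assumption about the CLI, so check chalk query --help before relying on it.

```python
import shlex


def build_query_command(user_id, outputs, branch):
    """Build the argument list for the `chalk query` command shown above."""
    command = ["chalk", "query", "--in", f"user.id={user_id}"]
    for feature in outputs:
        # Assumption: one --out flag per requested feature.
        command += ["--out", feature]
    command += ["--branch", branch]
    return command


if __name__ == "__main__":
    argv = build_query_command("u_F6zY0tE4w8", ["user.credit_score"], "test")
    # Print the command as it would be typed in a shell.
    print(shlex.join(argv))
    # To actually run it: subprocess.run(argv, check=True)
```

Passing the arguments as a list avoids shell quoting problems if a user ID ever contains special characters.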
Congratulations! You’ve written your first Chalk feature, connected it to an external API, and queried it from a Jupyter Notebook. This is a big achievement!
We have a database containing more information about users, and it would be useful to add that information to our User feature class.
postgres.
$ psql -U postgres -p 5432 -h 35.188.7.252 -d postgres
postgres=> select * from users limit 10;
Notice that we know name, surname, email, birthday, and is_fraud, a flag that tells us whether we know this user has committed fraud in the past.
Add a Data Source and choose PostgreSQL.
postgres.
Add the new properties to the User feature class you created before. The properties name, surname, and email are strings. birthday is a date, and is_fraud is a boolean. (Hint: you may have to add from datetime import date to your credit.py file.)
Add the following to your credit.py file. This tells Python about your datasource.
from chalk.sql import PostgreSQLSource
user_pg = PostgreSQLSource(name="User_PG")
Create a file called users.chalk.sql and copy in the contents below:
-- type: online
-- resolves: user
-- source: postgres
-- count: 1
select name, surname, email, birthday, is_fraud from users where id=${user.id}
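To get a feel for what this query returns, you can run an equivalent select locally. The sketch below uses an in-memory SQLite database as a stand-in for the tutorial's Postgres database, with the user ID bound as a query parameter where the SQL file has ${user.id}. The table layout, the sample row, and the lookup_user helper are all hypothetical, and this is not how Chalk itself executes the resolver.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table users (id text primary key, name text, surname text,"
    " email text, birthday text, is_fraud integer)"
)
# Hypothetical sample row, not real data from the tutorial database.
conn.execute(
    "insert into users values (?, ?, ?, ?, ?, ?)",
    ("u_example", "Ada", "Lovelace", "ada@example.com", "1990-01-31", 0),
)


def lookup_user(connection, user_id):
    """Select one user's columns, with the id bound as a query parameter."""
    row = connection.execute(
        "select name, surname, email, birthday, is_fraud from users where id = ?",
        (user_id,),
    ).fetchone()
    if row is None:
        return None
    name, surname, email, birthday, is_fraud = row
    return {
        "name": name,
        "surname": surname,
        "email": email,
        # SQLite stores these as text and integer; convert to the Python
        # types the User properties are described with (date and bool).
        "birthday": date.fromisoformat(birthday),
        "is_fraud": bool(is_fraud),
    }
```

The query returns at most one row per ID, and each selected column lines up with a User property of the same name.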
After deploying your changes (using --branch with the name you specified, for example “test”), query your feature. Supplying User as the output tells Chalk to give you back ALL the features of User.
client = ChalkClient()
res = client.query(
    input={User.id: 'u_F6zY0tE4w8'},
    output=[User],
    branch="test"
)
You can also query for this feature in your terminal:
$ chalk query --in user.id=u_F6zY0tE4w8 \
--out user \
--branch test