Feature Engine
Integrate with Snowflake.
Chalk supports Snowflake as a SQL source.
You can configure the Snowflake-specific options using the SnowflakeSource.__init__ args.
Alternately, you can configure the source through your dashboard.
In the dashboard, you can define the Snowflake Source along with some default configurations, such as the database, schema, and warehouse. If the role that you have provided has permissions to, you would be able to use the same Snowflake source to query other tables, but if not otherwise specified, Chalk will use the defaults set in the data source definition.
Add a Snowflake integration. These parameters will also be available as environment variables.

After configuring your Snowflake integration in the dashboard, define your data sources in Python:
from chalk.sql import SnowflakeSource
risk = SnowflakeSource(name="RISK")
marketing = SnowflakeSource(name="MARKETING")Then reference them in SQL file resolvers using the name parameter. For example, to query from the RISK source:
-- type: online
-- resolves: User
-- source: RISK
SELECT id, credit_score FROM usersAnd to query from the MARKETING source:
-- type: online
-- resolves: User
-- source: MARKETING
SELECT id, email, campaign_status FROM usersNamed integrations inject environment variables with the standard names prefixed by the integration name. For example, if your integration is called RISK, then the variable SNOWSQL_SCHEMA will be injected as RISK_SNOWSQL_SCHEMA.If Chalk has helped you configure a Snowflake Offline Store already, then some of these commands my overlap with work that is already done. Please review your Snowflake configuration for existing objects matching the ones below before proceeding.
For detailed information about Snowflake storage integrations and external stages, refer to:
To use Snowflake as a Chalk datasource, the user or role specified in your integration must have the following permissions:
USAGE on the warehouse, database, schema, and stage SELECT on all tables and views being queriedREAD on the stage WRITE on the stage CREATE TEMPORARY TABLE in the schemaDROP on temporary tables in the schemaChalk creates temporary tables during feature computation to push down filters to optimize your query loads. In addition, Chalk uses the stages to optimize data unloading.
Here’s an example of setting up a role with the appropriate permissions:
-- Set variables
SET ROLE_NAME='CHALK_ROLE';
SET WAREHOUSE_NAME='<warehouse_name>';
SET DB_NAME='<database_name>';
SET SCHEMA_NAME='<schema_name>';
SET STAGE_NAME='<stage_name>';
SET INTEGRATION_NAME='<integration_name>';
SET USER_NAME='<chalk_user>';
-- Derived variables
SET QUALIFIED_SCHEMA_NAME=concat($DB_NAME, '.', $SCHEMA_NAME);
-- Create a role for Chalk
CREATE ROLE IF NOT EXISTS IDENTIFIER($ROLE_NAME);
-- Grant warehouse permissions
GRANT USAGE ON WAREHOUSE IDENTIFIER($WAREHOUSE_NAME) TO ROLE IDENTIFIER($ROLE_NAME);
-- Grant database/schema permissions
GRANT USAGE ON DATABASE IDENTIFIER($DB_NAME) TO ROLE IDENTIFIER($ROLE_NAME);
GRANT USAGE ON SCHEMA IDENTIFIER($QUALIFIED_SCHEMA_NAME) TO ROLE IDENTIFIER($ROLE_NAME);
GRANT SELECT ON ALL TABLES IN SCHEMA IDENTIFIER($QUALIFIED_SCHEMA_NAME) TO ROLE IDENTIFIER($ROLE_NAME);
-- Grant temporary table permissions
GRANT CREATE TEMPORARY TABLE ON SCHEMA IDENTIFIER($QUALIFIED_SCHEMA_NAME) TO ROLE IDENTIFIER($ROLE_NAME);
-- Grant storage integration and stage permissions (if using data unloading)
GRANT USAGE ON INTEGRATION IDENTIFIER($INTEGRATION_NAME) TO ROLE IDENTIFIER($ROLE_NAME);
GRANT USAGE ON STAGE IDENTIFIER($STAGE_NAME) TO ROLE IDENTIFIER($ROLE_NAME);
GRANT READ, WRITE ON STAGE IDENTIFIER($STAGE_NAME) TO ROLE IDENTIFIER($ROLE_NAME);
-- Assign role to user
GRANT ROLE IDENTIFIER($ROLE_NAME) TO USER IDENTIFIER($USER_NAME);Chalk can optimize large query result transfers by unloading data to a Snowflake stage and reading it from cloud storage. This is particularly useful for large feature computations. Using a storage integration with IAM authentication is the recommended approach as it avoids managing long-lived credentials.
-- Set variables (customize these for your environment)
SET ROLE_NAME='CHALK_ROLE';
SET INTEGRATION_NAME='s3-integration-chalk-data-bucket';
SET AWS_ROLE_ARN='arn:aws:iam::<aws_account_id>:role/chalk-offline-store-access-role';
SET S3_BUCKET='s3://chalk-data-bucket/';
-- Create storage integration if it doesn't exist
CREATE STORAGE INTEGRATION IF NOT EXISTS IDENTIFIER($INTEGRATION_NAME)
TYPE = EXTERNAL_STAGE
STORAGE_PROVIDER = 'S3'
STORAGE_AWS_ROLE_ARN = $AWS_ROLE_ARN
ENABLED = TRUE
STORAGE_ALLOWED_LOCATIONS = ($S3_BUCKET);
-- Verify integration and get Snowflake credentials
DESCRIBE INTEGRATION IDENTIFIER($INTEGRATION_NAME);From the DESCRIBE INTEGRATION output, record these values for Step 4:
STORAGE_AWS_IAM_USER_ARN - The Snowflake IAM user ARNSTORAGE_AWS_EXTERNAL_ID - The external ID for AWS trust policyThis role will be assumed by Snowflake to access the S3 bucket.
Follow the Snowflake IAM Role documentation to create a role named chalk-{organization}-offline-store-access-role.
When setting the trust policy, use a temporary placeholder with your AWS account ID and external ID "0000". You will update this with actual Snowflake credentials in Step 4.
Attach an IAM policy that grants S3 permissions to this role (see Snowflake documentation for policy details).
-- Set variables (customize these for your environment)
SET STAGE_NAME='CHALK_UNLOAD_STAGE';
SET INTEGRATION_NAME='s3-integration-chalk-data-bucket';
SET S3_URL='s3://chalk-data-bucket/unload/';
-- Create external stage using storage integration
CREATE STAGE IF NOT EXISTS IDENTIFIER($STAGE_NAME)
STORAGE_INTEGRATION = IDENTIFIER($INTEGRATION_NAME)
URL = $S3_URL;Update the trust policy for the AWS IAM role chalk-{organization}-offline-store-access-role with the actual Snowflake credentials from Step 1:
STORAGE_AWS_IAM_USER_ARN valueSTORAGE_AWS_EXTERNAL_ID valueSee the Snowflake trust policy documentation for detailed instructions.
-- Set variables
SET ROLE_NAME='CHALK_ROLE';
SET INTEGRATION_NAME='s3-integration-chalk-data-bucket';
SET STAGE_NAME='CHALK_UNLOAD_STAGE';
-- Grant usage on storage integration
GRANT USAGE ON INTEGRATION IDENTIFIER($INTEGRATION_NAME) TO ROLE IDENTIFIER($ROLE_NAME);
-- Grant stage permissions
GRANT USAGE ON STAGE IDENTIFIER($STAGE_NAME) TO ROLE IDENTIFIER($ROLE_NAME);
GRANT READ, WRITE ON STAGE IDENTIFIER($STAGE_NAME) TO ROLE IDENTIFIER($ROLE_NAME);Add the stage reference to your environment variables in the Chalk dashboard. Go to Settings > Variables and add:
CHALK_SNOWFLAKE_UNLOAD_STAGE={NAME_OF_YOUR_STAGE}
Chalk will use this stage to unload and transfer large result sets efficiently, improving performance for large feature computations.