Chalk supports AWS Athena as a SQL source. This allows users to load data from AWS Glue and other AWS data sources (Hive, DocumentDB, Iceberg, etc.) directly into Chalk features.


Adding Athena

To add an Athena integration, navigate to Integrations > Add a data source and select Athena, then fill in the form with the details of your Athena integration. Chalk performs data unload operations to the S3 staging directory you provide; these intermediate results appear under the chalk-unload folder.


Authorization

Chalk uses the IAM workload role defined in your cluster deployment to access your Athena data sources. Make sure this role has the permissions it needs to access AWS Athena. You can view the associated IAM role when setting up your integration.
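This page does not list the exact permissions Chalk requires, so the following is only a sketch of what an IAM policy for Athena access commonly includes: Athena query actions, read access to the Glue catalog, and access to the S3 staging bucket. The bucket name my-athena-staging-bucket, the helper build_athena_policy, and the specific action list are assumptions for illustration. Narrow the resources to your own workgroups, catalogs, and buckets, and remember that the role also needs read access to the buckets holding your source data.

```python
import json

# Hypothetical staging bucket; replace with the bucket from your integration form.
STAGING_BUCKET = "my-athena-staging-bucket"


def build_athena_policy(staging_bucket: str) -> dict:
    """Build an illustrative IAM policy document for Athena access.

    The action list is an assumption, not Chalk's documented requirement.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AthenaQueries",
                "Effect": "Allow",
                "Action": [
                    "athena:StartQueryExecution",
                    "athena:GetQueryExecution",
                    "athena:GetQueryResults",
                    "athena:StopQueryExecution",
                ],
                "Resource": "*",
            },
            {
                "Sid": "GlueCatalogRead",
                "Effect": "Allow",
                "Action": [
                    "glue:GetDatabase",
                    "glue:GetTable",
                    "glue:GetPartitions",
                ],
                "Resource": "*",
            },
            {
                "Sid": "StagingBucket",
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:PutObject",
                    "s3:ListBucket",
                    "s3:GetBucketLocation",
                ],
                "Resource": [
                    f"arn:aws:s3:::{staging_bucket}",
                    f"arn:aws:s3:::{staging_bucket}/*",
                ],
            },
        ],
    }


policy = build_athena_policy(STAGING_BUCKET)
print(json.dumps(policy, indent=2))
```

The printed JSON can be compared against the policy attached to the role shown in the integration form.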


Integrations Setup

After configuring your Athena integration in the dashboard, define your data sources in Python:

from chalk.sql import AthenaSource

athena_source_txns = AthenaSource(name="ATHENA_TRANSACTIONS")
athena_source_marketing = AthenaSource(name="ATHENA_MARKETING")

Note that all queries to Athena will be run with UNLOAD to handle larger-than-memory datasets.
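To illustrate what running a query with UNLOAD means, the sketch below wraps a SELECT in Athena's UNLOAD statement, which writes the result set to S3 instead of returning it inline. The helper wrap_in_unload, the Parquet output format, and the S3 path are hypothetical; the statement Chalk actually generates is not shown on this page and may differ.

```python
def wrap_in_unload(query: str, s3_prefix: str, fmt: str = "PARQUET") -> str:
    """Wrap a SELECT in an Athena UNLOAD statement (illustrative only)."""
    # Drop surrounding whitespace and a trailing semicolon so the query
    # can be nested inside the parentheses.
    inner = query.strip().rstrip(";")
    return f"UNLOAD ({inner}) TO '{s3_prefix}' WITH (format = '{fmt}')"


stmt = wrap_in_unload(
    "SELECT id, transaction_volume FROM transactions;",
    # Hypothetical location under the staging directory's chalk-unload folder.
    "s3://my-athena-staging-bucket/chalk-unload/example/",
)
print(stmt)
```

Because results are written to the staging directory rather than held in memory, the IAM role needs write access to that S3 location.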

Reference these sources in SQL file resolvers by setting the source comment to the name passed to AthenaSource. For example, to query from the ATHENA_TRANSACTIONS source:

-- type: online
-- resolves: User
-- source: ATHENA_TRANSACTIONS
SELECT id, transaction_volume FROM transactions

And to query from the ATHENA_MARKETING source:

-- type: online
-- resolves: User
-- source: ATHENA_MARKETING
SELECT id, email, campaign_status FROM marketing_data
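The source comment in each resolver must match the name given to an AthenaSource. As a minimal sketch of that linkage, the following reads the comment header of a resolver and checks its source against the names defined above. The helper parse_resolver_header and the DEFINED_SOURCES set are hypothetical and are not part of Chalk, which performs its own parsing and validation.

```python
def parse_resolver_header(sql_text: str) -> dict:
    """Collect leading `-- key: value` comment lines from a SQL file resolver."""
    header = {}
    for line in sql_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        if not stripped.startswith("--"):
            break  # the header ends at the first non-comment line
        key, sep, value = stripped[2:].partition(":")
        if sep:
            header[key.strip()] = value.strip()
    return header


# Names passed to AthenaSource(name=...) in the Python definitions.
DEFINED_SOURCES = {"ATHENA_TRANSACTIONS", "ATHENA_MARKETING"}

resolver_sql = """\
-- type: online
-- resolves: User
-- source: ATHENA_MARKETING
SELECT id, email, campaign_status FROM marketing_data
"""

header = parse_resolver_header(resolver_sql)
if header.get("source") not in DEFINED_SOURCES:
    raise ValueError(f"unknown source: {header.get('source')}")
print(header)
```

A check like this could run in CI to catch a misspelled source name before deployment.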