Chalk Dataplane Installation (Background Persistence)

Chalk uses background writers hosted in your ("Customer Cloud") Kubernetes cluster to write information about queries to various storage locations.

Prerequisites

To install the Chalk persistence writers, you need the following:

  • A namespace to deploy the writers into

If using Kafka:

  • Kafka brokers
  • Kafka topics for each bus
  • Kafka authentication secret stored in AWS Secrets Manager

If using Pub/Sub:

  • Topics for each bus
  • Subscriptions for each bus

Creating Background Persistence Writers

Navigate to the Settings/Team/Shared Resources/Background Persistence page in the Chalk UI to view the background persistence configuration. If no background persistence is configured, you will see a message indicating that none is currently present; the first save and apply will create the background persistence writers.

Writer Types

Chalk supports different types of background persistence writers, each designed for specific data flow and storage purposes:

Online Store Writers

  • ONLINE_WRITER: Listens to the online store subscription and writes query results into the online store (Redis, DynamoDB, etc.). This ensures that computed feature values are persisted for fast retrieval during online inference.

Metrics and Monitoring Writers

  • METRICS_WRITER: Listens to the result metrics subscription and republishes messages as metrics bus messages. This writer transforms query result data into metrics format for downstream processing.
  • METRICS_BUS_WRITER: Listens to the metrics bus subscription and writes processed metrics to the metrics store. This enables monitoring and observability of feature computation performance.

Offline Store Writers

  • OFFLINE_WRITER: Listens to the offline store subscription and writes query results into the offline store (BigQuery, Snowflake, etc.). This provides historical data for training and batch inference.
  • OFFLINE_STORE_BULK_INSERT_WRITER: Listens to the bulk upload subscription and performs efficient bulk inserts of parquet files from cloud storage (GCS/S3) into the offline store using COPY INTO operations.
  • OFFLINE_STORE_STREAMING_INSERT_WRITER: Listens to the streaming write subscription and handles real-time data insertion. The message body contains parquet-encoded data with table metadata. While the subscription name references BigQuery, this writer supports all offline store backends. For BigQuery, the more efficient bigquery-streaming-write-loader is typically used instead.

Specialized Writers

  • USAGE_BUS_WRITER: Listens to the usage events subscription and writes usage data to the usage events store (typically BigQuery). This tracks feature computation costs and usage patterns.
  • QUERY_MIRROR_CONSUMER: Listens to the query log subscription and executes mirrored queries. This is useful for debugging, testing, and validation scenarios.
  • ONLINE_VECTOR_STORE_WRITER: Listens to the vector store subscription and sends batches to vector databases for similarity search and retrieval use cases.

Each writer type requires specific subscription IDs and topics to be configured in the common persistence specifications.
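
For example, the metrics bus writer consumes the metrics bus subscription, so the corresponding subscription and topic IDs must appear in the common persistence specifications. The fragment below is a minimal illustration drawn from the example configuration at the bottom of this page; the names and IDs are placeholders:

{
  "common_persistence_specs": {
    "metrics_bus_subscription_id": "metrics-bus-1",
    "metrics_bus_topic_id": "metrics-bus-1"
  },
  "writers": [
    {
      "name": "go-metrics-bus-writer",
      "bus_subscriber_type": "GO_METRICS_BUS_WRITER"
    }
  ]
}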

Pub/Sub vs. Kafka

When using Pub/Sub, topics and subscriptions are two separate entities, whereas for Kafka the same topic name is used for both publishing and subscribing. Additionally, Kafka requires an authentication credential, whereas Pub/Sub authenticates with its Google identity.
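
For instance, in a Kafka deployment the broker and SASL settings appear as top-level fields, and the topic and subscription IDs for a given bus share the same Kafka topic name. The following is an illustrative fragment with placeholder values, mirroring the example configuration at the bottom of this page:

{
  "common_persistence_specs": {
    "bus_backend": "KAFKA",
    "metrics_bus_topic_id": "metrics-bus-1",
    "metrics_bus_subscription_id": "metrics-bus-1"
  },
  "kafka_bootstrap_servers": "<broker1>:<port>,<broker2>:<port>",
  "kafka_security_protocol": "SASL_SSL",
  "kafka_sasl_mechanism": "SCRAM-SHA-512",
  "kafka_sasl_secret": "<kafka auth secret stored in AWS Secrets Manager>"
}

With Pub/Sub, bus_backend is "PUBSUB", the kafka_* fields are omitted, and each topic and subscription ID refers to a distinct Pub/Sub resource.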

Configuration Options (Common)

In the JSON format, these fields are in the common_persistence_specs field, as shown in the example configuration below.

Attributes
bus_backend (string)
The backend to use for the bus: "KAFKA" for AWS, "PUBSUB" for GCP.
namespace (string)
The namespace to deploy the background persistence writers in.
service_account_name (string)
The service account to use for the background persistence writers.
secret_client (string)
The client to use for secrets: "AWS" for AWS Secrets Manager, "GCP" for GCP Secret Manager.
kafka_dlq_topic (string)
The topic to use for the Kafka dead-letter queue.
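
A minimal sketch of these common fields, using placeholder values consistent with the example configuration below:

{
  "common_persistence_specs": {
    "bus_backend": "KAFKA",
    "namespace": "background-persistence",
    "service_account_name": "background-persistence-sa",
    "secret_client": "AWS",
    "kafka_dlq_topic": "dlq-1"
  }
}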

Configuration Options (Other)

Attributes
api_server_host (string)
The hostname of the API server.
kafka_sasl_secret (string)
The cloud secret to use for Kafka authentication.
kafka_bootstrap_servers (string)
The Kafka bootstrap servers to use, as a comma-separated list.
kafka_security_protocol (string)
The Kafka security protocol to use.
kafka_sasl_mechanism (string)
The Kafka SASL mechanism to use.
redis_is_clustered (string)
Whether Redis is clustered, if using a Redis online store.
snowflake_storage_integration_name (string)
The name of the Snowflake storage integration.
metadata_provider (string)
The metadata provider to use ("GRPC_SERVER").

Configuration Options (Writers)

Attributes
name (string)
The name of the writer.
bus_subscriber_type (string)
The type of bus subscriber to use.
request (object)
The resource requests for the writer.
limit (object)
The resource limits for the writer.
version (string)
The version of the writer.
default_replica_count (int)
The default number of replicas to create for the writer.
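
Putting these together, a single entry in the writers list looks like the following sketch (values are placeholders; the full example configuration below shows two such entries):

{
  "name": "go-metrics-bus-writer",
  "bus_subscriber_type": "GO_METRICS_BUS_WRITER",
  "request": {
    "cpu": "200m",
    "memory": "512Mi"
  },
  "limit": {
    "cpu": "1",
    "memory": "512Mi"
  },
  "version": "1.0",
  "default_replica_count": 1
}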

Configuration Options (Shared Writer Fields)

In the JSON format, these fields are in the common_persistence_specs field, but they are not all necessarily required. Each writer requires an image and some, but not all, of the subscription and topic IDs.

Each writer's specification form asks for its required fields and images.

Attributes
bus_writer_image_go (string)
The Docker image to use for Go bus writers.
bus_writer_image_python (string)
The Docker image to use for Python bus writers.
bus_writer_image_bswl (string)
The Docker image to use for BigQuery streaming bus writers.
bigquery_parquet_upload_subscription_id (string)
The subscription ID used to read parquet files into the offline store.
bigquery_streaming_write_subscription_id (string)
The subscription ID to use for streaming writes to the offline store.
bigquery_streaming_write_topic (string)
The topic to use for streaming writes to the offline store.
bigquery_upload_bucket (string)
The S3 bucket to use for uploading files to the offline store.
bigquery_upload_topic (string)
The topic to use for uploading files to the offline store.
metrics_bus_subscription_id (string)
The subscription ID to use for the metrics bus.
metrics_bus_topic_id (string)
The topic ID to use for the metrics bus.
result_bus_metrics_subscription_id (string)
The subscription ID to use for result bus metrics.
result_bus_offline_store_subscription_id (string)
The subscription ID to use for the offline store result bus.
result_bus_online_store_subscription_id (string)
The subscription ID to use for the online store result bus.

Example Configuration

The following is an example configuration for background persistence writers:

{
  "common_persistence_specs": {
    "bus_backend": "KAFKA",
    "bus_writer_image_go": "<go bus writer image>",
    "bus_writer_image_python": "<python bus writer image>",
    "bus_writer_image_bswl": "<bswl bus writer image>",
    "namespace": "background-persistence",
    "service_account_name": "background-persistence-sa",
    "secret_client": "AWS",
    "bigquery_parquet_upload_subscription_id": "offline-store-bulk-insert-bus-1",
    "bigquery_streaming_write_subscription_id": "offline-store-streaming-insert-bus-1",
    "bigquery_streaming_write_topic": "offline-store-streaming-insert-bus-1",
    "bigquery_upload_bucket": "s3://<your data bucket>",
    "bigquery_upload_topic": "offline-store-bulk-insert-bus-1",
    "metrics_bus_subscription_id": "metrics-bus-1",
    "metrics_bus_topic_id": "metrics-bus-1",
    "result_bus_metrics_subscription_id": "result-bus-1",
    "result_bus_offline_store_subscription_id": "result-bus-1",
    "result_bus_online_store_subscription_id": "result-bus-1",
    "kafka_dlq_topic": "dlq-1",
    "operation_subscription_id": "operation-bus-1"
  },
  "api_server_host": "<your api server here>",
  "kafka_sasl_secret": "<your aws kafka auth secret here>",
  "kafka_bootstrap_servers": "<bootstrap server1>:<port>, <bootstrap server2>:<port>, ...",
  "kafka_security_protocol": "SASL_SSL",
  "kafka_sasl_mechanism": "SCRAM-SHA-512",
  "redis_is_clustered": "1",
  "snowflake_storage_integration_name": "<snowflak integration name>",
  "metadata_provider": "GRPC_SERVER",
  "writers": [
    {
      "name": "go-metrics-bus-writer",
      "bus_subscriber_type": "GO_METRICS_BUS_WRITER",
      "request": {
        "cpu": "200m",
        "memory": "512Mi"
      },
      "limit": {
        "cpu": "1",
        "memory": "512Mi"
      },
      "version": "1.0",
      "default_replica_count": 1
    },
    {
      "name": "go-result-bus-metrics-writer",
      "bus_subscriber_type": "GO_RESULT_BUS_METRICS_WRITER",
      "request": {
        "cpu": "400m",
        "memory": "1024Mi"
      },
      "limit": {
        "cpu": "1",
        "memory": "1024Mi"
      },
      "version": "1.0",
      "default_replica_count": 1
    }
  ]
}