# Chalk Dataplane Installation (Background Persistence)
source: https://docs.chalk.ai/docs/background-persistence-installation

## Background Persistence Installation Via the UI

Chalk uses background writers hosted in the customer's Kubernetes cluster (the "Customer Cloud")
to write information about queries to various storage locations.

### Prerequisites

To install Chalk persistence writers, you need the following:

- A Kubernetes namespace to deploy the writers into

If using Kafka:

- Kafka brokers
- Kafka topics for each bus
- Kafka authentication secret stored in AWS Secrets Manager

If using Pub/Sub:

- Topics for each bus
- Subscriptions for each bus

### Creating Background Persistence Writers

Navigate to the Settings > Team > Shared Resources > Background Persistence page in the Chalk UI
to view the background persistence configuration. If no background persistence is configured,
the page shows a message saying so, and the first save and apply creates the background
persistence writers.

### Writer Types

Chalk supports different types of background persistence writers, each designed for specific data flow and storage purposes:

### Online Store Writers

- ONLINE_WRITER: Listens to the online store subscription and writes query results into the online store (Redis, DynamoDB, etc.). This ensures that computed feature values are persisted for fast retrieval during online inference.

### Metrics and Monitoring Writers

- METRICS_WRITER: Listens to the result metrics subscription and republishes messages as metrics bus messages. This writer transforms query result data into metrics format for downstream processing.
- METRICS_BUS_WRITER: Listens to the metrics bus subscription and writes processed metrics to the metrics store. This enables monitoring and observability of feature computation performance.

### Offline Store Writers

- OFFLINE_WRITER: Listens to the offline store subscription and writes query results into the offline store (BigQuery, Snowflake, etc.). This provides historical data for training and batch inference.
- OFFLINE_STORE_BULK_INSERT_WRITER: Listens to the bulk upload subscription and performs efficient bulk inserts of parquet files from cloud storage (GCS/S3) into the offline store using COPY INTO operations.
- OFFLINE_STORE_STREAMING_INSERT_WRITER: Listens to the streaming write subscription and handles real-time data insertion. The message body contains parquet-encoded data with table metadata. While the subscription name references BigQuery, this writer supports all offline store backends. For BigQuery, the more efficient bigquery-streaming-write-loader is typically used instead.

### Specialized Writers

- QUERY_MIRROR_CONSUMER: Listens to the query log subscription and executes mirrored queries. This is useful for debugging, testing, and validation scenarios.
- ONLINE_VECTOR_STORE_WRITER: Listens to the vector store subscription and sends batches to vector databases for similarity search and retrieval use cases.

Each writer type requires specific subscription IDs and topics to be configured in the common persistence specifications.
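
As a rough illustration of that requirement, the sketch below checks that the subscription field for each writer type is set before a configuration is applied. The mapping from writer type to field name is an assumption inferred from the writer descriptions and the field names in the example configuration in this guide; it is not taken from Chalk's schema, and `missing_subscription_fields` is a hypothetical helper.

```python
# Assumed mapping from writer type to the common-spec field that holds its
# subscription ID. Inferred from the writer descriptions, not from Chalk's schema.
SUBSCRIPTION_FIELD_BY_WRITER = {
    "ONLINE_WRITER": "result_bus_online_store_subscription_id",
    "METRICS_WRITER": "result_bus_metrics_subscription_id",
    "METRICS_BUS_WRITER": "metrics_bus_subscription_id",
    "OFFLINE_WRITER": "result_bus_offline_store_subscription_id",
    "OFFLINE_STORE_BULK_INSERT_WRITER": "bigquery_parquet_upload_subscription_id",
    "OFFLINE_STORE_STREAMING_INSERT_WRITER": "bigquery_streaming_write_subscription_id",
}


def missing_subscription_fields(common_specs, writer_types):
    """Return {writer_type: field} for writers whose subscription field is unset."""
    missing = {}
    for writer_type in writer_types:
        field = SUBSCRIPTION_FIELD_BY_WRITER.get(writer_type)
        if field is None:
            # No assumed mapping for this type (e.g. QUERY_MIRROR_CONSUMER).
            continue
        if not common_specs.get(field):
            missing[writer_type] = field
    return missing
```

Calling `missing_subscription_fields(common_specs, ["ONLINE_WRITER", "METRICS_BUS_WRITER"])` returns an empty dict when both fields are present and non-empty.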

### Pub/Sub vs Kafka

With Pub/Sub, topics and subscriptions are two separate entities. With Kafka, the same
topic is used for both publishing and subscribing. Kafka also requires an authentication
credential, whereas Pub/Sub authenticates with its Google identity.
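
One practical consequence is that, on Kafka, each subscription ID should name the same topic as its matching topic field. The following is a minimal sketch of that check. The pairing of fields is an assumption based on the names used in the example configuration in this guide, and `mismatched_kafka_pairs` is a hypothetical helper rather than part of Chalk.

```python
# (subscription field, topic field) pairs, assumed from the example configuration.
SUBSCRIPTION_TOPIC_PAIRS = [
    ("bigquery_parquet_upload_subscription_id", "bigquery_upload_topic"),
    ("bigquery_streaming_write_subscription_id", "bigquery_streaming_write_topic"),
    ("metrics_bus_subscription_id", "metrics_bus_topic_id"),
]


def mismatched_kafka_pairs(common_specs):
    """Return the field pairs whose values differ when the bus backend is Kafka."""
    if common_specs.get("bus_backend") != "KAFKA":
        # With Pub/Sub, topics and subscriptions are separate, so nothing to check.
        return []
    mismatched = []
    for sub_field, topic_field in SUBSCRIPTION_TOPIC_PAIRS:
        sub = common_specs.get(sub_field)
        topic = common_specs.get(topic_field)
        # Only compare pairs where both sides are configured.
        if sub is not None and topic is not None and sub != topic:
            mismatched.append((sub_field, topic_field))
    return mismatched
```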

### Configuration Options (Common)

In the JSON format, these fields are nested under the common_persistence_specs field.

- bus_backend: The backend to use for the bus. "KAFKA" for AWS, "PUBSUB" for GCP.
- namespace: The namespace to deploy the background persistence writers in.
- service_account_name: The service account to use for the background persistence writers.
- secret_client: The client to use for secrets. "AWS" for AWS Secrets Manager, "GCP" for GCP Secret Manager.
- kafka_dlq_topic: The topic to use for the Kafka dead letter queue.

### Configuration Options (Other)

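- api_server_host: The hostname of the API server.
- kafka_sasl_secret: The cloud secret to use for Kafka authentication.
- kafka_bootstrap_servers: The Kafka bootstrap servers to use, as a comma-separated list.
- kafka_security_protocol: The Kafka security protocol to use.
- kafka_sasl_mechanism: The Kafka SASL mechanism to use.
- redis_is_clustered: Whether Redis is clustered or not, if using a Redis online store.
- snowflake_storage_integration_name: The name of the Snowflake storage integration.
- metadata_provider: The metadata provider to use ("GRPC_SERVER").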
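
Because the Kafka bootstrap servers are given as a single comma-separated string, it can help to split and validate the value before saving the configuration. The sketch below is one way to do that; `parse_bootstrap_servers` is a hypothetical helper, and the `host:port` shape of each entry is assumed from the example configuration in this guide.

```python
def parse_bootstrap_servers(value):
    """Split a comma-separated bootstrap server string into (host, port) pairs."""
    servers = []
    for entry in value.split(","):
        entry = entry.strip()  # tolerate spaces after commas
        if not entry:
            continue
        # Split on the last colon so the port is always the final component.
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {entry!r}")
        servers.append((host, int(port)))
    if not servers:
        raise ValueError("no bootstrap servers given")
    return servers
```

For example, `parse_bootstrap_servers("b-1.example.com:9096, b-2.example.com:9096")` yields two `(host, port)` tuples, while an entry without a port raises `ValueError`. The hostnames here are placeholders.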

### Configuration Options (Writers)

- name: The name of the writer.
- bus_subscriber_type: The type of bus subscriber to use.
- request: The resource requests for the writer.
- limit: The resource limits for the writer.
- version: The version of the writer.
- default_replica_count: The default number of replicas to create for the writer.
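
The resource requests and limits in the example configuration use Kubernetes-style quantities such as 200m and 512Mi, and Kubernetes rejects a container whose request exceeds its limit. The sketch below checks a writer entry for that mistake. It assumes the `request` and `limit` objects hold `cpu` and `memory` keys as in the example configuration; the helper names are hypothetical, and only the `m` CPU suffix and the `Ki`/`Mi`/`Gi` memory suffixes are handled.

```python
_MEMORY_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def parse_cpu(quantity):
    """Convert a CPU quantity such as "200m" or "1" to a number of cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000  # millicores
    return float(quantity)


def parse_memory(quantity):
    """Convert a memory quantity such as "512Mi" to bytes (Ki/Mi/Gi only)."""
    for suffix, factor in _MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: plain bytes


def requests_exceeding_limits(writer):
    """Return the resource names ("cpu", "memory") whose request exceeds the limit."""
    request = writer.get("request", {})
    limit = writer.get("limit", {})
    problems = []
    for resource, parse in (("cpu", parse_cpu), ("memory", parse_memory)):
        if resource in request and resource in limit:
            if parse(request[resource]) > parse(limit[resource]):
                problems.append(resource)
    return problems
```

An empty list means every request that has a matching limit fits within it.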

### Configuration Options (Shared Writer Fields)

In the JSON format, these fields are nested under the common_persistence_specs field, but not all of them are required.
Each writer requires an image and some, but not all, of the subscription and topic IDs.

Each writer's specification form asks for the fields and images that writer requires.

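- bus_writer_image_go: The Docker image to use for Go bus writers.
- bus_writer_image_python: The Docker image to use for Python bus writers.
- bus_writer_image_bswl: The Docker image to use for BigQuery streaming bus writers.
- bigquery_parquet_upload_subscription_id: The subscription ID used to read parquet files into the offline store.
- bigquery_streaming_write_subscription_id: The subscription ID to use for streaming writes to the offline store.
- bigquery_streaming_write_topic: The topic to use for streaming writes to the offline store.
- bigquery_upload_bucket: The S3 bucket to use for uploading files to the offline store.
- bigquery_upload_topic: The topic to use for uploading files to the offline store.
- metrics_bus_subscription_id: The subscription ID to use for the metrics bus.
- metrics_bus_topic_id: The topic ID to use for the metrics bus.
- result_bus_metrics_subscription_id: The subscription ID to use for result bus metrics.
- result_bus_offline_store_subscription_id: The subscription ID to use for the offline store result bus.
- result_bus_online_store_subscription_id: The subscription ID to use for the online store result bus.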

### Example Configuration

The following is an example configuration for background persistence writers:

```json
{
  "common_persistence_specs": {
    "bus_backend": "KAFKA",
    "bus_writer_image_go": "<go bus writer image>",
    "bus_writer_image_python": "<python bus writer image>",
    "bus_writer_image_bswl": "<bswl bus writer image>",
    "namespace": "background-persistence",
    "service_account_name": "background-persistence-sa",
    "secret_client": "AWS",
    "bigquery_parquet_upload_subscription_id": "offline-store-bulk-insert-bus-1",
    "bigquery_streaming_write_subscription_id": "offline-store-streaming-insert-bus-1",
    "bigquery_streaming_write_topic": "offline-store-streaming-insert-bus-1",
    "bigquery_upload_bucket": "s3://<your data bucket>",
    "bigquery_upload_topic": "offline-store-bulk-insert-bus-1",
    "metrics_bus_subscription_id": "metrics-bus-1",
    "metrics_bus_topic_id": "metrics-bus-1",
    "result_bus_metrics_subscription_id": "result-bus-1",
    "result_bus_offline_store_subscription_id": "result-bus-1",
    "result_bus_online_store_subscription_id": "result-bus-1",
    "kafka_dlq_topic": "dlq-1",
    "operation_subscription_id": "operation-bus-1"
  },
  "api_server_host": "<your api server here>",
  "kafka_sasl_secret": "<your aws kafka auth secret here>",
  "kafka_bootstrap_servers": "<bootstrap server1>:<port>, <bootstrap server2>:<port>, ...",
  "kafka_security_protocol": "SASL_SSL",
  "kafka_sasl_mechanism": "SCRAM-SHA-512",
  "redis_is_clustered": "1",
  "snowflake_storage_integration_name": "<snowflake integration name>",
  "metadata_provider": "GRPC_SERVER",
  "writers": [
    {
      "name": "go-metrics-bus-writer",
      "bus_subscriber_type": "GO_METRICS_BUS_WRITER",
      "request": {
        "cpu": "200m",
        "memory": "512Mi"
      },
      "limit": {
        "cpu": "1",
        "memory": "512Mi"
      },
      "version": "1.0",
      "default_replica_count": 1
    },
    {
      "name": "go-result-bus-metrics-writer",
      "bus_subscriber_type": "GO_RESULT_BUS_METRICS_WRITER",
      "request": {
        "cpu": "400m",
        "memory": "1024Mi"
      },
      "limit": {
        "cpu": "1",
        "memory": "1024Mi"
      },
      "version": "1.0",
      "default_replica_count": 1
    }
  ]
}
```

### Troubleshooting

If you are experiencing issues with background persistence writers, such as data not appearing
in your online or offline stores, writers running out of memory, or pods stuck in a pending state,
see the Debugging Persistence guide.