Enabling Replication

Adding replication to your ClickHouse® cluster

There are two ways to add replication support to your ClickHouse cluster: ClickHouse Keeper and Zookeeper. The instructions below cover ClickHouse Keeper first, then Zookeeper.

Installing ClickHouse Keeper

When you install the operator, it creates four custom resource definitions (CRDs). We’ve already worked with the ClickHouseInstallation, but we also have a ClickHouseKeeperInstallation (abbreviated chk) that makes it easy to install ClickHouse Keeper.
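
If you’d like to see those CRDs for yourself, you can list everything the operator registered (the grep filter is just a convenience):

kubectl get crds | grep altinity

You should see four entries, including clickhouseinstallations and clickhousekeeperinstallations.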

Start by creating a namespace for ClickHouse Keeper:

kubectl create namespace keeper

Copy and paste the following into chk.yaml:

apiVersion: "clickhouse-keeper.altinity.com/v1"
kind: "ClickHouseKeeperInstallation"
metadata:
  name: clickhouse-keeper
spec:
  configuration:
    clusters:
      - name: "chk01"
        layout:
          replicasCount: 3

  defaults:
    templates:
      # Templates are specified as default for all clusters
      podTemplate: default
      dataVolumeClaimTemplate: default

  templates:
    podTemplates:
      - name: default
        metadata:
          labels:
            app: clickhouse-keeper
        spec:
          containers:
            - name: clickhouse-keeper
              imagePullPolicy: IfNotPresent
              image: "clickhouse/clickhouse-keeper:latest"
              resources:
                requests:
                  memory: "256M"
                  cpu: "1"
                limits:
                  memory: "4Gi"
                  cpu: "2"
          securityContext:
            fsGroup: 101

    volumeClaimTemplates:
      - name: default
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
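
If you’d like to validate the manifest before creating anything, a server-side dry run checks it against the operator’s CRD schema without creating the resource (this assumes the operator and its CRDs are already installed from the earlier steps):

kubectl apply -f chk.yaml -n keeper --dry-run=server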

Now use kubectl apply to create the ClickHouseKeeperInstallation:

kubectl apply -f chk.yaml -n keeper

This creates the chk:

clickhousekeeperinstallation.clickhouse-keeper.altinity.com/clickhouse-keeper created

You’ll need to wait until the status of the chk is Completed:

kubectl get chk -o wide -n keeper

It may take a minute or two, but eventually you’ll see something like this:

NAME                VERSION   CLUSTERS   SHARDS   HOSTS   TASKID   STATUS      HOSTS-UNCHANGED   HOSTS-UPDATED   HOSTS-ADDED   HOSTS-COMPLETED   HOSTS-DELETED   HOSTS-DELETE   ENDPOINT                                            AGE   SUSPEND
clickhouse-keeper   0.25.0    1          1        3                Completed                                                                                                    keeper-clickhouse-keeper.keeper.svc.cluster.local   25m
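
If you’d rather not run that command repeatedly, kubectl wait can block until the chk reports Completed. This is a convenience sketch; it assumes the resource exposes its state at .status.status, the field the STATUS column above is drawn from in current operator releases:

kubectl wait chk/clickhouse-keeper -n keeper \
  --for=jsonpath='{.status.status}'=Completed --timeout=10m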

We’ll reference the chk resource in a YAML file to enable replication in our ClickHouse cluster. First, though, let’s make sure things are running:

kubectl get all -n keeper

You’ll see all the resources the operator created:

NAME                                    READY   STATUS    RESTARTS   AGE
pod/chk-clickhouse-keeper-chk01-0-0-0   1/1     Running   0          3m46s
pod/chk-clickhouse-keeper-chk01-0-1-0   1/1     Running   0          3m29s
pod/chk-clickhouse-keeper-chk01-0-2-0   1/1     Running   0          3m17s

NAME                                      TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)             AGE
service/chk-clickhouse-keeper-chk01-0-0   ClusterIP   None         <none>        2181/TCP,9444/TCP   3m46s
service/chk-clickhouse-keeper-chk01-0-1   ClusterIP   None         <none>        2181/TCP,9444/TCP   3m29s
service/chk-clickhouse-keeper-chk01-0-2   ClusterIP   None         <none>        2181/TCP,9444/TCP   3m17s
service/keeper-clickhouse-keeper          ClusterIP   None         <none>        2181/TCP,9444/TCP   3m29s

NAME                                               READY   AGE
statefulset.apps/chk-clickhouse-keeper-chk01-0-0   1/1     3m46s
statefulset.apps/chk-clickhouse-keeper-chk01-0-1   1/1     3m29s
statefulset.apps/chk-clickhouse-keeper-chk01-0-2   1/1     3m17s
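
As an optional sanity check, you can send Keeper’s ruok four-letter command to one of the pods. This sketch uses bash’s /dev/tcp feature inside the container; it assumes the default Ubuntu-based clickhouse-keeper image (which includes bash) and the default four-letter-command whitelist, where ruok is enabled. A healthy node answers imok:

kubectl exec -n keeper chk-clickhouse-keeper-chk01-0-0-0 -- \
  bash -c 'exec 3<>/dev/tcp/127.0.0.1/2181; printf ruok >&3; cat <&3; echo'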

Adding a replica to our cluster

The spec for a ClickHouseInstallation includes a zookeeper parameter. (Yes, we’re using ClickHouse Keeper, but the zookeeper parameter works for both.) Copy this text and save it in the file manifest03.yaml:

apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: cluster01
spec:
  templates:
    podTemplates:
      - name: clickhouse-pod-template
        spec:
          containers:
            - name: clickhouse
              image: altinity/clickhouse-server:24.8.14.10459.altinitystable
              volumeMounts:
                - name: clickhouse-storage
                  mountPath: /var/lib/clickhouse
    volumeClaimTemplates:
      - name: clickhouse-storage
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 5Gi
          storageClassName: standard
  configuration:
    zookeeper:
      nodes:
        - host: keeper-clickhouse-keeper.keeper.svc.cluster.local
          port: 2181
    clusters:
      - name: cluster01
        layout:
          shardsCount: 1
          replicasCount: 2
        templates:
          podTemplate: clickhouse-pod-template

Notice that we’re increasing the number of replicas compared to the manifest02.yaml file from the Adding persistent storage to your cluster page. The hostname for ClickHouse Keeper is the name of the service (keeper-clickhouse-keeper), followed by its namespace (keeper) and .svc.cluster.local.
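
If you want to double-check that this name resolves from inside the cluster, a throwaway pod works well; the pod name and busybox image here are just examples:

kubectl run dns-check -n quick --rm -it --restart=Never \
  --image=busybox:1.36 -- nslookup keeper-clickhouse-keeper.keeper.svc.cluster.local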

Be sure storageClassName is set correctly for your Kubernetes provider.
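
If you’re not sure which storage classes your cluster offers, you can list them:

kubectl get storageclass

With that done, apply the new configuration file: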

kubectl apply -f manifest03.yaml -n quick

You’ll get immediate feedback:

clickhouseinstallation.clickhouse.altinity.com/cluster01 configured

Check the status of your updated chi:

kubectl get chi -o wide -n quick

Wait until the status of the chi is Completed. Here we can see that we have one cluster, one shard, and two hosts.

NAME        VERSION   CLUSTERS   SHARDS   HOSTS   TASKID                                 STATUS      HOSTS-COMPLETED   HOSTS-UPDATED   HOSTS-ADDED   HOSTS-DELETED   ENDPOINT                                       AGE   SUSPEND
cluster01   0.25.0    1          1        2       17eff1f9-853b-4253-819c-b335e50fb76b   Completed                                                                   clickhouse-cluster01.quick.svc.cluster.local   31m

(Adding the -o wide option shows more data.) Taking a look at our new layout, we now have three services in our namespace:

kubectl get service -n quick
NAME                          TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                      AGE
chi-cluster01-cluster01-0-0   ClusterIP   None         <none>        9000/TCP,8123/TCP,9009/TCP   30m
chi-cluster01-cluster01-0-1   ClusterIP   None         <none>        9000/TCP,8123/TCP,9009/TCP   2m45s
clickhouse-cluster01          ClusterIP   None         <none>        8123/TCP,9000/TCP            30m
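
If you’d like to confirm that the new replica is reachable through its service, you can run a quick query against it from the first host (the query itself is just an example):

kubectl exec chi-cluster01-cluster01-0-0-0 -n quick -- \
  clickhouse-client --host chi-cluster01-cluster01-0-1 --query "SELECT hostName()"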

If we log into either of the two hosts in our cluster and look at the system.clusters table, we can see that we now have a total of two hosts for cluster01, one for each of the two replicas.

kubectl exec -it chi-cluster01-cluster01-0-1-0 -n quick -- clickhouse-client

We’ll select a few columns that we care about:

SELECT
    cluster,
    shard_num,
    replica_num,
    host_name,
    host_address
FROM system.clusters
WHERE cluster = 'cluster01'
   ┌─cluster───┬─shard_num─┬─replica_num─┬─host_name───────────────────┬─host_address─┐
1. │ cluster01 │         1 │           1 │ chi-cluster01-cluster01-0-0 │ 10.244.2.4   │
2. │ cluster01 │         1 │           2 │ chi-cluster01-cluster01-0-1 │ 127.0.0.1    │
   └───────────┴───────────┴─────────────┴─────────────────────────────┴──────────────┘

2 rows in set. Elapsed: 0.001 sec.

👉 Type exit to end the clickhouse-client session.
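
You can also confirm that ClickHouse is really talking to ClickHouse Keeper by reading the system.zookeeper table, which is populated over that connection (the znode names you see will vary):

kubectl exec chi-cluster01-cluster01-0-0-0 -n quick -- \
  clickhouse-client --query "SELECT name FROM system.zookeeper WHERE path = '/'"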

Now that we’ve got ClickHouse Keeper in place, we’re ready to create a table that uses the ReplicatedMergeTree engine, and that engine will keep all our replicas synchronized. So let’s move on….

Installing Zookeeper

Zookeeper is also easy to install and enable. We’ll start by creating a namespace:

kubectl create namespace keeper

With the namespace created, we’ll use kubectl apply to create a three-node Zookeeper installation:

kubectl apply -f https://raw.githubusercontent.com/Altinity/clickhouse-operator/release-0.25.0/deploy/zookeeper/zookeeper-manually/quick-start-persistent-volume/zookeeper-3-nodes.yaml -n keeper

This creates some new resources:

service/zookeeper created
service/zookeepers created
poddisruptionbudget.policy/zookeeper-pod-disruption-budget created
statefulset.apps/zookeeper created

Now make sure things are running. Run this command:

kubectl get all -n keeper

Wait until the status of all the pods is Running and you see READY 1/1 for the three pods and READY 3/3 for the statefulset:

NAME              READY   STATUS    RESTARTS   AGE
pod/zookeeper-0   1/1     Running   0          3m26s
pod/zookeeper-1   1/1     Running   0          2m46s
pod/zookeeper-2   1/1     Running   0          2m15s

NAME                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
service/zookeeper    ClusterIP   10.107.140.58   <none>        2181/TCP,7000/TCP   3m26s
service/zookeepers   ClusterIP   None            <none>        2888/TCP,3888/TCP   3m26s

NAME                         READY   AGE
statefulset.apps/zookeeper   3/3     3m26s
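
As an optional sanity check, you can list the root znode from one of the Zookeeper pods. This assumes the manifest uses the stock zookeeper image, which puts zkCli.sh on the PATH:

kubectl exec -n keeper zookeeper-0 -- zkCli.sh -server 127.0.0.1:2181 ls /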

Now we’ve got a running Zookeeper, so we can tell our ClickHouse cluster how to connect to it…and add a second replica.

Adding a replica to our cluster

The spec for a ClickHouseInstallation includes a zookeeper parameter. Modify your manifest02.yaml file like this and save it as manifest03.yaml:

apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: cluster01
spec:
  templates:
    podTemplates:
      - name: clickhouse-pod-template
        spec:
          containers:
            - name: clickhouse
              image: altinity/clickhouse-server:24.8.14.10459.altinitystable
              volumeMounts:
                - name: clickhouse-storage
                  mountPath: /var/lib/clickhouse
    volumeClaimTemplates:
      - name: clickhouse-storage
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 5Gi
          storageClassName: standard
  configuration:
    zookeeper:
      nodes:
        - host: zookeeper.keeper.svc.cluster.local
          port: 2181
    clusters:
      - name: cluster01
        layout:
          shardsCount: 1
          replicasCount: 2
        templates:
          podTemplate: clickhouse-pod-template

Notice that we’re increasing the number of replicas compared to the manifest02.yaml file from the Adding persistent storage to your cluster page. The hostname for Zookeeper is the name of the service (zookeeper), followed by its namespace (keeper) and .svc.cluster.local.

Be sure you’re using the correct storageClassName for your Kubernetes provider. With that done, apply the new configuration file:

kubectl apply -f manifest03.yaml -n quick

You’ll get immediate feedback:

clickhouseinstallation.clickhouse.altinity.com/cluster01 configured

It will likely take a minute or two, but you should see your updated chi:

kubectl get chi -o wide -n quick

Wait until the status of the chi is Completed. Here we can see that we have one cluster, one shard, and two hosts.

NAME        VERSION   CLUSTERS   SHARDS   HOSTS   TASKID                                 STATUS      HOSTS-COMPLETED   HOSTS-UPDATED   HOSTS-ADDED   HOSTS-DELETED   ENDPOINT                                       AGE    SUSPEND
cluster01   0.25.0    1          1        2       eca1aedd-4703-460a-9fe6-f434a289e8cb   Completed                                                                   clickhouse-cluster01.quick.svc.cluster.local   107m

(Adding the -o wide option shows more data.) Taking a look at our new layout, we now have three services in our namespace:

kubectl get service -n quick
NAME                          TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                      AGE
chi-cluster01-cluster01-0-0   ClusterIP   None         <none>        9000/TCP,8123/TCP,9009/TCP   106m
chi-cluster01-cluster01-0-1   ClusterIP   None         <none>        9000/TCP,8123/TCP,9009/TCP   4m28s
clickhouse-cluster01          ClusterIP   None         <none>        8123/TCP,9000/TCP            106m

If we log into either of the hosts in our cluster and look at the system.clusters table, we can see that we now have a total of two hosts for cluster01, one for each of the two replicas.

kubectl exec -it chi-cluster01-cluster01-0-1-0 -n quick -- clickhouse-client

We’ll select a few columns that we care about:

SELECT
    cluster,
    shard_num,
    replica_num,
    host_name,
    host_address
FROM system.clusters
WHERE cluster = 'cluster01'
   ┌─cluster───┬─shard_num─┬─replica_num─┬─host_name───────────────────┬─host_address─┐
1. │ cluster01 │         1 │           1 │ chi-cluster01-cluster01-0-0 │ 10.244.2.4   │
2. │ cluster01 │         1 │           2 │ chi-cluster01-cluster01-0-1 │ 127.0.0.1    │
   └───────────┴───────────┴─────────────┴─────────────────────────────┴──────────────┘

2 rows in set. Elapsed: 0.003 sec.

👉 Type exit to end the clickhouse-client session.

Now that we’ve got Zookeeper in place, we’re ready to work with our replicas. We’ll set up tables with the ReplicatedMergeTree engine, and that engine will keep all our replicas synchronized. So let’s move on….
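
Before we do, here’s a minimal sketch of the kind of table we’ll be creating there. The table name and columns are made up for illustration, and it relies on the {shard} and {replica} macros the operator defines for each host:

kubectl exec chi-cluster01-cluster01-0-0-0 -n quick -- clickhouse-client --query "
  CREATE TABLE IF NOT EXISTS demo_replicated ON CLUSTER 'cluster01'
  (id UInt64, message String)
  ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/demo_replicated', '{replica}')
  ORDER BY id"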

👉 Next: Working with replication