Persistent Storage

How to set up persistent storage for your ClickHouse Kubernetes cluster.
kubectl create namespace test
namespace/test created

We’ve shown how to create ClickHouse clusters in Kubernetes, how to add zookeeper so we can create replicas of clusters. Now we’re going to show how to set persistent storage so you can change your cluster configurations without losing your hard work.

The examples here are built from the Altinity Kubernetes Operator examples, simplified down for our demonstrations.

Create a new file called sample05.yaml with the following:

apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "demo-01"
spec:
  configuration:
    zookeeper:
        nodes:
        - host: zookeeper.zoo1ns
          port: 2181
    clusters:
      - name: "demo-01"
        layout:
          shardsCount: 2
          replicasCount: 2
        templates:
          podTemplate: clickhouse-stable
          volumeClaimTemplate: storage-vc-template
  templates:
    podTemplates:
      - name: clickhouse-stable
        spec:
          containers:
          - name: clickhouse
            image: altinity/clickhouse-server:21.8.10.1.altinitystable
    volumeClaimTemplates:
      - name: storage-vc-template
        spec:
          storageClassName: standard
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 1Gi

Those who have followed the previous examples will recognize the clusters being created, but there are some new additions:

  • volumeClaimTemplate: This is setting up storage, and we’re specifying the class as default. For full details on the different storage classes see the kubectl Storage Class documentation
  • storage: We’re going to give our cluster 1 Gigabyte of storage, enough for our sample systems. If you need more space that can be upgraded by changing these settings.
  • podTemplate: Here we’ll specify what our pod types are going to be. We’ll use the latest version of the ClickHouse containers, but other versions can be specified to best it your needs. For more information, see the [ClickHouse on Kubernetes Operator Guide]({<ref “kubernetesoperatorguide”>}).

Save your new configuration file and install it. If you’ve been following this guide and already have the namespace test operating, this will update it:

kubectl apply -f sample05.yaml -n test
clickhouseinstallation.clickhouse.altinity.com/demo-01 created

Verify it completes with get all for this namespace, and you should have similar results:

kubectl -n test get chi -o wide
NAME      VERSION   CLUSTERS   SHARDS   HOSTS   TASKID                                 STATUS      UPDATED   ADDED   DELETED   DELETE   ENDPOINT                                    AGE
demo-01   0.18.3    1          2        4       57ec3f87-9950-4e5e-9b26-13680f66331d   Completed             4                          clickhouse-demo-01.test.svc.cluster.local   108s
kubectl get service -n test
NAME                      TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                         AGE
chi-demo-01-demo-01-0-0   ClusterIP      None             <none>        8123/TCP,9000/TCP,9009/TCP      81s
chi-demo-01-demo-01-0-1   ClusterIP      None             <none>        8123/TCP,9000/TCP,9009/TCP      63s
chi-demo-01-demo-01-1-0   ClusterIP      None             <none>        8123/TCP,9000/TCP,9009/TCP      45s
chi-demo-01-demo-01-1-1   ClusterIP      None             <none>        8123/TCP,9000/TCP,9009/TCP      8s
clickhouse-demo-01        LoadBalancer   10.104.236.138   <pending>     8123:31281/TCP,9000:30052/TCP   98s

Testing Persistent Storage

Everything is running, let’s verify that our storage is working. We’re going to exec into our cluster with a bash prompt on one of the pods created:

kubectl -n test exec -it chi-demo-01-demo-01-0-0-0 -- df -h
Filesystem      Size  Used Avail Use% Mounted on
overlay          32G   26G  4.0G  87% /
tmpfs            64M     0   64M   0% /dev
tmpfs           3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/sda2        32G   26G  4.0G  87% /etc/hosts
shm              64M     0   64M   0% /dev/shm
tmpfs           7.7G   12K  7.7G   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs           3.9G     0  3.9G   0% /proc/acpi
tmpfs           3.9G     0  3.9G   0% /proc/scsi
tmpfs           3.9G     0  3.9G   0% /sys/firmware

And we can see we have about 1 Gigabyte of storage allocated into our cluster.

Let’s add some data to it. Nothing major, just to show that we can store information, then change the configuration and the data stays.

Exit out of your cluster and launch clickhouse-client on your LoadBalancer. We’re going to create a database, then create a table in the database, then show both.

SHOW DATABASES
┌─name────┐
 default 
 system  
└─────────┘
CREATE DATABASE teststorage
CREATE TABLE teststorage.test AS system.one ENGINE = Distributed('demo-01', 'system', 'one')
SHOW DATABASES
┌─name────────┐
 default     
 system      
 teststorage 
└─────────────┘
SELECT * FROM teststorage.test
┌─dummy─┐
     0 
└───────┘
┌─dummy─┐
     0 
└───────┘

If you followed the instructions from [Zookeeper and Replicas]({<ref “quickzookeeper” >}), note at the end when we updated the configuration of our sample cluster that all of the tables and data we made were deleted. Let’s recreate that experiment now with a new configuration.

Create a new file called sample06.yaml. We’re going to reduce the shards and replicas to 1:

apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "demo-01"
spec:
  configuration:
    zookeeper:
        nodes:
        - host: zookeeper.zoo1ns
          port: 2181
    clusters:
      - name: "demo-01"
        layout:
          shardsCount: 1
          replicasCount: 1
        templates:
          podTemplate: clickhouse-stable
          volumeClaimTemplate: storage-vc-template
  templates:
    podTemplates:
      - name: clickhouse-stable
        spec:
          containers:
          - name: clickhouse
            image: altinity/clickhouse-server:21.8.10.1.altinitystable
    volumeClaimTemplates:
      - name: storage-vc-template
        spec:
          storageClassName: standard
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 1Gi

Update the cluster with the following:

kubectl apply -f sample06.yaml -n test
clickhouseinstallation.clickhouse.altinity.com/demo-01 configured

Wait until the configuration is done and all of the pods are spun down, then launch a bash prompt on one of the pods and check the storage available:

kubectl -n test get chi -o wide
NAME      VERSION   CLUSTERS   SHARDS   HOSTS   TASKID                                 STATUS      UPDATED   ADDED   DELETED   DELETE   ENDPOINT                                    AGE
demo-01   0.18.3    1          1        1       776c1a82-44e1-4c2e-97a7-34cef629e698   Completed                               4        clickhouse-demo-01.test.svc.cluster.local   2m56s
kubectl -n test exec -it chi-demo-01-demo-01-0-0-0 -- df -h
Filesystem      Size  Used Avail Use% Mounted on
overlay          32G   26G  4.0G  87% /
tmpfs            64M     0   64M   0% /dev
tmpfs           3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/sda2        32G   26G  4.0G  87% /etc/hosts
shm              64M     0   64M   0% /dev/shm
tmpfs           7.7G   12K  7.7G   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs           3.9G     0  3.9G   0% /proc/acpi
tmpfs           3.9G     0  3.9G   0% /proc/scsi
tmpfs           3.9G     0  3.9G   0% /sys/firmware

Storage is still there. We can test if our databases are still available by logging into clickhouse:

SHOW DATABASES
┌─name────────┐
 default     
 system      
 teststorage 
└─────────────┘
SELECT * FROM teststorage.test
┌─dummy─┐
     0 
└───────┘

All of our databases and tables are there.

There are different ways of allocating storage - for data, for logging, multiple data volumes for your cluster nodes, but this will get you started in running your own Kubernetes cluster running ClickHouse in your favorite environment.


Last modified 2022.03.11: Updated with qa approved `20220311_steinsgate`.