Creating your first ClickHouse® cluster

How to create a cluster and make sure it’s running

At this point, you’ve got the Altinity Kubernetes Operator for ClickHouse® installed. Now let’s give it something to work with. We’ll start with a simple ClickHouse cluster here: no persistent storage, one replica, and one shard. (We’ll cover those topics over the next couple of steps.)

Creating your first cluster

Now that we have our namespace, we’ll create a simple cluster: one shard, one replica. Copy the following text and save it as manifest01.yaml:

apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: cluster01
spec:
  templates:
    podTemplates:
      - name: clickhouse-pod-template
        spec:
          containers:
            - name: clickhouse
              image: altinity/clickhouse-server:24.8.14.10501.altinitystable
  configuration:
    clusters:
      - name: cluster01
        layout:
          shardsCount: 1
          replicasCount: 1
        templates:
          podTemplate: clickhouse-pod-template

When you installed the operator, it defined a custom resource type called a ClickHouseInstallation; that’s what we’re creating here. A ClickHouseInstallation defines a complete ClickHouse cluster: the servers themselves plus the pod templates, storage, and other configuration they need. Here we’re creating a ClickHouseInstallation named cluster01, and that cluster has one shard and one replica.

NOTE: The YAML above would be simpler if we didn’t pin a specific version of the altinity/clickhouse-server container image. Being specific here, however, will make things simpler as we go through the rest of the exercises in this tutorial. (Hopefully you’re just cutting and pasting anyway.)

Use kubectl apply to create your ClickHouseInstallation:

kubectl apply -f manifest01.yaml -n operator

You’ll see this:

clickhouseinstallation.clickhouse.altinity.com/cluster01 created

Verify that your new cluster is running:

kubectl get clickhouseinstallation -n operator

The status of your cluster will be In Progress for a minute or two. (BTW, the operator defines chi as an abbreviation for clickhouseinstallation. We’ll use chi from now on.) When everything is ready, its status will be Completed:

NAME        CLUSTERS   HOSTS   STATUS      HOSTS-COMPLETED   AGE     SUSPEND  
cluster01   1          1       Completed                     2m40s
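If you’d rather not poll by hand, kubectl can block until the status changes. This is a sketch, not the tutorial’s own step: it assumes kubectl 1.23 or later (for `--for=jsonpath=...`) and that the operator reports its state in the `.status.status` field, as current operator versions do.

```shell
# Wait (up to 5 minutes) for the installation to reach Completed.
# "chi" is the short name the operator registers for clickhouseinstallation.
kubectl wait chi/cluster01 -n operator \
  --for=jsonpath='{.status.status}'=Completed \
  --timeout=300s
```

If the timeout expires, the command exits nonzero, which also makes this handy in scripts and CI pipelines.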

PRO TIP: You can use the awesome kubens tool to set the default namespace for all kubectl commands. Type kubens operator, and kubectl will use the operator namespace until you tell it otherwise. See the kubens / kubectx repo to get started. You’re welcome.

Now that we’ve got a ClickHouse cluster up and running, we’ll connect to it and run some basic commands.

Connecting to your cluster with kubectl exec

Let’s talk to our cluster and run a simple ClickHouse query. We can hop in directly through Kubernetes and run the clickhouse-client that’s part of the image. First, we have to get the name of the pod:

kubectl get pods -n operator | grep cluster01

You’ll see this:

chi-cluster01-cluster01-0-0-0                                     1/1     Running   0          75s
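That long pod name isn’t random: the operator builds it as chi-&lt;installation&gt;-&lt;cluster&gt;-&lt;shard&gt;-&lt;replica&gt;, and Kubernetes appends the StatefulSet ordinal. A small bash sketch of how the name decomposes (illustration only; the split below assumes the installation and cluster names themselves contain no hyphens):

```shell
pod="chi-cluster01-cluster01-0-0-0"

# Fields: chi-<installation>-<cluster>-<shard>-<replica>, plus the
# StatefulSet ordinal that Kubernetes appends at the end.
IFS=- read -r prefix installation cluster shard replica ordinal <<< "$pod"
echo "installation=$installation cluster=$cluster shard=$shard replica=$replica"
```

This is why the name grows when you add shards and replicas later: shard 1, replica 0 of the same cluster would run in chi-cluster01-cluster01-1-0-0.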

So chi-cluster01-cluster01-0-0-0 is the name of the pod running ClickHouse. We’ll connect to it with kubectl exec and run clickhouse-client on it:

kubectl exec -it chi-cluster01-cluster01-0-0-0 -n operator -- clickhouse-client

The ClickHouse server will welcome you:

ClickHouse client version 24.8.14.10501.altinitystable (altinity build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 24.8.14.

chi-cluster01-cluster01-0-0-0.chi-cluster01-cluster01-0-0.operator.svc.cluster.local :)

Now that we’re in ClickHouse, let’s run a query and look at some system data:

SELECT
    cluster,
    host_name,
    port
FROM system.clusters

You’ll see something very similar to this:

   ┌─cluster────────┬─host_name───────────────────┬─port─┐
1. │ all-clusters   │ chi-cluster01-cluster01-0-0 │ 9000 │
2. │ all-replicated │ chi-cluster01-cluster01-0-0 │ 9000 │
3. │ all-sharded    │ chi-cluster01-cluster01-0-0 │ 9000 │
4. │ cluster01      │ chi-cluster01-cluster01-0-0 │ 9000 │
5. │ default        │ localhost                   │ 9000 │
   └────────────────┴─────────────────────────────┴──────┘

5 rows in set. Elapsed: 0.002 sec.

👉 Type exit to end the clickhouse-client session.
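For one-off queries you don’t need an interactive session at all; clickhouse-client accepts a query on the command line. A sketch, assuming the same pod and namespace as above:

```shell
# Run a single query non-interactively and print the server version.
kubectl exec chi-cluster01-cluster01-0-0-0 -n operator -- \
  clickhouse-client --query "SELECT version()"
```

This form is convenient for scripting health checks or quick lookups without opening a shell in the pod.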

Let’s get some data in here!

Not so fast! At this point, you’d expect a tutorial to show you how to create a database and put some data into it. However, we haven’t defined any persistent storage for our cluster. If a pod fails, any data it had will be gone when the pod restarts. So we’ll add persistent storage to our cluster next.

👉 Next: Adding persistent storage

Optional: Connecting to your cluster directly

Most of the time when you’re working with ClickHouse, you connect directly to the cluster with clickhouse-client. To do that, you’ll need to set up network access for your cluster. The easiest way to do that is with kubectl port-forward. First, look at the services in your namespace:

kubectl get svc -n operator 

You’ll see the ports used by each service:

NAME                          TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
chi-cluster01-cluster01-0-0   ClusterIP   None             <none>        9000/TCP,8123/TCP,9009/TCP   3m53s
clickhouse-cluster01          ClusterIP   None             <none>        8123/TCP,9000/TCP            3m42s
clickhouse-operator-metrics   ClusterIP   10.102.147.251   <none>        8888/TCP,9999/TCP            22m

So to connect directly to the first pod, use this command:

kubectl port-forward chi-cluster01-cluster01-0-0-0 9000:9000 8123:8123 9009:9009 -n operator &

Be sure to put the & at the end to keep this running in the background.

You’ll see the PID for the process and the ports you can access directly:

[1] 72202
Forwarding from 127.0.0.1:9000 -> 9000
Forwarding from [::1]:9000 -> 9000
Forwarding from 127.0.0.1:8123 -> 8123
Forwarding from [::1]:8123 -> 8123
Forwarding from 127.0.0.1:9009 -> 9009
Forwarding from [::1]:9009 -> 9009
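With the forward in place, you can sanity-check the connection over ClickHouse’s HTTP interface on port 8123 before reaching for a client; the server’s /ping endpoint answers with Ok. when it’s healthy:

```shell
# The HTTP interface listens on 8123; /ping is a lightweight health check.
curl http://localhost:8123/ping
# Ok.
```

This is the same interface that tools like clickhouse-connect and most dashboards use, so if /ping works, they should too.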

Without a hostname or port, clickhouse-client uses localhost:9000. That makes it easy to connect to our ClickHouse cluster. Just type clickhouse-client at the command line:

> clickhouse-client
ClickHouse client version 23.5.3.1.
Connecting to localhost:9000 as user default.
Handling connection for 9000
Connected to ClickHouse server version 24.8.14 revision 54472.

ClickHouse client version is older than ClickHouse server. It may lack support for new features.

Handling connection for 9000
chi-cluster01-cluster01-0-0-0.chi-cluster01-cluster01-0-0.operator.svc.cluster.local :)

Now you can run SQL statements to your heart’s content. When you exit clickhouse-client, be sure to stop port forwarding (kill 72202, in this example).
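If you’ve lost track of the PID, your shell’s job control works too; a sketch, assuming the port-forward was the only command you backgrounded in this shell:

```shell
# List background jobs in this shell; the port-forward appears as a job.
jobs
# Kill job %1 (the first background job) instead of remembering the PID.
kill %1
```

Note that jobs are per-shell: if you opened a new terminal, you’ll have to fall back to the PID or something like pkill -f port-forward.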

Depending on your Kubernetes setup and provider, you may be able to use a LoadBalancer to access the cluster directly, but this method, clumsy as it is, will always work. See the documentation for your Kubernetes provider for details.

Also be aware that the clickhouse-client included in the clickhouse-server container is always the same version as the server. If you install clickhouse-client directly on your machine, there’s no guarantee that the client and server versions will match. (See the warning message above.) That’s unlikely to cause problems, but it’s something to be aware of if the system starts behaving strangely.

Now let’s add persistent storage to our cluster…

👉 Next: Adding persistent storage