Bonus: Adding a Shard to your ClickHouse® Cluster

Enabling horizontal scaling once you’ve got 5 TB of data or so

As a final topic, we’ll talk a bit about shards. But before we do, here’s something to keep in mind:

Don’t use shards unless you have at least 5 TB of data. If that doesn’t apply to you, feel free to ignore this section and get on with your day.

Shards give you horizontal scaling in ClickHouse®. When your cluster has multiple shards, data is distributed across them, which means queries against your data can run in parallel on each shard. But ClickHouse is so powerful that the overhead of managing shards won’t give you any benefit until you’ve got a substantial amount of data.

Well, if you’re still here, we assume you want to add a shard to your cluster. Adding a second shard is easy. Make a copy of manifest03.yaml and name it manifest04.yaml. Now edit the file and change shardsCount to 2:

apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: cluster01
spec:
  templates:
    podTemplates:
. . . 
    volumeClaimTemplates:
. . . 
  configuration:
. . . 
    clusters:
      - name: cluster01
        layout:
          shardsCount: 2
          replicasCount: 2
. . . 

(We’re starting with manifest03.yaml because it’s configured to use either ClickHouse Keeper or Zookeeper, and it specifies the right storageClass for your Kubernetes environment.)
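By the way, if you’d like to preview exactly what will change before you apply the new manifest, kubectl diff compares it against the live object (this step is optional):

kubectl diff -f manifest04.yaml -n quick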

Now apply the file to update cluster01:

kubectl apply -f manifest04.yaml -n quick

You’ll see a success message:

clickhouseinstallation.clickhouse.altinity.com/cluster01 configured

Verify that the cluster is running; this may take a few minutes, depending on your system:

kubectl get chi -o wide -n quick

The status may be InProgress for a short while, but at some point the cluster will be up and running again and the status will be Completed:

NAME        VERSION   CLUSTERS   SHARDS   HOSTS   TASKID                                 STATUS      HOSTS-COMPLETED   HOSTS-UPDATED   HOSTS-ADDED   HOSTS-DELETED   ENDPOINT                                       AGE   SUSPEND
cluster01   0.24.5    1          2        4       aef78108-cb54-466e-8d8d-ded8948a7606   Completed                                                                   clickhouse-cluster01.quick.svc.cluster.local   46m
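If you’d rather not poll by hand, newer versions of kubectl can block until the status reaches Completed. This sketch assumes the operator reports its state under .status.status (which is where the STATUS column above comes from); adjust the path if your operator version reports it differently:

kubectl wait --for=jsonpath='{.status.status}'=Completed chi/cluster01 -n quick --timeout=10m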

Notice that the number of hosts is now 4. To see our two shards, list the services running in the quick namespace:

kubectl get service -n quick
NAME                          TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                      AGE
chi-cluster01-cluster01-0-0   ClusterIP   None         <none>        9000/TCP,8123/TCP,9009/TCP   45m
chi-cluster01-cluster01-0-1   ClusterIP   None         <none>        9000/TCP,8123/TCP,9009/TCP   17m
chi-cluster01-cluster01-1-0   ClusterIP   None         <none>        9000/TCP,8123/TCP,9009/TCP   2m8s
chi-cluster01-cluster01-1-1   ClusterIP   None         <none>        9000/TCP,8123/TCP,9009/TCP   73s
clickhouse-cluster01          ClusterIP   None         <none>        8123/TCP,9000/TCP            45m
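You can also confirm the new topology from inside ClickHouse itself. The system.clusters table lists every shard and replica a server knows about, so a quick one-liner against any host (here, the first one) should return four rows, one per shard/replica pair:

kubectl exec -it chi-cluster01-cluster01-0-0-0 -n quick -- clickhouse-client --query "SELECT cluster, shard_num, replica_num, host_name FROM system.clusters WHERE cluster = 'cluster01'"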

We should have two new pods as well.

kubectl get pods -n quick

And there they are, chi-cluster01-cluster01-1-0-0 and chi-cluster01-cluster01-1-1-0:

NAME                            READY   STATUS    RESTARTS      AGE
chi-cluster01-cluster01-0-0-0   1/1     Running   0             30m
chi-cluster01-cluster01-0-1-0   1/1     Running   0             18m
chi-cluster01-cluster01-1-0-0   1/1     Running   0             3m2s
chi-cluster01-cluster01-1-1-0   1/1     Running   0             2m7s

Let’s connect to one of the new pods:

kubectl exec -it chi-cluster01-cluster01-1-0-0 -n quick -- clickhouse-client
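Once you’re connected, you can double-check which host you landed on; hostName() is a built-in ClickHouse function that should echo back the pod’s hostname:

SELECT hostName();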

Run the following command to see what we have on the second shard:

SHOW DATABASES;

You’ll see something like this:

   ┌─name───────────────┐
1. │ INFORMATION_SCHEMA │
2. │ analytics          │
3. │ default            │
4. │ information_schema │
5. │ system             │
   └────────────────────┘

5 rows in set. Elapsed: 0.002 sec.

Our database is in there, as we’d expect. And the two tables we created earlier should be here:

USE analytics; SHOW TABLES;

There they are:

   ┌─name───────────────────┐
1. │ page_views_distributed │
2. │ page_views_replicated  │
   └────────────────────────┘

2 rows in set. Elapsed: 0.002 sec.
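If you’re curious which engine backs each of those tables, system.tables will tell you. Depending on how you created them earlier, you should see some flavor of ReplicatedMergeTree for page_views_replicated and Distributed for page_views_distributed:

SELECT name, engine
FROM system.tables
WHERE database = 'analytics';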

Seems legit…let’s take a look at our data:

SELECT *
FROM analytics.page_views_replicated;

We’ve got nothing:

Ok.

0 rows in set. Elapsed: 0.001 sec.

Wait, our data isn’t here? Well, it isn’t supposed to be. Remember, we’re distributing our data across multiple shards, so any given row lives on only one shard (and on that shard’s replicas). Our data on the first shard is still alive and well; there’s just nothing on this new shard yet. The important thing is that the new shard has the same databases and tables as our original shard.
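Incidentally, if you ever need to know which shard and replica a given host represents, the operator injects substitution macros into each server’s configuration. The exact macro names can vary by operator version, but system.macros shows whatever is defined on the host you’re connected to:

SELECT * FROM system.macros;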

We’ll talk about how to work across all our shards in a second, but first, let’s add some new data into the table on this shard:

INSERT INTO analytics.page_views_replicated (event_time, user_id, page_url, referrer_url, device, country) VALUES
('2025-01-02 15:30:00', 105, '/home', 'direct', 'desktop', 'Germany'),
('2025-01-03 11:45:00', 107, '/contact', 'direct', 'desktop', 'India'),
('2025-01-04 18:00:00', 102, '/products', 'google.com', 'tablet', 'Canada'),
('2025-01-05 21:30:00', 109, '/checkout', 'facebook.com', 'mobile', 'USA'),
('2025-01-09 17:20:00', 110, '/cart', 'direct', 'desktop', 'Australia');

And we can run SELECT to see our new (and different) data:

SELECT *
FROM analytics.page_views_replicated

Our new data is there:

   ┌──────────event_time─┬─user_id─┬─page_url──┬─referrer_url─┬─device──┬─country───┐
1. │ 2025-01-02 15:30:00 │     105 │ /home     │ direct       │ desktop │ Germany   │
2. │ 2025-01-03 11:45:00 │     107 │ /contact  │ direct       │ desktop │ India     │
3. │ 2025-01-04 18:00:00 │     102 │ /products │ google.com   │ tablet  │ Canada    │
4. │ 2025-01-05 21:30:00 │     109 │ /checkout │ facebook.com │ mobile  │ USA       │
5. │ 2025-01-09 17:20:00 │     110 │ /cart     │ direct       │ desktop │ Australia │
   └─────────────────────┴─────────┴───────────┴──────────────┴─────────┴───────────┘

5 rows in set. Elapsed: 0.007 sec.

So now the data in page_views_replicated is spread across two shards. That’s what we want for horizontal scaling, but we’d like our queries (and other SQL statements, for that matter) to run across all shards. That’s where the Distributed table we created earlier comes into play. Here’s the same query against page_views_distributed:

SELECT *
FROM analytics.page_views_distributed;

Now we get all the data from all the shards:

   ┌──────────event_time─┬─user_id─┬─page_url──┬─referrer_url─┬─device──┬─country───┐
1. │ 2025-01-02 15:30:00 │     105 │ /home     │ direct       │ desktop │ Germany   │
2. │ 2025-01-03 11:45:00 │     107 │ /contact  │ direct       │ desktop │ India     │
3. │ 2025-01-04 18:00:00 │     102 │ /products │ google.com   │ tablet  │ Canada    │
4. │ 2025-01-05 21:30:00 │     109 │ /checkout │ facebook.com │ mobile  │ USA       │
5. │ 2025-01-09 17:20:00 │     110 │ /cart     │ direct       │ desktop │ Australia │
   └─────────────────────┴─────────┴───────────┴──────────────┴─────────┴───────────┘
    ┌──────────event_time─┬─user_id─┬─page_url──┬─referrer_url─┬─device──┬─country───┐
 6. │ 2025-01-01 12:00:00 │     101 │ /home     │ google.com   │ mobile  │ USA       │
 7. │ 2025-01-01 12:00:00 │     106 │ /home     │ google.com   │ mobile  │ Nigeria   │
 8. │ 2025-01-01 12:05:00 │     102 │ /products │ facebook.com │ desktop │ Canada    │
 9. │ 2025-01-01 12:10:00 │     103 │ /cart     │ twitter.com  │ tablet  │ UK        │
10. │ 2025-01-02 14:00:00 │     101 │ /checkout │ google.com   │ mobile  │ USA       │
11. │ 2025-01-06 08:20:00 │     110 │ /blog     │ twitter.com  │ desktop │ Australia │
    └─────────────────────┴─────────┴───────────┴──────────────┴─────────┴───────────┘

11 rows in set. Elapsed: 0.003 sec.
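Aggregations work the same way: ClickHouse runs as much of the work as it can on each shard in parallel and merges the partial results for you. For example, counting page views per country across the whole dataset is just another query against the Distributed table:

SELECT country, count() AS views
FROM analytics.page_views_distributed
GROUP BY country
ORDER BY views DESC;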

👉 Type exit to end the clickhouse-client session.

So that’s how you work with shards and replication. When new data is added, it goes to a randomly selected shard, and then that data is copied to the shard’s replica. (We connected to a particular host to make a point, but you normally don’t specify a shard.)
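In day-to-day use you’d write through the Distributed table as well and let ClickHouse pick the shard for you. Assuming page_views_distributed was created with rand() as its sharding key (which is what gives you the random placement described above), an insert with a made-up row looks like this:

INSERT INTO analytics.page_views_distributed (event_time, user_id, page_url, referrer_url, device, country) VALUES
('2025-01-10 09:00:00', 111, '/home', 'direct', 'mobile', 'Brazil');

The row lands on whichever shard the sharding key selects, and the replicated table underneath takes care of copying it to that shard’s replica.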

Again, you shouldn’t use shards until you’ve got a substantial amount of data, but now you know how to add a shard to your ClickHouse cluster, and how shards change the way you work with your data.