ClickHouse® Cluster with Zookeeper Production Configuration Guide
Moving from a single ClickHouse® server to a clustered format provides several benefits:
- Replication guarantees data integrity.
- Provides redundancy.
- Failover: half of the nodes can be restarted without encountering downtime.
Moving from an unsharded ClickHouse environment to a sharded cluster requires redesigning the schema and queries. Starting with sharding from the beginning makes it easier to scale the cluster up later.
Setting up a ClickHouse cluster for a production environment requires the following stages:
- Hardware Requirements
- Network Configuration
- Create Host Names
- Monitoring Considerations
- Configuration Steps
- Setting Up Backups
- Staging Plans
- Upgrading The Cluster
Hardware Requirements
ClickHouse
ClickHouse will use as much of your hardware as you give it, so the more hardware, the better. As of this publication, the hardware requirements are:
- Minimum Hardware: 4-core CPU with SSE4.2 support, 16 GB RAM, 1 TB HDD.
  - Recommended for development and staging environments.
  - SSE4.2 is required, and going below 4 GB of RAM is not recommended.
- Recommended Hardware: >=16 cores, >=64 GB RAM, HDD RAID or SSD.
  - For processing up to hundreds of millions of rows.
Zookeeper
Zookeeper requires separate servers from those used for ClickHouse. Zookeeper has poor performance when installed on the same node as ClickHouse.
Hardware Requirements for Zookeeper:
- Fast disk speed (ideally NVMe; 128 GB should be enough).
- Any modern CPU.
- 8 GB of RAM.
The number of Zookeeper instances depends on the environment:
- Production: 3 is an optimal number of zookeeper instances.
- Development and Staging: 1 zookeeper instance is sufficient.
As a reference, see Running Zookeeper in Production.
Hardware Configuration
Configure the servers according to the ClickHouse Usage Recommendations.
Test Your Hardware
Be sure to test the following:
- RAM speed.
- Network speed.
- Storage speed.
It’s better to find any performance issues before installing ClickHouse.
Network Configuration
Networking And Server Room Planning
It is recommended to use a fast network, ideally 10 Gbit. ClickHouse nodes generate a lot of traffic along with the Zookeeper connections and inter-Zookeeper communications.
Low latency is more important than bandwidth.
Keep replicas isolated at the hardware level. This allows the cluster to fail over during possible outages.
- For Physical Environments: Avoid placing two ClickHouse replicas on the same server rack. Ideally they should be on isolated network switches and isolated power supplies.
- For Cloud Environments: Use different availability zones for the ClickHouse replicas.
The same considerations apply to the Zookeeper nodes.
For example:
| Rack | Server | Server | Server | Server |
|---|---|---|---|---|
| Rack 1 | CH_SHARD1_R1 | CH_SHARD2_R1 | CH_SHARD3_R1 | ZOO_1 |
| Rack 2 | CH_SHARD1_R2 | CH_SHARD2_R2 | CH_SHARD3_R2 | ZOO_2 |
| Rack 3 | ZOO_3 | | | |
Network Ports And Firewall
ClickHouse listens on the following ports:
- 9000: `clickhouse-client`, native clients, and other clickhouse-servers connect here.
- 8123: HTTP clients.
- 9009: Other replicas will connect here to download data.
For more information, see the blog post ClickHouse Networking, Part 1.
Zookeeper listens on the following ports:
- 2181: Client connections.
- 2888: Inter-ensemble connections.
- 3888: Leader election.
Outbound traffic from ClickHouse connects to the following ports:
- ZooKeeper: On port 2181.
- Other CH nodes in the cluster: On ports 9000 and 9009.
- Dictionary sources: Depends on what was configured, such as HTTP, MySQL, Mongo, etc.
- Kafka or Hadoop: If those integrations were enabled.
SSL
For non-trusted networks, enable SSL/HTTPS. Where possible, it is better to keep interserver communications unencrypted for performance reasons.
Create Host Names
The best time to start creating a naming schema for the servers is before they’re created and configured.
There are a few features based on good server naming in ClickHouse:
- clickhouse-client prompts: Allows a different prompt for `clickhouse-client` per server hostname.
- Nearest hostname load balancing: For more information, see Nearest Hostname.
A good option is to use the following:
{datacenter}-{serverroom}-{rack identifier}-{clickhouse cluster identifier}-{shard number or server number}.
Other examples:
- `rxv-olap-ch-master-sh01-r01`:
  - rxv = location (rack #15)
  - olap = product name
  - ch = clickhouse
  - master = stage
  - sh01 = shard 1
  - r01 = replica 1
- `hetnzerde1-ch-prod-01.local`:
  - hetnzerde1 = location (also replica id)
  - ch = clickhouse
  - prod = stage
  - 01 = server number / shard number in that DC
- `sh01.ch-front.dev.aws-east1a.example.com`:
  - sh01 = shard 01
  - ch-front = cluster name
  - dev = stage
  - aws = cloud provider
  - east1a = region and availability zone
Host Name References
- What are the best practices for domain names (dev, staging, production)?
- 9 Best Practices and Examples for Working with Kubernetes Labels
- Thoughts On Hostname Nomenclature
Additional Host Name Tips
- Host names configured on the server should not change. If you do need to change the host name, one reference to use is How to Change Hostname on Ubuntu 18.04.
- The server should be accessible to other servers in the cluster via its hostname. Otherwise you will need to configure interserver_hostname in your config.
Monitoring Considerations
External Monitoring
For external monitoring:
- Graphite: Use the embedded exporter. See `config.xml`.
- InfluxDB: Use the embedded exporter, plus Telegraf. For more information, see Graphite protocol support in InfluxDB.
- Prometheus: Use the clickhouse-exporter.

To enable external monitoring, enable `system.metric_log` in `config.xml`.
For dashboard system:
- Grafana is recommended for graphs, reports, alerts, dashboard, etc.
- Other options are Nagios or Zabbix.
The following metrics should be collected:
- For Host Machine:
- CPU
- Memory
- Network (bytes/packets)
- Storage (iops)
- Disk Space (free / used)
- For ClickHouse:
- Connections (count)
- RWLocks
- Read / Write / Return (bytes)
- Read / Write / Return (rows)
- Zookeeper operations (count)
- Absolute delay
- Query duration (optional)
- Replication parts and queue (count)
- For Zookeeper:
The following queries are recommended:
- `SELECT * FROM system.replicas`: For more information, see the ClickHouse guide on System Tables.
- `SELECT * FROM system.merges`: Checks the speed and progress of currently executing merges.
- `SELECT * FROM system.mutations`: The source of information on the speed and progress of currently executing mutations.
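As a minimal sketch of combining these into a single alert query (the columns come from the standard `system.replicas` table; the 300-second threshold is an arbitrary example to tune for your workload):

```sql
-- Flag replicas that are read-only or more than 5 minutes behind.
SELECT
    database,
    table,
    is_readonly,
    absolute_delay,
    queue_size
FROM system.replicas
WHERE is_readonly OR absolute_delay > 300
```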
Monitor and Alerts
Configure the notifications for events and thresholds based on the data in the file clickhouse_kubernetes_monitoring_and_alerts.csv.
Health Checks
The following health checks should be monitored:
| Check Name | Shell or SQL command | Severity |
|---|---|---|
| ClickHouse status | `$ curl 'http://localhost:8123/'` returns `Ok.` | Critical |
| Too many simultaneous queries (maximum: 100) | `select value from system.metrics where metric='Query'` | Critical |
| Replication status | `$ curl 'http://localhost:8123/replicas_status'` returns `Ok.` | High |
| Read-only replicas (reflected by replicas_status as well) | `select value from system.metrics where metric='ReadonlyReplica'` | High |
| ReplicaPartialShutdown (not reflected by replicas_status, but seems to correlate with ZooKeeperHardwareExceptions) | `select value from system.events where event='ReplicaPartialShutdown'` | High (often disabled: it almost always correlates with ZooKeeperHardwareExceptions, and when it does not, usually nothing bad is happening) |
| Some replication tasks are stuck | `select count() from system.replication_queue where num_tries > 100` | High |
| ZooKeeper is available | `select count() from system.zookeeper where path='/'` | Critical for writes |
| ZooKeeper exceptions | `select value from system.events where event='ZooKeeperHardwareExceptions'` | Medium |
| Other CH nodes are available | ``$ for node in `echo "select distinct host_address from system.clusters where host_name !='localhost'" \| curl 'http://localhost:8123/' --silent --data-binary @-`; do curl "http://$node:8123/" --silent ; done`` | |
| All CH clusters are available (i.e. every configured cluster has enough replicas to serve queries) | ``for cluster in `echo "select distinct cluster from system.clusters where host_name !='localhost'" \| curl 'http://localhost:8123/' --silent --data-binary @-` ; do clickhouse-client --query="select '$cluster', 'OK' from cluster('$cluster', system, one)" ; done`` | |
| There are files in 'detached' folders | `$ find /var/lib/clickhouse/data/*/*/detached/* -type d \| wc -l`; for 19.8+: `select count() from system.detached_parts` | |
| Too many parts: number of parts is growing; inserts are being delayed; inserts are being rejected | `select value from system.asynchronous_metrics where metric='MaxPartCountForPartition'`; `select value from system.events/system.metrics where event/metric='DelayedInserts'`; `select value from system.events where event='RejectedInserts'` | Critical |
| Dictionaries: exception | `select concat(name,': ',last_exception) from system.dictionaries where last_exception != ''` | Medium |
| ClickHouse has been restarted | `select uptime()`; `select value from system.asynchronous_metrics where metric='Uptime'` | |
| DistributedFilesToInsert should not be always increasing | `select value from system.metrics where metric='DistributedFilesToInsert'` | Medium |
| A data part was lost | `select value from system.events where event='ReplicatedDataLoss'` | High |
| Data parts are not the same on different replicas | `select value from system.events where event='DataAfterMergeDiffersFromReplica'`; `select value from system.events where event='DataAfterMutationDiffersFromReplica'` | Medium |
Configuration Steps
Use Ansible, Puppet, Salt, or another configuration management system to control the servers' configuration.
- Configure ClickHouse access to Zookeeper by putting the file `zookeeper.xml` in the `/etc/clickhouse-server/config.d/` folder. This file must be placed on all ClickHouse servers.

```xml
<yandex>
    <zookeeper>
        <node>
            <host>zookeeper1</host>
            <port>2181</port>
        </node>
        <node>
            <host>zookeeper2</host>
            <port>2181</port>
        </node>
        <node>
            <host>zookeeper3</host>
            <port>2181</port>
        </node>
    </zookeeper>
</yandex>
```
- On each server, put the file `macros.xml` in the `/etc/clickhouse-server/config.d/` folder.

```xml
<yandex>
    <!-- These macros are defined per server and can be used in DDL
         to keep the DB schema cluster/server neutral -->
    <macros>
        <cluster>prod_cluster</cluster>
        <shard>01</shard>
        <replica>clickhouse-sh1r1</replica> <!-- better: use the same as the hostname -->
    </macros>
</yandex>
```
- On each server, place the file `cluster.xml` in the `/etc/clickhouse-server/config.d/` folder.

```xml
<yandex>
    <remote_servers>
        <prod_cluster> <!-- you need to give the cluster a name -->
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>clickhouse-sh1r1</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>clickhouse-sh1r2</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>clickhouse-sh2r1</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>clickhouse-sh2r2</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>clickhouse-sh3r1</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>clickhouse-sh3r2</host>
                    <port>9000</port>
                </replica>
            </shard>
        </prod_cluster>
    </remote_servers>
</yandex>
```
- Create 2 extra cluster configurations as variations of the main `cluster` definition, listing every node as a separate shard or as replicas of a single shard respectively:
  - `cluster_all_nodes_as_shards`
  - `cluster_all_nodes_as_replicas`
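Once the files are in place and the configuration has been reloaded, a few sanity checks can confirm that each piece was picked up (a hedged sketch, assuming the names used above):

```sql
-- ZooKeeper is reachable (this query throws an exception otherwise).
SELECT name FROM system.zookeeper WHERE path = '/';

-- The macros from macros.xml are loaded.
SELECT * FROM system.macros;

-- Every shard/replica from cluster.xml is visible.
SELECT cluster, shard_num, replica_num, host_name, port
FROM system.clusters
WHERE cluster = 'prod_cluster';
```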
Once this is complete, other queries that span nodes can be performed. For example:

```sql
CREATE TABLE test_table ON CLUSTER '{cluster}'
(
    id UInt8
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{database}/{shard}/{table}', '{replica}')
ORDER BY (id);
```

That will create a table on all servers in the cluster. You can insert data into this table and it will be replicated automatically to the other replicas of each shard.
To store the data or read the data from all shards at the same time, create a `Distributed` table that links to the `ReplicatedMergeTree` table.
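For example, a sketch of such a Distributed table (the `cityHash64(id)` sharding key and the use of `currentDatabase()` are illustrative choices, not requirements; older versions may need the cluster name spelled out instead of the `{cluster}` macro):

```sql
-- Umbrella table that fans reads and writes out to test_table on every shard.
CREATE TABLE test_table_distributed ON CLUSTER '{cluster}' AS test_table
ENGINE = Distributed('{cluster}', currentDatabase(), test_table, cityHash64(id));
```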
Users
Disable or add a password for the users `default` and `readonly` if your server is accessible from non-trusted networks.
If you add a password to the `default` user, you will need to adjust the cluster configuration, since the servers use the `default` user's credentials to connect to each other.
If you're inside a trusted network, you can leave the `default` user's password empty, allowing the ClickHouse nodes to communicate with each other.
Engines & ClickHouse building blocks
For a general explanation of the roles of the different engines, check the post Distributed vs Shard vs Replicated ahhh, help me!!!.
Configuration Best Practices
Attribution
Modified from a GitHub post by Mikhail Filimonov.
The following are recommended best practices when setting up a ClickHouse cluster with Zookeeper:
- Don't edit/overwrite the default configuration files. Sometimes a newer version of ClickHouse introduces new settings or changes the defaults in `config.xml` and `users.xml`.
- Set configurations via extra files in the `conf.d` directory. For example, to overwrite the listen interface, save the file `conf.d/listen.xml` with the following:

```xml
<?xml version="1.0"?>
<yandex>
    <listen_host replace="replace">::</listen_host>
</yandex>
```
- The same is true for users. For example, change the default profile by putting the file `users.d/profile_default.xml` in place:

```xml
<?xml version="1.0"?>
<yandex>
    <profiles>
        <default replace="replace">
            <max_memory_usage>15000000000</max_memory_usage>
            <max_bytes_before_external_group_by>12000000000</max_bytes_before_external_group_by>
            <max_bytes_before_external_sort>12000000000</max_bytes_before_external_sort>
            <distributed_aggregation_memory_efficient>1</distributed_aggregation_memory_efficient>
            <use_uncompressed_cache>0</use_uncompressed_cache>
            <load_balancing>random</load_balancing>
            <log_queries>1</log_queries>
            <max_execution_time>600</max_execution_time>
        </default>
    </profiles>
</yandex>
```
- Or you can create a user by putting a file `users.d/user_xxx.xml` in place:

```xml
<?xml version="1.0"?>
<yandex>
    <users>
        <xxx>
            <!-- PASSWORD=$(base64 < /dev/urandom | head -c8); echo "$PASSWORD"; echo -n "$PASSWORD" | sha256sum | tr -d '-' -->
            <password_sha256_hex>...</password_sha256_hex>
            <networks incl="networks" />
            <profile>readonly</profile>
            <quota>default</quota>
            <allow_databases incl="allowed_databases" />
        </xxx>
    </users>
</yandex>
```
- Some parts of the configuration will contain repeated elements (like the allowed IPs for all the users). To avoid repeating them, use a substitutions file. By default it is `/etc/metrika.xml`, but you can change it, for example to `/etc/clickhouse-server/substitutions.xml` (the `<include_from>` section of the main config). Put the repeated parts into the substitutions file, like this:

```xml
<?xml version="1.0"?>
<yandex>
    <networks>
        <ip>::1</ip>
        <ip>127.0.0.1</ip>
        <ip>10.42.0.0/16</ip>
        <ip>192.168.0.0/24</ip>
    </networks>
    <clickhouse_remote_servers>
        <!-- cluster definition -->
    </clickhouse_remote_servers>
    <zookeeper-servers>
        <node>
            <host>zookeeper1</host>
            <port>2181</port>
        </node>
        <node>
            <host>zookeeper2</host>
            <port>2181</port>
        </node>
        <node>
            <host>zookeeper3</host>
            <port>2181</port>
        </node>
    </zookeeper-servers>
    <clickhouse_compression></clickhouse_compression>
</yandex>
```
- These files can be common for all the servers inside the cluster, or individualized per server. If you choose to use one substitutions file per cluster rather than per node, you will also need to generate the file with macros, if macros are used.
- This way you have full flexibility: you're not limited to the settings described in the template. You can change any setting per server or per datacenter just by assigning files with those settings to a server or server group. It becomes easy to navigate, edit, and assign files.
Other Configuration Recommendations
Other configurations that should be evaluated:
- `<listen>` in `config.xml`: Determines which IP addresses and ports the ClickHouse server listens on for incoming connections.
- `<max_memory_..>` and `<max_bytes_before_external_...>` in `users.xml`: These are part of the profile `<default>`.
- `<max_execution_time>`
- `<log_queries>`
The following extra debug logs should be considered:
- part_log
- text_log
Understanding The Configuration
ClickHouse stores most of its configuration in two files:
- `config.xml`: Stores server configuration parameters. They are server-wide, some are hierarchical, and most of them cannot be changed at runtime. Only 3 sections are applied without a restart:
  - macros
  - remote_servers
  - logging level
- `users.xml`: Configures users, and user-level / session-level settings.
  - Each user can change these during their session:
    - by using a parameter in an HTTP query,
    - by using a parameter for `clickhouse-client`,
    - by sending a query like `set allow_experimental_data_skipping_indices=1`.
  - Those settings and their current values are visible in `system.settings`. You can make some settings global by editing the default profile in `users.xml`, which does not need a restart.
  - You can forbid users from changing their settings by using `readonly=2` for that user, or by using settings constraints.
  - Changes in `users.xml` are applied without a restart.
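As a small illustrative check of the session-level behaviour described above (standard system table and setting; nothing custom):

```sql
-- Change a setting for the current session only...
SET max_execution_time = 300;

-- ...then list every setting this session has changed from its default.
SELECT name, value FROM system.settings WHERE changed;
```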
For both `config.xml` and `users.xml`, it's preferable to put adjustments in the `config.d` and `users.d` subfolders instead of editing `config.xml` and `users.xml` directly.
Backups
ClickHouse does not yet ship a universal backup solution; one is currently at the design stage. Some custom backup strategies are:
- Each shard is backed up separately.
- FREEZE the table/partition (see the sketch after this list). For more information, see Alter Freeze Partition.
  - This creates hard links in the shadow subdirectory.
- `rsync` that directory to a backup location, then remove that subfolder from shadow.
  - Cloud users are recommended to use Rclone.
- Always add the full contents of the metadata subfolder (it contains the current DB schema) and the ClickHouse configs to your backup.
- For a second replica, it's enough to copy metadata & configs.
- This approach is similar to the one used by clickhouse-backup. We have not used this tool on production systems, and can make no recommendations for or against it. As of this time, `clickhouse-backup` is not a complete backup solution, but it does simplify some parts of the backup process.
- Don't try to compress backups; the data is already compressed in ClickHouse.
- One other option is an extra passive replica. This is not recommended as protection against user errors, since those are replicated as well.
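As a rough sketch of the FREEZE step mentioned above (the database, table, and partition names are hypothetical):

```sql
-- Create hard links for one partition under /var/lib/clickhouse/shadow/,
-- ready to be copied (e.g. rsync'ed) to a backup location.
ALTER TABLE mydb.events FREEZE PARTITION '2020-01-01';
```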
Version Upgrades
The update itself is simple: update the packages, then restart the clickhouse-server service.
- Check if the version you want to upgrade to is stable. We highly recommend the Altinity Stable® Releases.
- Review the changelog to ensure that no configuration changes are needed.
- Update staging and test to verify all systems are working.
- Prepare and test downgrade procedures so the server can be returned to the previous version if necessary.
- Start with a "canary" update: upgrade one replica of one shard to make sure that the procedure works.
- Test and verify that everything works properly. Check for any errors in the log files.
- If everything is working well, update the rest of the cluster.
For small clusters, the BlueGreenDeployment technique is also a good option.
FAQ
How Do I Restrict Memory Usage?
See max_memory_usage for more information.
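For example, a hedged sketch of capping memory for the current session (the 10 GB value is arbitrary):

```sql
-- Limit any single query in this session to roughly 10 GB of RAM.
SET max_memory_usage = 10000000000;
```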
ClickHouse died during big query execution
This is typically the Linux OOM killer terminating the server when a query exhausts available memory. Check the kernel log (e.g. dmesg) for OOM killer messages, and restrict memory usage (see the previous question).
How Do I Limit Memory Usage On Group By?
See max_bytes_before_external_group_by.
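A rough example of letting GROUP BY spill to disk once it reaches a threshold (the values are illustrative; a common rule of thumb is to set the threshold to about half of max_memory_usage):

```sql
-- Spill aggregation state to disk after ~5 GB instead of failing the query.
SET max_bytes_before_external_group_by = 5000000000;
SET max_memory_usage = 10000000000;
```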
Data returned in blocks in clickhouse-client
This is the expected behaviour of the PrettyCompact format combined with multithreading and streaming: each block is rendered as it arrives. Either use the PrettyCompactMonoBlock format instead, or make the query finish with a single stream (for example, by adding ORDER BY).
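For example (a sketch using a synthetic table):

```sql
-- Buffers the whole result into a single block before rendering it.
SELECT number FROM numbers(100000) FORMAT PrettyCompactMonoBlock;
```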
I Can’t Connect From Other Hosts
Check the `<listen>` settings in `config.xml`. Verify that the host can be reached over both IPv4 and IPv6.
Does clickhouse have stored procedures / UDF?
Not at this time. A PR with the needed functionality can be submitted.
How do I Store IPv4 and IPv6 Address In One Field?
There is a clean and simple solution for that. Any IPv4 address has a unique IPv6 mapping:
- IPv4 IP address: 191.239.213.197
- IPv4-mapped IPv6 address: ::ffff:191.239.213.197
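A minimal sketch, assuming a hypothetical table and the classic FixedString(16) representation (newer ClickHouse versions also offer a native IPv6 type):

```sql
-- One column stores both IPv4 (as IPv4-mapped IPv6) and native IPv6 addresses.
CREATE TABLE ip_log
(
    ip FixedString(16)
)
ENGINE = MergeTree
ORDER BY ip;

INSERT INTO ip_log VALUES
    (IPv6StringToNum('::ffff:191.239.213.197')),  -- IPv4-mapped address
    (IPv6StringToNum('2001:db8::1'));             -- native IPv6 address

SELECT IPv6NumToString(ip) FROM ip_log;
```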
Eventual consistency & atomic insert
For sequential (read-your-writes) consistency, combine `insert_quorum` on the write side (note that quorum inserts currently do not allow concurrent inserts into the same table/partition) with `select_sequential_consistency` on the read side.
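As a hedged illustration of those two settings:

```sql
-- Writer side: an INSERT only succeeds once 2 replicas have the data.
SET insert_quorum = 2;
-- ... run INSERT queries here ...

-- Reader side: SELECTs only see data that has reached the quorum.
SET select_sequential_consistency = 1;
-- ... run SELECT queries here ...
```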
How Do I Simulate Window Functions Using Arrays?
- Group with `groupArray`.
- Calculate the needed metrics.
- Ungroup back using `ARRAY JOIN`.
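A rough sketch of the pattern, assuming a hypothetical `events` table with `user_id`, `ts`, and `value` columns:

```sql
-- Per-user difference between consecutive values, computed with arrays
-- instead of window functions.
SELECT user_id, ts, diff
FROM
(
    SELECT
        user_id,
        groupArray(ts) AS ts_arr,
        arrayDifference(groupArray(value)) AS diff_arr
    FROM (SELECT user_id, ts, value FROM events ORDER BY user_id, ts)
    GROUP BY user_id
)
ARRAY JOIN ts_arr AS ts, diff_arr AS diff;
```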
How to avoid bugs
ClickHouse is quite a young product, but it already has thousands of features, and development moves at a very fast pace. All core features are well tested and very stable.
As with any project, new features can be less stable, although they are usually adopted by the community and stabilize quickly. And of course, all possible combinations of features simply cannot be tested.
We recommend the following practices:
- Check all the changes on the staging first, especially if some new features are used.
- Check latest stable or test versions of ClickHouse on your staging environment regularly and pass the feedback to us or on the official ClickHouse github.
- Ask for a known-issues report before updating production.
- For production, the third digit of the version should be as high as possible, since it is the bug fix release ID and indicates the most recent fixes. For example, for version 20.3.5:
  - `20` is the year of release.
  - `3` indicates a feature release. This is an increment where features are delivered.
  - `5` is the bug fix release ID. New feature releases typically have a 1 in the release ID section and potentially more undiscovered bugs.