Zookeeper Monitoring
For organizations that already have Apache Zookeeper configured, either manually or with a Kubernetes operator such as the Altinity Kubernetes Operator for ClickHouse®, monitoring your Zookeeper nodes helps you detect problems early and resolve them before they affect ClickHouse.
Checking ClickHouse connection to Zookeeper
To check connectivity between ClickHouse and Zookeeper:
- Confirm that ClickHouse can connect to Zookeeper. You should be able to query the `system.zookeeper` table and see the path for distributed DDL created in Zookeeper through that table. If something went wrong, check the ClickHouse logs.

      $ clickhouse-client -q "select * from system.zookeeper where path='/clickhouse/task_queue/'"
      ddl 17183334544 17183334544 2019-02-21 21:18:16 2019-02-21 21:18:16 0 8 0 0 0 8 17183370142 /clickhouse/task_queue/
- Confirm that Zookeeper accepts connections from ClickHouse. On the Zookeeper nodes you can also see whether a connection was established, and find the IP address of the ClickHouse server in the list of clients:

      $ echo stat | nc localhost 2181
      ZooKeeper version: 3.4.9-3--1, built on Wed, 23 May 2018 22:34:43 +0200
      Clients:
       /10.25.171.52:37384[1](queued=0,recved=1589379,sent=1597897)
       /127.0.0.1:35110[0](queued=0,recved=1,sent=0)
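The client-list check above can be scripted. The following is a minimal sketch, not an official tool: `zk_has_client` is a hypothetical helper name, and it assumes the `stat` output has the layout shown above (a `Clients:` line followed by one `/ip:port[...]` entry per line, ended by a blank line). On newer Zookeeper releases the `stat` command may need to be enabled through the four-letter-word whitelist before it answers.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the given IP address appears in the
# "Clients:" section of Zookeeper `stat` output read from standard input.
zk_has_client() {
    ip="$1"
    awk -v ip="$ip" '
        /^Clients:/ { in_clients = 1; next }        # client list starts here
        in_clients && NF == 0 { in_clients = 0 }    # a blank line ends the list
        in_clients {
            line = $0
            sub(/^[ \t]*\//, "", line)              # drop leading whitespace and "/"
            sub(/:[0-9]+\[.*/, "", line)            # drop ":port[n](queued=...)"
            if (line == ip) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}
```

A possible invocation, using the host and address from the example above, is `echo stat | nc localhost 2181 | zk_has_client 10.25.171.52 && echo connected`. Comparing the whole address rather than using a plain `grep` avoids matching `10.25.171.5` against `10.25.171.52`.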
Zookeeper Monitoring Quick List
The following commands are available to verify Zookeeper availability and highlight potential issues:
Check Name | Shell or SQL command | Severity |
---|---|---|
Zookeeper is available | `select count() from system.zookeeper where path='/'` | Critical for writes |
Zookeeper exceptions | `select value from system.events where event='ZooKeeperHardwareExceptions'` | Medium |
Read only tables are unavailable for writes | `select value from system.metrics where metric='ReadonlyReplica'` | High |
A data part was lost | `select value from system.events where event='ReplicatedDataLoss'` | High |
Data parts are not the same on different replicas | `select value from system.events where event='DataAfterMergeDiffersFromReplica'` | Medium |
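The checks in the quick list can be wrapped in a small script for cron or a monitoring agent. This is a rough sketch under stated assumptions: the function names and the `CLICKHOUSE_CLIENT` override variable are hypothetical, an empty query result is treated as zero, and any non-zero value is reported. Because `system.events` counters are cumulative, a production setup would more likely alert on the change between two runs than on the raw value.

```shell
#!/bin/sh
# Run one SQL statement through clickhouse-client and print the result.
# CLICKHOUSE_CLIENT is an assumed override hook (for example, to add
# connection options or to substitute a stub in tests).
run_check() {
    ${CLICKHOUSE_CLIENT:-clickhouse-client} -q "$1"
}

# Print "SEVERITY: name (value=N)" and return 1 when the query yields a
# non-zero value; stay silent and return 0 for an empty result or 0.
check_counter() {
    name="$1"; severity="$2"; sql="$3"
    value=$(run_check "$sql" 2>/dev/null) || value=""
    case "$value" in
        ''|0) return 0 ;;
        *) echo "$severity: $name (value=$value)"; return 1 ;;
    esac
}

# Run every check from the quick list; return 1 if any of them reported.
check_all() {
    status=0
    # Availability: the query itself fails when Zookeeper cannot be reached.
    if ! run_check "select count() from system.zookeeper where path='/'" >/dev/null 2>&1; then
        echo "CRITICAL: Zookeeper is not available"
        status=1
    fi
    check_counter "Zookeeper exceptions" MEDIUM \
        "select value from system.events where event='ZooKeeperHardwareExceptions'" || status=1
    check_counter "Read-only replicas" HIGH \
        "select value from system.metrics where metric='ReadonlyReplica'" || status=1
    check_counter "Replicated data loss" HIGH \
        "select value from system.events where event='ReplicatedDataLoss'" || status=1
    check_counter "Data differs after merge" MEDIUM \
        "select value from system.events where event='DataAfterMergeDiffersFromReplica'" || status=1
    return "$status"
}
```

Calling `check_all` prints one line per failing check and returns a non-zero status if anything was reported, so it can be used directly as a health-check command.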