ZooKeeper Monitoring

Verifying that ZooKeeper and ClickHouse are working together.

For organizations that already have Apache ZooKeeper configured, either manually or with a Kubernetes operator such as the clickhouse-operator, monitoring your ZooKeeper nodes helps you catch issues before they cause outages.

Checking ClickHouse connection to ZooKeeper

To check connectivity between ClickHouse and ZooKeeper:

  1. Confirm that ClickHouse can connect to ZooKeeper. You should be able to query the system.zookeeper table and see the path for distributed DDL that was created in ZooKeeper. If the query fails, check the ClickHouse logs.

    $ clickhouse-client -q "select * from system.zookeeper where path='/clickhouse/task_queue/'"
    ddl 17183334544    17183334544    2019-02-21 21:18:16    2019-02-21 21:18:16    0    8    0    0    0    8    17183370142    /clickhouse/task_queue/
    
  2. Confirm that ZooKeeper accepts connections from ClickHouse. On a ZooKeeper node, the stat command lists the connected clients; the IP address of the ClickHouse server should appear in that list:

    $ echo stat | nc localhost 2181
    ZooKeeper version: 3.4.9-3--1, built on Wed, 23 May 2018 22:34:43 +0200
    Clients:
     /10.25.171.52:37384[1](queued=0,recved=1589379,sent=1597897)
     /127.0.0.1:35110[0](queued=0,recved=1,sent=0)
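
The client check in step 2 can be scripted. The following is a minimal sketch, assuming the `stat` output has the layout shown above; the helper names `zk_client_ips` and `zk_has_client` are hypothetical and not part of ZooKeeper or ClickHouse.

```shell
#!/bin/sh
# Hypothetical helper: read `stat` output on stdin and print the unique
# client IP addresses. Client lines are assumed to look like:
#   " /10.25.171.52:37384[1](queued=0,recved=1589379,sent=1597897)"
zk_client_ips() {
    awk '
        /^Clients:/ { in_clients = 1; next }   # client list starts here
        in_clients && /^[ \t]*$/ { exit }      # a blank line ends the list
        in_clients && /^[ \t]*\// {
            line = $0
            sub(/^[ \t]*\//, "", line)         # drop the leading " /"
            sub(/:[0-9]+\[.*$/, "", line)      # drop ":port[...](...)"
            print line
        }
    ' | sort -u
}

# Hypothetical helper: succeed if the given IP address ($1) appears
# among the clients listed in the `stat` output on stdin.
zk_has_client() {
    zk_client_ips | grep -Fxq "$1"
}
```

For example, `echo stat | nc localhost 2181 | zk_has_client 10.25.171.52 && echo connected` would print `connected` when the ClickHouse server at that address holds a session. Newer ZooKeeper releases may require `stat` to be enabled through the `4lw.commands.whitelist` setting before this works.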
    

ZooKeeper Monitoring Quick List

The following commands are available to verify ZooKeeper availability and highlight potential issues:

| Check Name | Shell or SQL command | Severity |
|---|---|---|
| ZooKeeper is available | select count() from system.zookeeper where path='/' | Critical for writes |
| ZooKeeper exceptions | select value from system.events where event='ZooKeeperHardwareExceptions' | Medium |
| Read only tables are unavailable for writes | select value from system.metrics where metric='ReadonlyReplica' | High |
| A data part was lost | select value from system.events where event='ReplicatedDataLoss' | High |
| Data parts are not the same on different replicas | select value from system.events where event='DataAfterMergeDiffersFromReplica'; select value from system.events where event='DataAfterMutationDiffersFromReplica' | Medium |
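
The checks in the quick list can be combined into one script. This is a sketch under stated assumptions, not a definitive monitoring setup: it assumes `clickhouse-client` is on the PATH and can connect with default credentials, and the function names (`run_check`, `report_counter`, `report_available`, `run_all_checks`) are hypothetical. It treats an empty result as zero, because system.events lists only events that have occurred.

```shell
#!/bin/sh
# Run one SQL check and print its value. Prints nothing if the query
# fails or returns no row.
run_check() {
    clickhouse-client -q "$1" 2>/dev/null
}

# Classify a counter-style check: empty or 0 is OK, anything else alerts.
# $1 = check name, $2 = severity label, $3 = value returned by the query
report_counter() {
    case "${3:-0}" in
        0) echo "OK: $1" ;;
        *) echo "ALERT [$2]: $1 (value=$3)" ;;
    esac
}

# The availability check is inverted: ZooKeeper counts as reachable only
# if the query succeeded and returned a positive number ($1).
report_available() {
    case "${1:-0}" in
        0|*[!0-9]*) echo "ALERT [Critical]: ZooKeeper is not available" ;;
        *) echo "OK: ZooKeeper is available" ;;
    esac
}

# Run every check from the quick list, one output line per check.
run_all_checks() {
    report_available \
        "$(run_check "select count() from system.zookeeper where path='/'")"
    report_counter "ZooKeeper exceptions" Medium \
        "$(run_check "select value from system.events where event='ZooKeeperHardwareExceptions'")"
    report_counter "Read only tables are unavailable for writes" High \
        "$(run_check "select value from system.metrics where metric='ReadonlyReplica'")"
    report_counter "A data part was lost" High \
        "$(run_check "select value from system.events where event='ReplicatedDataLoss'")"
    report_counter "Data differs after merge" Medium \
        "$(run_check "select value from system.events where event='DataAfterMergeDiffersFromReplica'")"
    report_counter "Data differs after mutation" Medium \
        "$(run_check "select value from system.events where event='DataAfterMutationDiffersFromReplica'")"
}
```

Calling `run_all_checks` on a ClickHouse host prints one `OK` or `ALERT` line per check. Values in system.events are cumulative since the server started, so an alert on those rows means the event happened at some point since startup; comparing against a previously recorded value is one way to detect new occurrences.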