Monitoring a Cluster

How to monitor and manage your clusters’ performance.

There are several ways to monitor your ClickHouse cluster:

Grafana dashboards

Altinity.Cloud uses Grafana as its default monitoring tool. You can access Grafana from the Monitoring section of a cluster panel:

Figure 1 - The Monitoring section of the cluster panel

Clicking the View in Grafana link displays the following menu:

Figure 2 - The Grafana monitoring menu

We’ll go through those menu items next: Cluster Metrics, System Metrics, Queries, and Logs.

The Cluster Metrics view

Selecting Cluster Metrics opens this Grafana dashboard in another browser tab:

Figure 3 - The Cluster Metrics dashboard

Cluster metrics include things like the number of bytes and rows inserted into databases in the ClickHouse cluster, merges, queries, connections, and memory / CPU usage.
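Counters such as rows and bytes inserted are cumulative in ClickHouse (for example, the InsertedRows and InsertedBytes events in the system.events table keep growing until the server restarts), so a time-series panel usually graphs their rate of change rather than the raw value. The Python sketch below shows that calculation, assuming evenly collected counter samples. It is not how the Altinity.Cloud dashboards are implemented, and the function name per_second_rates is hypothetical.

```python
from typing import List, Tuple


def per_second_rates(samples: List[Tuple[float, int]]) -> List[float]:
    """Convert (unix_seconds, counter_value) samples into per-second rates.

    Assumes the samples come from a cumulative counter such as InsertedRows
    in system.events, which only grows while the server is running.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if v1 >= v0:
            delta = v1 - v0
        else:
            # A smaller value means the server restarted and the counter began
            # again from zero, so everything counted since then is new.
            delta = v1
        rates.append(delta / (t1 - t0))
    return rates


if __name__ == "__main__":
    # Hypothetical InsertedRows readings taken every 10 seconds.
    samples = [(0, 1000), (10, 1500), (20, 1500), (30, 200)]
    print(per_second_rates(samples))  # [50.0, 0.0, 20.0]
```

Treating a drop in the counter as a reset mirrors how Prometheus-style rate functions handle counter resets.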

The System Metrics view

Selecting System Metrics opens this Grafana dashboard in another browser tab:

Figure 4 - The System Metrics dashboard

System metrics include things like CPU load, OS threads and processes, network traffic for each network connection, and activity on storage devices.

The Queries view

Selecting Queries opens this Grafana dashboard in another browser tab:

Figure 5 - The Queries dashboard

The Queries dashboard includes information about your most common queries, slow queries, failed queries, and the queries that used the most memory.
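The panels on this dashboard summarize query history. As a rough illustration, the Python sketch below assumes the input rows are shaped like ClickHouse's system.query_log table (using only its type, query, query_duration_ms, and memory_usage columns) and groups them into the same four categories. The function name summarize_queries is hypothetical, and grouping by raw query text is a simplification of how a real dashboard might group similar queries.

```python
from collections import Counter

# Values of the `type` column in system.query_log that mark failed queries.
FAILED_TYPES = {"ExceptionBeforeStart", "ExceptionWhileProcessing"}


def summarize_queries(rows, top_n=5):
    """Group query-log rows into common, slow, failed, and memory-heavy queries.

    `rows` is a list of dicts carrying the system.query_log columns used below.
    QueryStart rows are ignored so that each query is counted only once.
    """
    finished = [r for r in rows if r["type"] == "QueryFinish"]
    failed = [r for r in rows if r["type"] in FAILED_TYPES]
    # Most common: identical query text that ran most often, finished or failed.
    common = Counter(r["query"] for r in finished + failed).most_common(top_n)
    slowest = sorted(finished, key=lambda r: r["query_duration_ms"], reverse=True)
    hungriest = sorted(finished, key=lambda r: r["memory_usage"], reverse=True)
    return {
        "most_common": [query for query, _count in common],
        "slow": [r["query"] for r in slowest[:top_n]],
        "failed": [r["query"] for r in failed[:top_n]],
        "most_memory": [r["query"] for r in hungriest[:top_n]],
    }


if __name__ == "__main__":
    # Hypothetical rows; real ones would come from system.query_log.
    rows = [
        {"type": "QueryStart", "query": "SELECT 1", "query_duration_ms": 0, "memory_usage": 0},
        {"type": "QueryFinish", "query": "SELECT 1", "query_duration_ms": 2, "memory_usage": 1000},
        {"type": "QueryFinish", "query": "SELECT 1", "query_duration_ms": 3, "memory_usage": 1200},
        {"type": "QueryFinish", "query": "SELECT count() FROM big", "query_duration_ms": 900, "memory_usage": 50_000_000},
        {"type": "ExceptionWhileProcessing", "query": "SELECT * FROM missing", "query_duration_ms": 1, "memory_usage": 0},
    ]
    for panel, queries in summarize_queries(rows, top_n=2).items():
        print(panel, queries)
```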

The Logs view

Selecting Logs opens this Grafana dashboard in another browser tab:

Figure 6 - The Logs dashboard

The Logs dashboard shows all of the log messages as well as the frequency of messages over time. You can add a query to the Logs visualization to filter the view for particular messages.
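The message-frequency graph and the query filter can be pictured as a filter followed by a per-minute count. The Python sketch below assumes each log entry has already been parsed into a (timestamp, message) pair and uses a plain substring match in place of a real log query language. The function name message_frequency is hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, Optional, Tuple


def message_frequency(
    entries: Iterable[Tuple[datetime, str]], contains: Optional[str] = None
) -> Dict[datetime, int]:
    """Count log messages per minute, optionally keeping only matching ones."""
    counts: Counter = Counter()
    for timestamp, message in entries:
        if contains is not None and contains not in message:
            continue  # the substring filter stands in for the dashboard query
        minute = timestamp.replace(second=0, microsecond=0)
        counts[minute] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Hypothetical, already-parsed log entries.
    entries = [
        (datetime(2024, 1, 1, 10, 0, 5), "Connection refused"),
        (datetime(2024, 1, 1, 10, 0, 40), "Query finished"),
        (datetime(2024, 1, 1, 10, 1, 10), "Connection refused"),
    ]
    for minute, count in message_frequency(entries, contains="refused").items():
        print(minute.strftime("%H:%M"), count)
```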

Cluster alerts

You can define cluster alerts to notify users when certain events occur. You can access alerts from the ALERTS button on a cluster panel:

Figure 7 - The ALERTS item in the cluster panel

Clicking the ALERTS button displays the Cluster Alerts dialog:

Figure 8 - The Cluster Alerts dialog

Enter the email address of the user who should be alerted when particular events occur. For each event, you can send that user a popup message in the Altinity Cloud Manager (ACM) UI, an email, or both.

The different types of alerts are:

  • System Alerts: Triggered by a significant system event such as a network outage.
  • ClickHouse Version Upgrade: Triggered by an update to the version of ClickHouse installed in the cluster.
  • Cluster Rescale: Triggered when the cluster is rescaled.
  • Cluster Stop: Triggered when the cluster stops running. The stop could be caused by a problem, by a user stopping the cluster, or by your cluster’s uptime settings.
  • Cluster Resume: Triggered when a previously stopped cluster is restarted.
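The choices in the Cluster Alerts dialog amount to one recipient plus two independent switches (popup and email) for each alert type. The Python sketch below models that structure to make the combinations explicit. It is not an Altinity API: all class and field names are hypothetical, and it assumes that an alert type with no setting sends nothing.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class AlertType(Enum):
    SYSTEM = "System Alerts"
    CLICKHOUSE_VERSION_UPGRADE = "ClickHouse Version Upgrade"
    CLUSTER_RESCALE = "Cluster Rescale"
    CLUSTER_STOP = "Cluster Stop"
    CLUSTER_RESUME = "Cluster Resume"


@dataclass
class AlertSetting:
    popup: bool = False  # show a popup message in the ACM UI
    email: bool = False  # send an email to the recipient


@dataclass
class ClusterAlerts:
    recipient: str  # the email address entered in the dialog
    settings: Dict[AlertType, AlertSetting] = field(default_factory=dict)

    def channels(self, event: AlertType) -> List[str]:
        """Return how the recipient is notified when `event` occurs."""
        # Assumption: an alert type that was never configured sends nothing.
        setting = self.settings.get(event, AlertSetting())
        out = []
        if setting.popup:
            out.append("popup")
        if setting.email:
            out.append("email")
        return out


if __name__ == "__main__":
    alerts = ClusterAlerts(
        recipient="ops@example.com",
        settings={
            AlertType.CLUSTER_STOP: AlertSetting(popup=True, email=True),
            AlertType.CLUSTER_RESCALE: AlertSetting(popup=True),
        },
    )
    for event in AlertType:
        print(event.value, alerts.channels(event))
```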

A popup alert appears at the top of the ACM UI:

Figure 9 - An alert for a resumed cluster

Health checks

You can check the health of a cluster or node from the ACM. A cluster has two basic checks: the health of the nodes in the cluster and the health of the cluster itself. A node also has two: whether the node is online and the health of the node itself.

Cluster health checks

Cluster health checks appear near the top of a Cluster view. For example, here is the panel view of a cluster with the two health checks:

Figure 10 - A cluster panel with its two health checks

The health check at the top of the panel indicates that 2 of the 2 nodes in the cluster are online:

All cluster nodes online

Clicking on this green bar takes you to the detailed view of the cluster. From there you can see the individual nodes and their status.

The second health check indicates that 6 of the 6 cluster health checks passed:

All cluster checks passed

Clicking on this green bar shows you the health check dialog:

Figure 11 - The Health Checks dialog

The cluster health checks are based on six SELECT statements executed against the cluster and its infrastructure. The six statements look at the following cluster properties:

  • Access point availability
  • Distributed query availability
  • ZooKeeper availability
  • ZooKeeper contents
  • Readonly replicas
  • Delayed inserts

Clicking any of the checks shows the SQL statement used in the check along with its results:

Figure 12 - Details of a particular cluster health check
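The statements ACM runs are the ones shown in this dialog. For readers who want to run similar checks themselves, the Python sketch below pairs four of the checked properties with illustrative queries against standard ClickHouse system tables (system.zookeeper, system.replicas, and system.metrics) and a pass rule for each. The queries and pass rules are assumptions, not ACM's actual checks, and `execute` stands for any function you supply that runs a query and returns its single scalar result.

```python
from typing import Callable, Dict, Tuple

# Illustrative queries only; the statements ACM actually runs may differ.
CHECKS: Dict[str, Tuple[str, Callable[[object], bool]]] = {
    "Access point availability": ("SELECT 1", lambda v: int(v) == 1),
    "ZooKeeper availability": (
        "SELECT count() FROM system.zookeeper WHERE path = '/'",
        lambda v: int(v) > 0,
    ),
    "Readonly replicas": (
        "SELECT count() FROM system.replicas WHERE is_readonly",
        lambda v: int(v) == 0,
    ),
    "Delayed inserts": (
        "SELECT value FROM system.metrics WHERE metric = 'DelayedInserts'",
        lambda v: int(v) == 0,
    ),
}


def run_checks(execute: Callable[[str], object]):
    """Run every check and return (per-check results, summary text)."""
    results = {}
    for name, (sql, passed) in CHECKS.items():
        try:
            results[name] = passed(execute(sql))
        except Exception:
            # A query that cannot run (for example, no ZooKeeper) fails its check.
            results[name] = False
    ok = sum(results.values())
    return results, f"{ok} of {len(results)} checks passed"
```

A usage example with a stand-in `execute` that returns healthy values for every query: `run_checks(lambda sql: 0 if ("is_readonly" in sql or "DelayedInserts" in sql) else 1)` reports all four checks as passed.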

Depending on the cluster’s status, you may see other indicators:

  • Restarting indicator: the cluster or node is rescaling
  • Terminating indicator: the cluster or node is being terminated
  • Stopped indicator: the cluster or node is stopped

Node health checks

The basic “Node is online” check appears next to the node name in the Nodes view of the cluster:

Figure 13 - The Nodes view of a cluster

Opening the Node view shows more details:

Figure 14 - The health checks for a single node in the cluster

The first health check indicates that the node is online:

Node is online

The second health check indicates that 5 of the 5 node health checks passed:

All node checks passed

Clicking on this green bar takes you to a more detailed view of the health checks and their results, similar to Figure 11 above.

Cluster logs

You can view a variety of logs by clicking the LOGS button on a cluster panel:

Figure 15 - The LOGS item in the cluster panel

You’ll see this panel:

Figure 16 - The Logs panel

The top of the panel lists five different logs:

  • ACM Logs: messages issued by the Altinity Cloud Manager
  • ClickHouse Logs: messages issued by ClickHouse itself
  • Backup Logs: messages related to system backups
  • Operator Logs: messages issued by Altinity's ClickHouse Kubernetes operator
  • Audit Logs: messages related to significant system events initiated by a user

The upper right corner of the Logs panel includes the Download Logs button and the Refresh button.

Notifications

You can see your notifications by clicking your username in the upper right corner of the Altinity Cloud Manager.

The Notifications menu item lets you view any notifications you have received:

Figure 17 - The Notification History dialog

In this example, the history contains a single message. For each message, the panel displays its text, its severity (Info, News, Warning, or Danger), and the times it was received and acknowledged. The severities mean:

  • Info - Updates for general information
  • News - Notifications of general news and updates in Altinity.Cloud
  • Warning - Notifications of possible issues that are less than critical
  • Danger - Critical notifications that can affect your clusters or account