Monitoring a Cluster
There are several ways to monitor your ClickHouse® clusters: Grafana dashboards, cluster alerts, health checks, cluster logs, and notifications.
Grafana dashboards
Altinity.Cloud uses Grafana as its default monitoring tool. You can access Grafana from the Monitoring section of a cluster panel:
Figure 1 - The Monitoring section of the cluster panel
Clicking the View in Grafana link displays the following menu:
Figure 2 - The Grafana monitoring menu
We’ll go through those menu items next.
The Cluster Metrics view
Selecting Cluster Metrics opens this Grafana dashboard in another browser tab:
Figure 3 - The Cluster Metrics dashboard
Cluster metrics include things like the number of bytes and rows inserted into databases in the ClickHouse cluster, merges, queries, connections, and memory / CPU usage.
The System Metrics view
Selecting System Metrics opens this Grafana dashboard in another browser tab:
Figure 4 - The System Metrics dashboard
System metrics include things like CPU load, OS threads and processes, network traffic for each network connection, and activity on storage devices.
The Queries view
Selecting Queries opens this Grafana dashboard in another browser tab:
Figure 5 - The Queries dashboard
The Queries dashboard includes information about your most common queries, slow queries, failed queries, and the queries that used the most memory.
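If you want to look at similar information outside Grafana, ClickHouse records executed queries in its `system.query_log` table. The following is a minimal sketch, not the query the dashboard itself uses: the helper name `slowest_queries_sql` is hypothetical, and it assumes the query log is enabled on the node you connect to.

```python
def slowest_queries_sql(hours: int = 24, limit: int = 10) -> str:
    """Build a ClickHouse query listing the slowest finished queries.

    Hypothetical helper for illustration; it only builds the SQL text.
    """
    if hours <= 0 or limit <= 0:
        raise ValueError("hours and limit must be positive")
    return (
        "SELECT event_time, query_duration_ms, memory_usage, query "
        "FROM system.query_log "
        "WHERE type = 'QueryFinish' "  # only queries that completed
        f"AND event_time >= now() - INTERVAL {int(hours)} HOUR "
        "ORDER BY query_duration_ms DESC "
        f"LIMIT {int(limit)}"
    )
```

Sorting by `memory_usage` instead of `query_duration_ms` would rank the queries by memory consumption. Note that `system.query_log` is local to each node, so the result describes only the node that runs the statement.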
The Logs view
Selecting Logs opens this Grafana dashboard in another browser tab:
Figure 6 - The Logs dashboard
The Logs dashboard shows all of the log messages as well as the frequency of messages over time. You can add a query to the Logs visualization to filter the view for particular messages.
Cluster alerts
You can define cluster alerts to notify users when certain events occur. You can access alerts from the ALERTS button on a cluster panel:
Figure 7 - The ALERTS item in the cluster panel
Clicking the ALERTS button displays the Cluster Alerts dialog:
Figure 8 - The Cluster Alerts dialog
Enter one or more comma-separated email addresses of the user(s) who should be alerted when particular events occur. For each event, you can send them a popup message in the Altinity Cloud Manager UI and/or send an email.
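If you maintain the recipient list elsewhere, it can help to check the comma-separated string before pasting it into the dialog. The sketch below is an illustration only: the function name `split_recipients` is hypothetical, the pattern is deliberately loose, and it says nothing about how the ACM itself validates addresses.

```python
import re

# Loose pattern: exactly one "@", no whitespace or commas, and a dot in the domain.
_EMAIL_RE = re.compile(r"^[^@\s,]+@[^@\s,]+\.[^@\s,]+$")


def split_recipients(raw: str) -> tuple[list[str], list[str]]:
    """Split a comma-separated string into (valid, invalid) addresses."""
    valid: list[str] = []
    invalid: list[str] = []
    for part in raw.split(","):
        address = part.strip()
        if not address:
            continue  # ignore empty entries such as a trailing comma
        if _EMAIL_RE.match(address):
            valid.append(address)
        else:
            invalid.append(address)
    return valid, invalid
```

For example, `split_recipients("ops@example.com, dba@example.com,,not-an-email")` separates the two well-formed addresses from the malformed entry.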
The different types of alerts are:
- System Alerts: Triggered by a significant system event such as a network outage.
- ClickHouse Version Upgrade: Triggered by an update to the version of ClickHouse installed in the cluster.
- Cluster Rescale: Triggered when the cluster is rescaled.
- Cluster Stop: Triggered when the cluster stops running. The cause could be a problem in the cluster, a user stopping the cluster, or your cluster uptime settings.
- Cluster Resume: Triggered when a previously stopped cluster is restarted.
A popup alert appears at the top of the ACM UI:
Figure 9 - A popup alert for a resumed cluster
Health checks
You can check the health of a cluster or node from the ACM. For clusters, there are two basic checks: whether the nodes in the cluster are online, and the health of the cluster itself. For nodes, the two checks are whether the node is online and the health of the node itself.
Cluster health checks
Cluster health checks appear near the top of a Cluster view. For example, here is the panel view of a cluster with the two health checks:
Figure 10 - A cluster panel with its two health checks
The health check at the top of the panel indicates that 2 of the 2 nodes in the cluster are online:
Clicking on this green bar takes you to the detailed view of the cluster. From there you can see the individual nodes and their status.
The second health check indicates that 6 of the 6 cluster health checks passed:
Clicking on this green bar shows you the health check dialog:
Figure 11 - The Health Checks dialog
The cluster health checks are based on six SELECT statements executed against the cluster and its infrastructure. The six statements look at the following cluster properties:
- Access point availability
- Distributed query availability
- Zookeeper availability
- Zookeeper contents
- Readonly replicas
- Delayed inserts
Clicking any of the checks shows the SQL statement used in the check along with its results:
Figure 12 - Details of a particular cluster health check
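To give a feel for how SELECT-based checks of this kind can work, here is a minimal sketch covering two of the properties above. The queries are assumptions written against standard ClickHouse system tables (`system.replicas` and `system.metrics`); they are not the statements the ACM runs, which you can see by clicking a check in the dialog. The `run_scalar` callable is also hypothetical: it stands in for whatever client you use to run a query and return a single number.

```python
from typing import Callable, Dict

# Hypothetical check queries. Each returns one number, where 0 means "healthy".
CHECKS: Dict[str, str] = {
    "Readonly replicas": "SELECT count() FROM system.replicas WHERE is_readonly",
    "Delayed inserts": "SELECT value FROM system.metrics WHERE metric = 'DelayedInserts'",
}


def run_checks(run_scalar: Callable[[str], int]) -> Dict[str, bool]:
    """Run every query through run_scalar and map check name -> passed."""
    return {name: run_scalar(sql) == 0 for name, sql in CHECKS.items()}


def summarize(results: Dict[str, bool]) -> str:
    """Format the results in the same 'N of M passed' spirit as the health bar."""
    passed = sum(results.values())
    return f"{passed}/{len(results)} checks passed"
```

Passing the query runner in as a parameter keeps the sketch independent of any particular ClickHouse client library and makes it easy to try with a stub.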
Depending on the cluster’s status, you may see other indicators:
Health check | Meaning |
---|---|
(indicator image not shown) | The cluster or node is rescaling |
(indicator image not shown) | The cluster or node is being terminated |
(indicator image not shown) | The cluster or node is stopped |
Node health checks
The basic “Node is online” check appears next to the node name in the Nodes view of the cluster:
Figure 13 - The Nodes view of a cluster
Opening the Node view shows more details:
Figure 14 - The health checks for a single node in the cluster
The first health check indicates that the node is online:
The second health check indicates that 5 of the 5 node health checks passed:
Clicking on this green bar takes you to a more detailed view of the health checks and their results, similar to Figure 11 above.
Cluster logs
You can look at a variety of logs by clicking the LOGS button on a cluster panel:
Figure 15 - The LOGS item in the cluster panel
You’ll see this panel:
Figure 16 - The Logs panel
Notice at the top of the panel that several different logs are available, including:
- ClickHouse Logs: messages issued by ClickHouse itself
- Backup Logs: messages related to system backups
- Operator Logs: messages issued by the Altinity Kubernetes operator for ClickHouse
- Audit Logs: messages related to significant system events initiated by a user
The upper right corner of the Logs panel includes the Download Logs button and the Refresh button.
Notifications
You can see your notifications by clicking on your username in the upper right corner of Altinity Cloud Manager:
The Notifications menu item lets you view any notifications you have received:
Figure 17 - The Notification History dialog
Here the history shows a single message. The text of the message, its severity (Info, News, Warning, or Danger), and the time the message was received and acknowledged are displayed. The meanings of the message severities are:
- Info - Updates for general information
- News - Notifications of general news and updates in Altinity.Cloud
- Warning - Notifications of possible issues that are less than critical
- Danger - Critical notifications that can affect your clusters or account