High Availability and Disaster Recovery

Best practices for recovering from a disaster and keeping ClickHouse available.

Analytic systems are the eyes and ears of data-driven enterprises. It is critical to ensure they continue to work at all times despite failures small and large; otherwise, users lose the ability to analyze and react to changes in the real world. Let’s start by defining two key terms.

  • High Availability (HA) includes the mechanisms that allow computer systems to continue operating following the failure of individual components.
  • Disaster Recovery (DR) includes the tools and procedures that enable computer systems to resume operation following a major catastrophe that affects many or all parts of a site.

These problems are closely related and depend on a small set of fungible technologies that include off-site backups and data replication.

The High Availability and Disaster Recovery guide provides an overview of the standard HA architecture for ClickHouse and a draft design for DR.


Classes of Failures

The types of failures that can occur.

High Availability Architecture

Best practices for keeping ClickHouse available.

Disaster Recovery Architecture

How to make ClickHouse more resilient to disasters that affect an entire site.