High Availability and Disaster Recovery

Best practices for recovering from a disaster and keeping ClickHouse® available.

Analytic systems are the eyes and ears of data-driven enterprises. It is critical to ensure they keep working at all times despite failures, small and large; otherwise, users lose the ability to analyze and react to changes in the real world. Let’s start by defining two key terms.

High Availability (HA): the mechanisms that allow computer systems to continue operating following the failure of individual components.
Disaster Recovery (DR): the tools and procedures that enable computer systems to resume operation following a major catastrophe that affects many or all parts of a site.

These problems are closely related and depend on a small set of fungible technologies, including off-site backups and data replication.

The High Availability and Disaster Recovery guide provides an overview of the standard HA architecture for ClickHouse® and a draft design for DR.

Classes of Failures

ClickHouse® High Availability Architecture

ClickHouse® Disaster Recovery Architecture