Backing up and Restoring Data

Working with backups

Working with backups is a crucial part of any analytics infrastructure. There is rich backup and restore functionality built into Altinity.Cloud, but some features cannot be directly accessed by users. We’ll start by discussing the use cases for restoring backups, then move on to your options for configuring backups, and finally look at the Cluster Restore Wizard and how to restore an individual table.

Restoring backups - use cases

Please contact Altinity support if you need to restore a backup. The response time for urgent requests is under 4 hours on the Enterprise Support plan, but usually we respond faster.

The most common use cases you’ll encounter are:

  • Partial data corruption in the table - it is possible to restore the table in the following ways:
  • Accidental drop table - The table can be fully restored in place by users; see Restoring an individual table below.
  • Accidental drop cluster - Difficult to do, but the cluster can be fully restored, preserving its configuration. Contact Altinity Support.
  • Restoring a backup for testing purposes (testing upgrades, hardware, etc.) - Contact Altinity Support. Possible use cases include:
    • Restoring a single database to a separate database of an existing cluster
    • Restoring a cluster into a new cluster
  • Restoring a cluster into a new cluster in a different region or environment - Contact Altinity Support.

Creating a backup manually

Creating a backup of a cluster is straightforward: simply click the Create Backup item on the ACTIONS menu in the cluster view.

Launch New Cluster

You’ll see an informative message like this:

Creating a backup

As the dialog points out, backups are stored separately from the cluster, so you can restore your cluster from a backup even if you delete the cluster. Click OK to create the backup.

Configuring automatic backups

There are a variety of backup settings you can configure for an environment, including when backups should be taken, the cloud provider you’re using for backups, the compression format, and the version of the ClickHouse backup tool to use.

To configure backups for an environment, click the Environments tab on the left to see a list of all your environments:

List of environments

Select an environment, click the vertical dots icon, then choose Edit from the menu:

Editing an environment

The Environment Details dialog will appear; click on the Backups tab to configure your backup infrastructure:

The Backups tab on the Environment Details dialog

Several options apply to any environment; we’ll cover those first.

Configuring basic settings

No matter which cloud provider you’re using, these fields appear at the top of the dialog:

Common backup configuration elements

Field details

Turn On Backups

Turns backups on or off, as you would expect.

Backup Schedule

Lets you define days and times when backups should be taken. See Scheduling backups below for the details.

Backup Tool Image

The Docker image name and tag that should be used to create backups. The default is altinity/clickhouse-backup:2.4.14, which uses Altinity’s open-source clickhouse-backup utility.

Compression Format

The default is tar; other options are gzip and zstd. Be aware that creating a tar file has the lowest impact on the CPU, but it creates the largest file because a tar file isn’t compressed. On the other hand, the other compression formats take more CPU cycles to create, but have smaller file sizes. Choose accordingly.
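To get a feel for the size trade-off described above, here is a minimal sketch that uses only Python’s standard library. It writes the same sample data to an uncompressed tar archive and to a gzip-compressed one and compares the results. The sample data and the `archive_size` helper are hypothetical; this is not how the backup tool itself is invoked, and zstd is left out because it is not available in every Python standard library.

```python
import io
import tarfile


def archive_size(payload: bytes, mode: str) -> int:
    """Return the size in bytes of a one-file archive written with `mode`."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode=mode) as archive:
        info = tarfile.TarInfo(name="part.bin")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return len(buffer.getvalue())


# Hypothetical, highly repetitive sample data; real table data compresses differently.
sample = b"2024-01-01,sensor-42,21.5\n" * 50_000

plain_size = archive_size(sample, "w")    # uncompressed tar
gzip_size = archive_size(sample, "w:gz")  # tar stream compressed with gzip

print(f"tar:  {plain_size} bytes")
print(f"gzip: {gzip_size} bytes")
```

The actual ratio depends entirely on your data, so treat the numbers as an illustration of the trade-off rather than a prediction.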

Enable Objects Labeling

If selected, everything in a backup is labeled with the name of the cluster. This can be useful if you’re working directly with the bucket where the backups are stored.

Configuring an external backup

If you have a Bring Your Own Kubernetes (BYOK) ClickHouse cluster, you can configure external backups at AWS, Azure, or GCP. For Bring Your Own Cloud (BYOC) clusters, Altinity configures backups for you.

Configuring an external backup differs depending on the cloud provider you’re using. The sections that follow cover AWS, GCP, and Azure in turn.

Backing up to AWS

If you’re backing up your ClickHouse clusters to AWS, you need to provide your AWS credentials and other details:

Common backup configuration elements

Field details

Bucket

The name of the S3 bucket you’re using

Region

The AWS region where the bucket is stored

Access Key

The access key credential for your AWS account. This field is ignored if you have a value in the Assume ARN field below.

Secret Key

The secret key credential for your AWS account. This field is ignored if you have a value in the Assume ARN field below.

Assume ARN

The ARN (Amazon Resource Name) for the bucket. If you have a value in this field, ACM ignores any values in the Access Key and Secret Key fields above.
NOTE - Configuring an ARN that works with your service accounts and the Altinity Cloud Manager is complicated. Contact Altinity support for help in setting up your ARN.

Path

The optional path to the directory inside the bucket where your data is stored. The default value is altinity-cloud-managed-clickhouse.

TEST CONNECTION

If you’re using an ARN, the TEST CONNECTION button will become active when you enter a value in the Assume ARN field.

If you’re not using an ARN, TEST CONNECTION becomes active the first time you enter an Access Key and Secret Key.

Whatever type of credentials you’re using, clicking the button returns one of these messages:

Successful connection test
Unsuccessful connection test

You’ll of course need to correct any errors before you can continue. Click OK when you’re done.

Note that when you return to this panel, the value of the Secret Key field will be hidden and the TEST CONNECTION button will be disabled. You’ll need to enter your credentials again if you want to re-test the connection.
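The credential rules in this section (a value in Assume ARN takes precedence over the key pair, and TEST CONNECTION needs one complete set of credentials) can be summarized in a small sketch. The `AwsBackupForm` class and its method names are hypothetical and only model the behavior described above; they are not part of the ACM.

```python
from dataclasses import dataclass


@dataclass
class AwsBackupForm:
    """Hypothetical model of the AWS credential fields on the Backups tab."""
    access_key: str = ""
    secret_key: str = ""
    assume_arn: str = ""

    def credential_mode(self) -> str:
        # A value in Assume ARN wins; the Access Key and Secret Key are ignored.
        if self.assume_arn.strip():
            return "assume_arn"
        if self.access_key.strip() and self.secret_key.strip():
            return "access_key"
        return "incomplete"

    def test_connection_enabled(self) -> bool:
        # The button is active once either kind of credential is complete.
        return self.credential_mode() != "incomplete"


form = AwsBackupForm(access_key="EXAMPLE_KEY", secret_key="EXAMPLE_SECRET",
                     assume_arn="arn:aws:iam::123456789012:role/example")
print(form.credential_mode())
```

The key and ARN values are placeholders, not working credentials.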

Backing up to GCP

With GCP, you need a JSON file that contains your credentials:

Common backup configuration elements

Field details

Bucket

The name of the bucket

Credentials JSON

JSON data that contains credentials associated with a GCP service account. That service account can have access to your entire GCP project, or it may be restricted to a single bucket or even a single folder within a single bucket. See the Google Cloud documentation for details.
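Before pasting the JSON into the dialog, it can help to confirm that it parses and looks like a service-account key. The sketch below is a local sanity check only. It assumes the standard layout of a GCP service-account key file (`type`, `project_id`, `private_key`, `client_email`); the function name is hypothetical, and passing this check does not mean the ACM’s connection test will succeed.

```python
import json
from typing import List

# Fields assumed to be present in a standard GCP service-account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_json(text: str) -> List[str]:
    """Return a list of problems; an empty list means the JSON looks plausible."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be a JSON object"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not data.get(name)]
    if data.get("type") not in (None, "", "service_account"):
        problems.append("type is not 'service_account'")
    return problems


# Placeholder values only; this is not a working key.
example = json.dumps({
    "type": "service_account",
    "project_id": "example-project",
    "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
    "client_email": "backup@example-project.iam.gserviceaccount.com",
})
print(check_service_account_json(example))
```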

Path

The optional path to the directory inside the bucket where your data is stored. The default value is altinity-cloud-managed-clickhouse.

TEST CONNECTION

When you’ve defined a complete set of credentials, the TEST CONNECTION button at the bottom of the dialog will become active. Clicking the button will return one of these messages:

Successful connection test
Unsuccessful connection test

You’ll of course need to correct any errors before you can continue. Click OK when you’re done.

Note that when you return to this panel, the value of the Credentials JSON field will be hidden and the TEST CONNECTION button will be disabled. You’ll need to enter your credentials again if you want to re-test the connection.

Backing up to Azure

Defining a connection to your Azure account is straightforward:

Common backup configuration elements

Field details

Container

The name of the storage container

Account Name

The name associated with your account

Account Key

The security credential for your account

Path

The optional path to the directory inside the container where your data is stored. The default value is altinity-cloud-managed-clickhouse.

TEST CONNECTION

When you’ve defined a complete set of credentials, the TEST CONNECTION button at the bottom of the dialog will become active. Clicking the button will return one of these messages:

Successful connection test
Unsuccessful connection test

You’ll of course need to correct any errors before you can continue. Click OK when you’re done.

Note that when you return to this panel, the value of the Account Key field will be hidden and the TEST CONNECTION button will be disabled. You’ll need to enter your credentials again if you want to re-test the connection.

Scheduling backups

You can create a schedule to create backups automatically at certain times of the day, week, or month. A backup schedule defined on the Environment Details panel applies to every cluster in the environment.

You define a backup schedule with these controls:

Common backup configuration elements

There are five options to define the Period when backups should occur:

  • Monthly - Define the day of the month
  • Weekly - Define the day of the week
  • Daily - Define the time of day
  • Every 6 hours - Backups occur every six hours
  • Every hour - Backups occur every hour.

NOTE: All times are defined in GMT.

In addition to defining the period, you can also define the number of Backups to Keep. The default is seven.
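As a rough illustration of what a Backups to Keep limit means, the sketch below keeps the newest seven backups and reports the rest as candidates for removal. How the ACM actually prunes old backups is not described here, so the function name and the newest-first rule are assumptions.

```python
from datetime import datetime, timezone
from typing import List


def backups_to_delete(backup_times: List[datetime], backups_to_keep: int = 7) -> List[datetime]:
    """Return the backups that fall outside the newest `backups_to_keep`."""
    if backups_to_keep < 1:
        raise ValueError("backups_to_keep must be at least 1")
    newest_first = sorted(backup_times, reverse=True)
    return newest_first[backups_to_keep:]


# Ten hypothetical daily backups, taken at midnight GMT on 1-10 May 2024.
daily = [datetime(2024, 5, day, tzinfo=timezone.utc) for day in range(1, 11)]
for expired in backups_to_delete(daily):
    print("would remove backup from", expired.date())
```

With ten daily backups and the default of seven, the three oldest fall outside the limit.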

The button lets you define multiple schedules. For example, if you only want backups to occur on Friday and Saturday, create two Weekly schedules, one for Friday and one for Saturday. You can define up to three schedules.
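To make the Friday-and-Saturday example concrete, here is a minimal sketch that models a set of schedules and finds the next time any of them would fire. The `Schedule` class, the `hour` field for monthly and weekly periods, and the alignment of the six-hour period to 00:00 GMT are all assumptions made for illustration; the ACM dialog may behave differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional

MAX_SCHEDULES = 3  # the text above states that up to three schedules are allowed
PERIODS = {"monthly", "weekly", "daily", "every_6_hours", "every_hour"}


@dataclass(frozen=True)
class Schedule:
    """Hypothetical model of one schedule; all times are GMT."""
    period: str
    day: Optional[int] = None  # day of month (1-31) or weekday (0=Monday .. 6=Sunday)
    hour: int = 0              # assumed start hour for monthly, weekly, and daily

    def __post_init__(self) -> None:
        if self.period not in PERIODS:
            raise ValueError(f"unknown period: {self.period}")

    def matches(self, moment: datetime) -> bool:
        """Return True if a backup would start at this whole hour."""
        if self.period == "every_hour":
            return True
        if self.period == "every_6_hours":
            return moment.hour % 6 == 0  # assumption: aligned to 00:00 GMT
        if moment.hour != self.hour:
            return False
        if self.period == "weekly":
            return moment.weekday() == self.day
        if self.period == "monthly":
            return moment.day == self.day
        return True  # daily


def next_backup(schedules: List[Schedule], now: datetime) -> datetime:
    """Return the earliest whole hour after `now` that any schedule matches."""
    if not 1 <= len(schedules) <= MAX_SCHEDULES:
        raise ValueError(f"expected 1 to {MAX_SCHEDULES} schedules")
    candidate = now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    for _ in range(24 * 366):  # scan at most one year ahead, hour by hour
        if any(schedule.matches(candidate) for schedule in schedules):
            return candidate
        candidate += timedelta(hours=1)
    raise ValueError("no matching backup time within one year")


# Backups on Friday and Saturday only: two Weekly schedules.
friday_and_saturday = [Schedule("weekly", day=4, hour=2), Schedule("weekly", day=5, hour=2)]
wednesday_noon = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
print(next_backup(friday_and_saturday, wednesday_noon))
```

The hour-by-hour scan is deliberately simple; it trades efficiency for being easy to check by hand.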

You can override the environment-level backup schedule by defining a different schedule for a particular cluster. Select Backup Settings from the CONFIGURE menu in a cluster view:

You’ll see the same controls described above inside a greatly simplified dialog.

A cluster can have a different backup schedule, but none of the other environment-level backup settings (cloud provider, backup tool image, compression format, etc.) can be changed.
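One way to picture this rule is as a merge in which only the schedule can be replaced at the cluster level. The sketch below is illustrative only; the `BackupSettings` fields and the string form of the schedule are hypothetical, not the ACM’s data model.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class BackupSettings:
    """Hypothetical subset of the environment-level backup settings."""
    schedule: str
    provider: str
    backup_tool_image: str
    compression_format: str


def effective_settings(environment: BackupSettings,
                       cluster_schedule: Optional[str] = None) -> BackupSettings:
    """Apply a cluster-level schedule override; every other field is inherited."""
    if cluster_schedule is None:
        return environment
    return replace(environment, schedule=cluster_schedule)


env = BackupSettings(
    schedule="daily at 02:00 GMT",
    provider="aws",
    backup_tool_image="altinity/clickhouse-backup:2.4.14",
    compression_format="tar",
)
print(effective_settings(env, cluster_schedule="every 6 hours"))
```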

The Cluster Restore Wizard

Launch New Cluster

WARNING: FOR ADVANCED USERS ONLY.

To restore a backup, begin by selecting Restore a Backup on the ACTIONS menu. The Cluster Restore Wizard lets you restore a cluster from a backup. The numbered sections below go through each tab of the wizard and its options in order.

1. Backup Location tab

The first step is to specify the location of the backup you’re restoring.

Option 1A - Backup is in Altinity.Cloud

The simplest case is a backup stored in your Altinity.Cloud environment:

Backup location is Altinity.Cloud

Field details

Source Environment

The name of the Altinity.Cloud environment that holds the backup. Click the down arrow to see a list of all of your environments.

Click NEXT to continue.

Option 1B - Backup is in your AWS account

Another alternative, of course, is that the backup is stored in your AWS or GCP account. The details you need to provide are different in each case, as you would expect. You’ll see this panel if your backup is at AWS:

Backup location is AWS

Field details

Access Key

The access key for your AWS account.

Secret Key

The secret key for your AWS account.

Region

The AWS region where your backup is stored.

Bucket

The name of the bucket where your backup is stored.

ACM-Compatible Folder Structure

Check this box if the backup was created by ACM or if you know the backup has a fully ACM-compatible structure.

Click NEXT to continue.

NOTE: When you click NEXT, the ACM takes your credentials and attempts to access the bucket you named in the region you selected. If that fails, you’ll get an error message with details on what went wrong:

Bucket name or credentials is wrong

You’ll have to fix the error before you can continue.

Option 1C - Backup is in your GCP account

Finally, if you’re on GCP, you’ll see this instead:

Backup location is GCP

Field details

Credentials JSON

JSON data that contains credentials associated with a GCP service account. That service account can have access to your entire GCP project, or it may be restricted to a single bucket or even a single folder within a single bucket. See the Google Cloud documentation for details.

Region

The GCP region where your backup is stored.

Bucket

The name of the bucket where your backup is stored.

ACM-Compatible Folder Structure

Check this box if the backup was created by ACM or if you know the backup has a fully ACM-compatible structure.

Click NEXT to continue.

NOTE: When you click NEXT, the ACM takes the credentials JSON you entered and attempts to access the bucket you named in the region you selected. If that fails, you’ll get an error message:

Bucket name or credentials is wrong

You’ll have to fix the error before you can continue.

2. Source Cluster tab

Next we need to select the source cluster for the backup we’re restoring:

Source cluster information

The clusters that have backups available are listed in the Cluster column. The Namespace is the Kubernetes namespace that contains your ClickHouse installation. Finally, a checkmark indicates that the backup includes cluster configuration information.

Select a cluster and click NEXT to continue.

3. Source Backup tab

Once you’ve selected a cluster to restore, you’ll see a list of all of the backups for that cluster:

Backup information

Select a backup (in the example above there’s only one) and click NEXT to continue.

4. Tables tab

At this point you’ve specified where the backup is stored, selected the cluster you want to restore, and selected the particular backup of that cluster you want to restore. Next, you need to decide which tables you want to restore.

Option 4A - Restore all tables

The simplest option, of course, is All tables:

Select all tables

Click NEXT to continue.

Option 4B - Restore some tables

You can also specify patterns for the table names and engine types you want to include or exclude:

Specify a table include pattern

Separate multiple table or engine patterns with commas.

Patterns can contain splat [*] and question mark [?] wildcards:

  • The splat matches any sequence of characters before or after a separator. For example, default.* matches all tables in the default database.
  • The question mark matches a single character. For example, db.??_table matches db.ab_table and db.cd_table.
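To check a pattern before using it, you can approximate the wildcard rules above with Python’s standard `fnmatch` module. This is a sketch under assumptions: `select_tables` and the sample table names are hypothetical, and `fnmatch` also accepts `[...]` character sets, which the wizard may not support.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def parse_patterns(text: str) -> List[str]:
    """Split a comma-separated pattern string and drop empty entries."""
    return [part.strip() for part in text.split(",") if part.strip()]


def select_tables(tables: Iterable[str], include: str = "*", exclude: str = "") -> List[str]:
    """Keep tables that match any include pattern and no exclude pattern."""
    include_patterns = parse_patterns(include)
    exclude_patterns = parse_patterns(exclude)
    selected = []
    for table in tables:
        included = any(fnmatchcase(table, pattern) for pattern in include_patterns)
        excluded = any(fnmatchcase(table, pattern) for pattern in exclude_patterns)
        if included and not excluded:
            selected.append(table)
    return selected


# Hypothetical table names in database.table form.
tables = ["default.events", "default.users", "db.ab_table", "db.cd_table", "db.abc_table"]

print(select_tables(tables, include="default.*"))
print(select_tables(tables, include="db.??_table"))
print(select_tables(tables, include="default.*, db.??_table", exclude="default.users"))
```

Note that `db.abc_table` is not selected by `db.??_table`, because each question mark stands for exactly one character.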

Click NEXT to continue.

5. Destination Cluster tab

The last choice to make is where to put the restored cluster.

Option 5A - Launch in a new cluster

One option is to simply launch a new cluster:

Destination new cluster

Enter a name for the destination cluster.

NOTE: If you choose to launch a new cluster, at the end of the Cluster Restore Wizard you’ll be taken to the Launch Cluster Wizard to define all the details of the new cluster.

Click NEXT to continue.

Option 5B - Launch a new cluster based on a source cluster

Another possibility is to use the configuration and settings of the source cluster to create a new cluster:

Destination new cluster with settings

There are some settings you can change, such as the version of ClickHouse the new cluster should run or how much storage the new cluster should have. Beyond the fields shown on this tab, everything else will be the same.

Field details

Name

The name of the restored cluster.

ClickHouse Version

Select the version of ClickHouse you want your cluster to use. Click the down arrow icon to see a list of available versions. ALTINITY BUILDS is selected by default; that lets you choose which Altinity Stable Build you want to use. You can also click COMMUNITY BUILDS to see other versions of ClickHouse.

Beneath the ClickHouse version is a link to the release notes for the build you’ve selected. The release notes have extensive details of what is new, changed, and fixed in each release. Click the link to open the release notes in a new browser tab.

Node Type

Click the down arrow to see what machine types are available. Each item in the list will tell you how many CPUs and how much RAM that machine type has.

Volume Type

Click the down arrow to see what classes of storage are available for your restored ClickHouse cluster.

Node Storage

The amount of storage in GB that each node in the restored cluster will have.

Number of Volumes

The number of storage volumes your restored cluster will have.

Click NEXT to continue.

Option 5C - Launch in an existing cluster

The third option is to restore the backup into an existing cluster:

Select all tables

Click the down arrow icon to see the list of available clusters. Select a cluster and click NEXT to continue.

6. Review and Confirm tab

The Review and Confirm tab lets you go over the choices you’ve made before restoring the cluster:

Review and confirm details

If everything looks good, click CONTINUE.

If you selected Launch a new Cluster on the Destination Cluster tab (option 5A), you’ll be taken to the Launch Cluster Wizard to specify all of the configuration details and settings for the new cluster. When you’ve completed the Launch Cluster Wizard, the ACM will create the new cluster and restore the backup to it.

If you selected anything else on the Destination Cluster tab, the ACM will start restoring the cluster. As you would expect, this may take several minutes. When the cluster is restored, you’ll get an alert at the top of the ACM UI:

Cluster restored alert

Restoring an individual table

In addition to the cluster-level backup and restore features, you can restore an individual table to its original database or to a different database. To get started, click the EXPLORE button in the Clusters view:

Figure 12 – The EXPLORE button

In the Explorer view, switch to the Schema tab:

Figure 13 – The Schema tab

Click the vertical dots icon next to the table you want to restore and select Restore from Backup:

Figure 14 – The Restore from Backup menu item

Select the backup you want to restore from. By default, the table will be restored from the selected backup into the same database.

NOTE: If you’re restoring a table to the same database, the table is first removed from the database. Once the table is removed, it is restored from the selected backup.

Figure 15 – Restoring the table to the same database

You also have the option of restoring the table from the selected backup to a different database. Select the Different database radio button, then enter the name of the other database:

Figure 16 – Restoring the table to a different database

When you’re ready to restore the table, click CONFIRM. When the table is restored, you’ll see a message in the ACM:

Table restored successfully message

Cloning a database

Cloning a database creates a copy of that database inside the same ClickHouse cluster. This can be useful for testing applications against realistic data.

NOTE: This operation creates a clone of the database, not a replica. Any changes to the original database will not be reflected in the clone, and any changes to the clone will not be reflected in the original database.

To get started, click the EXPLORE button in the Clusters view:

Figure 17 – The EXPLORE button

In the Explorer view, switch to the Schema tab:

Figure 18 – The Schema tab

Click the vertical dots icon next to the database you want to clone and select Clone Database:

Figure 19 – The Clone Database menu item

You’ll be asked for a database name for the clone. The database name for the clone must begin with an upper- or lowercase letter or an underscore. It can contain letters, numbers, and underscores:

  • weather2024 - Valid
  • 2024weather - Not valid - names must start with a letter or underscore
  • Weather_2024 - Valid
  • WEATHER_2024 - Valid
  • wEaThEr_2024 - Valid
  • WEATHER-2024 - Not valid - only letters, numbers, and underscores are allowed
  • _WEATHER_2024 - Valid - names must start with a letter or underscore

Cloning a database

If the name you enter isn’t valid, you’ll see this message:

Incorrect database name error message
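The naming rule above can be expressed as a short regular expression, which is handy if you generate clone names in a script. This is a sketch of the rule as stated, assuming that “letters” means ASCII letters; the function name is hypothetical and the ACM’s own validation may differ in edge cases.

```python
import re

# First character: a letter or underscore. Remaining characters: letters, digits, underscores.
_CLONE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_clone_name(name: str) -> bool:
    """Return True if `name` follows the naming rule for cloned databases."""
    return _CLONE_NAME.fullmatch(name) is not None


for candidate in ("weather2024", "2024weather", "Weather_2024", "WEATHER-2024", "_WEATHER_2024"):
    verdict = "Valid" if is_valid_clone_name(candidate) else "Not valid"
    print(f"{candidate} - {verdict}")
```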

Click CONFIRM to clone the database. The cloned database and its tables will show up in the display after a few minutes.

Converting a table’s engine to ReplicatedMergeTree

To support replication, you can convert a table’s engine from MergeTree to ReplicatedMergeTree. (A table with a MergeTree engine can’t be replicated.)

To get started, click the EXPLORE button in the Clusters view:

Figure 20 – The EXPLORE button

In the Explorer view, switch to the Schema tab:

Figure 21 – The Schema tab

Click the vertical dots icon next to the table whose engine you want to change and select Convert to ReplicatedMergeTree:

Figure 22 – The Convert to ReplicatedMergeTree menu item

As you would expect, this menu item is only available if a table’s engine is MergeTree.

When you click the menu item, you’ll see a confirmation message:

The Convert to ReplicatedMergeTree confirmation message

Click OK to start the conversion.