Backing up and Restoring Data
Working with backups is a crucial part of any analytics infrastructure. There is rich backup and restore functionality built into Altinity.Cloud, but some features cannot be directly accessed by users. We’ll start by discussing the use cases for restoring backups, then move on to your options for configuring backups. After that, we’ll look at the Cluster Restore Wizard and how to restore an individual table.
Restoring backups - use cases
Please contact Altinity support if you need to restore a backup. The response time for urgent requests is under 4 hours on the Enterprise Support plan, but usually we respond faster.
The most common use cases you’ll encounter are:
- Partial data corruption in the table - it is possible to restore the table in the following ways:
- Restore a full table in place - Can be done by users; see Restoring an individual table below
- Restore a full table into a separate database for inspection and point fixes, copying data manually - Contact Altinity Support
- Restore specific partitions in place - Contact Altinity Support
- Accidental drop table - The table can be fully restored in place by users; see Restoring an individual table below.
- Accidental drop cluster - Difficult to do, but the cluster can be fully restored, preserving its configuration. Contact Altinity Support
- Restoring a backup for testing purposes (testing upgrades, hardware, etc.) - Contact Altinity Support. Possible use cases include:
- Restoring a single database to a separate database of an existing cluster
- Restoring a cluster into a new cluster
- Restoring a cluster into a new cluster in different region or environment - Contact Altinity Support
Creating a backup manually
Creating a backup of a cluster is straightforward: simply click the Create Backup item on the ACTIONS menu in the cluster view.
You’ll see an informative message like this:
As the dialog points out, backups are stored separately from the cluster, so you can restore your cluster from a backup even if you delete the cluster. Click OK to create the backup.
Configuring automatic backups
There are a variety of backup settings you can configure for an environment, including when backups should be taken, the cloud provider you’re using for backups, the compression format, and the version of the ClickHouse® backup tool to use.
To configure backups for an environment, click the Environments tab on the left to see a list of all your environments:
Select an environment, click the vertical dots icon, then choose Edit from the menu:
The Environment Details dialog will appear; click on the Backups tab to configure your backup infrastructure:
Several options apply to any environment; we’ll cover those first.
Configuring basic settings
No matter which cloud provider you’re using, these fields appear at the top of the dialog:
Field details
Turn On Backups
Turns backups on or off, as you would expect. If backups are disabled for your environment, the Clusters View will show a red flag for each cluster:
Backup Schedule
Lets you define days and times when backups should be taken. See Scheduling backups below for the details.
Backup Tool Image
The Docker image name and tag that should be used to create backups. The default is altinity/clickhouse-backup:2.4.14, which uses Altinity’s open-source clickhouse-backup utility.
Compression Format
The default is tar; other options are gzip and zstd. Be aware that creating a tar file has the lowest impact on the CPU, but it creates the largest file because a tar file isn’t compressed. The other compression formats take more CPU cycles to create but produce smaller files. Choose accordingly.
Enable Objects Labeling
If selected, everything in a backup is labeled with the name of the cluster. This can be useful if you’re working directly with the bucket where the backups are stored.
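The trade-off described for the Compression Format field can be illustrated with a small sketch that uses Python's standard tarfile module. It is not how the backup tool works internally. The file name and sample data are invented for illustration, and zstd is left out because it is not available in every Python standard library.

```python
import io
import tarfile


def archive_size(payload: bytes, mode: str) -> int:
    """Return the size in bytes of an in-memory archive holding one file.

    mode is a tarfile write mode: "w" for plain tar, "w:gz" for gzip.
    """
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode=mode) as archive:
        info = tarfile.TarInfo(name="part.bin")  # hypothetical file name
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return len(buffer.getvalue())


# Repetitive sample rows stand in for table data; real data compresses differently.
sample = b"2024-01-01,station-42,21.5\n" * 50_000

tar_size = archive_size(sample, "w")
gzip_size = archive_size(sample, "w:gz")
print(f"tar:  {tar_size} bytes")
print(f"gzip: {gzip_size} bytes")
```

On data like this the plain tar archive is slightly larger than the input, while the gzip archive is much smaller at the cost of extra CPU work.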
Configuring an external backup
If you have a Bring Your Own Kubernetes (BYOK) ClickHouse cluster, you can configure external backups at AWS, Azure, or GCP. For Bring Your Own Cloud (BYOC) clusters, Altinity configures backups for you.
Configuring an external backup is different depending on the cloud provider you’re using. If you’d like to jump ahead to the details for a specific provider, feel free:
If you want, you can define a different cloud provider for your ClickHouse clusters and your backups. If you choose to do this, we won’t judge, but you’ll be asked to confirm your decision:
Backing up to AWS
If you’re backing up your ClickHouse clusters to AWS, you need to provide your AWS credentials and other details:
Field details
Bucket
The name of the S3 bucket you’re using
Region
The AWS region where the bucket is stored
Access Key
The access key credential for your AWS account. This field is ignored if you have a value in the Assume ARN field below.
Secret Key
The secret key credential for your AWS account. This field is ignored if you have a value in the Assume ARN field below.
Assume ARN
The ARN (Amazon Resource Name) for the bucket. If you have a value in this field, ACM ignores any values in the Access Key and Secret Key fields above.
NOTE - Configuring an ARN that works with your service accounts and the Altinity Cloud Manager is complicated. Contact Altinity support for help in setting up your ARN.
Path
The optional path to the directory inside the bucket where your data is stored. The default value is altinity-cloud-managed-clickhouse.
TEST CONNECTION
If you’re using an ARN, the TEST CONNECTION button will become active when you enter a value in the Assume ARN field.
If you’re not using an ARN, TEST CONNECTION becomes active the first time you enter an Access Key and Secret Key.
Whatever type of credentials you’re using, clicking the button returns one of these messages:
You’ll of course need to correct any errors before you can continue. Click OK when you’re done.
Note that when you return to this panel, the value of the Secret Key field will be hidden and the TEST CONNECTION button will be disabled. You’ll need to enter your credentials again if you want to re-test the connection.
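The credential rules for the AWS fields (a value in Assume ARN takes precedence over the Access Key and Secret Key, and TEST CONNECTION needs one complete set of credentials) can be sketched as follows. This is a minimal model for illustration only; the class, function names, and sample values are hypothetical and are not part of the ACM.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AwsBackupSettings:
    """Hypothetical model of the AWS fields on the Backups tab."""
    bucket: str
    region: str
    access_key: Optional[str] = None
    secret_key: Optional[str] = None
    assume_arn: Optional[str] = None
    path: str = "altinity-cloud-managed-clickhouse"  # default value from the text


def credential_mode(settings: AwsBackupSettings) -> Optional[str]:
    """Return which credentials would be used, or None if none are complete."""
    if settings.assume_arn:
        # Access Key and Secret Key are ignored when an ARN is present.
        return "assume_arn"
    if settings.access_key and settings.secret_key:
        return "access_keys"
    return None


def can_test_connection(settings: AwsBackupSettings) -> bool:
    """TEST CONNECTION is only meaningful with one complete set of credentials."""
    return credential_mode(settings) is not None


# Placeholder values only; none of these are real credentials.
both = AwsBackupSettings(
    bucket="example-bucket",
    region="us-east-1",
    access_key="EXAMPLE_ACCESS_KEY",
    secret_key="EXAMPLE_SECRET_KEY",
    assume_arn="arn:aws:example-placeholder",
)
print(credential_mode(both))  # assume_arn
```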
Backing up to GCP
With GCP, you need a JSON file that contains your credentials:
Field details
Bucket
The name of the bucket
Credentials JSON
JSON data that contains credentials associated with a GCP service account. That service account can have access to your entire GCP project, or it may be restricted to a single bucket or even a single folder within a single bucket. See the Google Cloud documentation for details:
- Creating a service account
- Creating a credentials.json file with the gcloud iam service-accounts keys create command
Path
The optional path to the directory inside the bucket where your data is stored. The default value is altinity-cloud-managed-clickhouse.
TEST CONNECTION
When you’ve defined a complete set of credentials, the TEST CONNECTION button at the bottom of the dialog will become active. Clicking the button will return one of these messages:
You’ll of course need to correct any errors before you can continue. Click OK when you’re done.
Note that when you return to this panel, the value of the Credentials JSON field will be hidden and the TEST CONNECTION button will be disabled. You’ll need to enter your credentials again if you want to re-test the connection.
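Before pasting a credentials file into the Credentials JSON field, it can help to confirm that it at least has the shape of a service account key. The sketch below checks a handful of keys that appear in the key files Google Cloud generates. It is a local sanity check only, the function name is hypothetical, and it says nothing about whether the ACM or the bucket will accept the credentials.

```python
import json

# Keys found in a Google Cloud service account key file.
# This is a subset chosen for illustration, not a full schema.
REQUIRED_KEYS = {"type", "project_id", "private_key", "client_email"}


def check_credentials_json(text: str) -> list[str]:
    """Return a list of problems; an empty list means the basic shape looks right."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level JSON value must be an object"]
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - data.keys())]
    if data.get("type") not in (None, "service_account"):
        problems.append('"type" is not "service_account"')
    return problems


# Placeholder values only; this is not a usable key.
example = json.dumps({
    "type": "service_account",
    "project_id": "example-project",
    "private_key": "PLACEHOLDER",
    "client_email": "backup@example-project.iam.gserviceaccount.com",
})
print(check_credentials_json(example))  # []
```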
Backing up to Azure
Defining a connection to your Azure account is straightforward:
Field details
Container
The name of the storage container
Account Name
The name associated with your account
Account Key
The security credential for your account
Path
The optional path to the directory inside the container where your data is stored. The default value is altinity-cloud-managed-clickhouse.
TEST CONNECTION
When you’ve defined a complete set of credentials, the TEST CONNECTION button at the bottom of the dialog will become active. Clicking the button will return one of these messages:
You’ll of course need to correct any errors before you can continue. Click OK when you’re done.
Note that when you return to this panel, the value of the Account Key field will be hidden and the TEST CONNECTION button will be disabled. You’ll need to enter your credentials again if you want to re-test the connection.
Scheduling backups
You can create a schedule to create backups automatically at certain times of the day, week, or month. A backup schedule defined on the Environment Details panel applies to every cluster in the environment.
You define a backup schedule with these controls:
There are five options to define the Period when backups should occur:
- Monthly - Define the day of the month
- Weekly - Define the day of the week
- Daily - Define the time of day
- Every 6 hours - Backups occur every six hours
- Every hour - Backups occur every hour.
NOTE: All times are defined in GMT.
In addition to defining the period, you can also define the number of Backups to Keep. The default is seven.
The button lets you define multiple schedules. For example, if you only want backups to occur on Friday and Saturday, create two Weekly schedules, one for Friday and one for Saturday. You can define up to three schedules.
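To make the Friday and Saturday example concrete, here is a minimal sketch of how two Weekly schedules combine, along with a simple reading of Backups to Keep. Everything here is an assumption made for illustration: the function names are hypothetical, each weekly schedule is assumed to carry an hour of day in GMT, and retention is assumed to mean keeping the newest backups. The ACM's actual behavior may differ.

```python
from datetime import datetime, timedelta, timezone

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def next_weekly_run(now: datetime, weekday: str, hour: int) -> datetime:
    """Next occurrence of weekday at hour:00 GMT strictly after now."""
    target = WEEKDAYS.index(weekday)
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    candidate += timedelta(days=(target - now.weekday()) % 7)
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate


def next_run(now: datetime, schedules: list[tuple[str, int]]) -> datetime:
    """Earliest upcoming run across several weekly schedules."""
    return min(next_weekly_run(now, day, hour) for day, hour in schedules)


def prune(backups: list[datetime], keep: int = 7) -> list[datetime]:
    """Keep only the newest `keep` backups (seven is the documented default)."""
    return sorted(backups, reverse=True)[:keep]


# Two weekly schedules, as in the Friday and Saturday example.
schedules = [("Friday", 2), ("Saturday", 2)]
wednesday_noon = datetime(2024, 1, 3, 12, 0, tzinfo=timezone.utc)
print(next_run(wednesday_noon, schedules))  # 2024-01-05 02:00:00+00:00
```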
You can override the environment-level backup schedule by defining a different schedule for a particular cluster. Select Backup Settings from the CONFIGURE menu in a cluster view:
You’ll see the same controls above inside a greatly simplified dialog.
A cluster can have a different backup schedule, but none of the other environment-level backup settings (cloud provider, backup tool image, compression format, etc.) can be changed.
The Cluster Restore Wizard
WARNING: FOR ADVANCED USERS ONLY.
To restore a backup, begin by selecting Restore a Backup on the ACTIONS menu. The Cluster Restore Wizard lets you restore a cluster from a backup. We’ll go through all of the steps and options next, but if you’re looking for help on a particular section of the wizard, you can skip ahead to any of the tabs:
1. Backup Location tab
The first step is to specify the location of the backup you’re restoring.
Option 1A - Backup is in Altinity.Cloud
The simplest case is a backup stored in your Altinity.Cloud environment:
Field details
Source Environment
The name of the Altinity.Cloud environment that holds the backup. Click the down arrow to see a list of all of your environments.
Click NEXT to continue.
Option 1B - Backup is in your AWS account
Another alternative, of course, is that the backup is stored in your AWS or GCP account. The details you need to provide are different in each case, as you would expect. You’ll see this panel if your backup is at AWS:
Field details
Access Key
The access key for your AWS account.
Secret Key
The secret key for your AWS account.
Region
The AWS region where your backup is stored.
Bucket
The name of the bucket where your backup is stored.
ACM-Compatible Folder Structure
Check this box if the backup was created by ACM or if you know the backup has a fully ACM-compatible structure.
Click NEXT to continue.
NOTE: When you click NEXT, the ACM takes your credentials and attempts to access the bucket you named in the region you selected. If that fails, you’ll get an error message with details on what went wrong:
You’ll have to fix the error before you can continue.
Option 1C - Backup is in your GCP account
Finally, if you’re on GCP, you’ll see this instead:
Field details
Credentials JSON
JSON data that contains credentials associated with a GCP service account. That service account can have access to your entire GCP project, or it may be restricted to a single bucket or even a single folder within a single bucket. See the Google Cloud documentation for details:
- Creating a service account
- Creating a credentials.json file with the gcloud iam service-accounts keys create command
Region
The GCP region where your backup is stored.
Bucket
The name of the bucket where your backup is stored.
ACM-Compatible Folder Structure
Check this box if the backup was created by ACM or if you know the backup has a fully ACM-compatible structure.
Click NEXT to continue.
NOTE: When you click NEXT, the ACM takes the credentials JSON you entered and attempts to access the bucket you named in the region you selected. If that fails, you’ll get an error message:
You’ll have to fix the error before you can continue.
2. Source Cluster tab
Next we need to select the source cluster for the backup we’re restoring:
The available backups are listed in the Cluster column. The Namespace is the Kubernetes namespace that contains your ClickHouse installation. Finally, a checkmark indicates that the backup includes cluster configuration information.
Select a cluster and click NEXT to continue.
3. Source Backup tab
Once you’ve selected a cluster to restore, you’ll see a list of all of the backups for that cluster:
Select a backup (in the example above there’s only one) and click NEXT to continue.
4. Tables tab
At this point you’ve specified where the backup is stored, selected the cluster you want to restore, and selected the particular backup of that cluster you want to restore. Next, you need to decide which tables you want to restore.
Option 4A - Restore all tables
The simplest option, of course, is All tables:
Click NEXT to continue.
Option 4B - Restore some tables
You can also specify patterns for the table names and engine types you want to include or exclude:
Separate multiple table or engine patterns with commas.
Patterns can contain splat [*] and question mark [?] wildcards:
- The splat matches any sequence of characters before or after a separator. For example, default.* matches all tables in the default database.
- The question mark matches a single character. For example, db.??_table matches db.ab_table and db.cd_table.
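To preview which tables a set of patterns would select, you can approximate the wildcard rules above with Python's standard fnmatch module. This is only an approximation and not the matcher the wizard uses: fnmatch lets the splat match across the dot separator as well, and it treats square brackets as character classes. The function names and table names are made up for illustration.

```python
from fnmatch import fnmatchcase


def parse_patterns(text: str) -> list[str]:
    """Split a comma-separated pattern string, ignoring blanks and extra spaces."""
    return [part.strip() for part in text.split(",") if part.strip()]


def select_tables(tables: list[str], include: str, exclude: str = "") -> list[str]:
    """Return tables matching any include pattern and no exclude pattern."""
    include_patterns = parse_patterns(include)
    exclude_patterns = parse_patterns(exclude)
    return [
        table for table in tables
        if any(fnmatchcase(table, p) for p in include_patterns)
        and not any(fnmatchcase(table, p) for p in exclude_patterns)
    ]


# Hypothetical table names in database.table form.
tables = ["default.events", "default.users",
          "db.ab_table", "db.cd_table", "db.abc_table"]

print(select_tables(tables, "default.*"))
print(select_tables(tables, "db.??_table"))
print(select_tables(tables, "default.*, db.??_table", exclude="default.users"))
```

Note that db.abc_table is not selected by db.??_table, because each question mark stands for exactly one character.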
Click NEXT to continue.
5. Destination Cluster tab
The final step is to specify where to put the restored cluster.
Option 5A - Launch in a new cluster
One option is to simply launch a new cluster:
Enter a name for the destination cluster.
NOTE: If you choose to launch a new cluster, at the end of the Cluster Restore Wizard you’ll be taken to the Launch Cluster Wizard to define all the details of the new cluster.
Click NEXT to continue.
Option 5B - Launch a new cluster based on a source cluster
Another possibility is to use the configuration and settings of the source cluster to create a new cluster:
There are some settings you can change, such as the version of ClickHouse the new cluster should run or how much storage the new cluster should have. Beyond the fields shown on this tab, everything else will be the same.
Field details
Name
The name of the restored cluster.
ClickHouse Version
Select the version of ClickHouse you want your cluster to use. Click the down arrow icon to see a list of available versions. ALTINITY BUILDS is selected by default; that lets you choose which Altinity Stable Build you want to use. You can also click UPSTREAM BUILDS to see other versions of ClickHouse.
Beneath the ClickHouse version is a link to the release notes for the build you’ve selected. The release notes have extensive details of what is new and changed and fixed in each release. Click the link to open the release notes in a new browser tab.
Node Type
Click the down arrow to see what machine types are available. Each item in the list will tell you how many CPUs and how much RAM that machine type has.
Volume Type
Click the down arrow to see what classes of storage are available for your restored ClickHouse cluster.
Node Storage
The amount of storage in GB that each node in the restored cluster will have.
Number of Volumes
The number of storage volumes your restored cluster will have.
Click NEXT to continue.
Option 5C - Launch in an existing cluster
The third option is to restore the backup into an existing cluster:
Click the down arrow icon to see the list of available clusters. Select a cluster and click NEXT to continue.
6. Review and Confirm tab
The Review and Confirm tab lets you go over the choices you’ve made before restoring the cluster:
If everything looks good, click CONTINUE.
If you selected Launch a new Cluster on the Destination Cluster tab (option 5A), you’ll be taken to the Launch Cluster Wizard to specify all of the configuration details and settings for the new cluster. When you’ve completed the Launch Cluster Wizard, the ACM will create the new cluster and restore the backup to it.
If you selected anything else on the Destination Cluster tab, the ACM will start restoring the cluster. As you would expect, this may take several minutes. When the cluster is restored, you’ll get an alert at the top of the ACM UI:
Restoring an individual table
In addition to the cluster-level backup and restore features, you can restore an individual table to an existing database or a different database. To get started, click the EXPLORE button in the Clusters view:
Figure 12 – The EXPLORE button
In the Explorer view, switch to the Schema tab:
Figure 13 – The Schema tab
Click the vertical dots icon next to the table you want to restore and select Restore from Backup:
Figure 14 – The Restore from Backup menu item
Select the backup you want to restore from. By default, the table will be restored from the selected backup into the same database.
NOTE: If you’re restoring a table to the same database, the table is first removed from the database. Once the table is removed, it is restored from the selected backup.
Figure 15 – Restoring the table to the same database
You also have the option of restoring the table from the selected backup to a different database. Select the Different database radio button, then enter the name of the other database:
Figure 16 – Restoring the table to a different database
When you’re ready to restore the table, click CONFIRM. When the table is restored, you’ll see a message in the ACM:
Cloning a database
Cloning a database creates a copy of that database inside the same ClickHouse cluster. This can be useful for testing applications against realistic data.
NOTE: This operation creates a clone of the database, not a replica. Any changes to the original database will not be reflected in the clone, and any changes to the clone will not be reflected in the original database.
To get started, click the EXPLORE button in the Clusters view:
Figure 17 – The EXPLORE button
In the Explorer view, switch to the Schema tab:
Figure 18 – The Schema tab
Click the vertical dots icon next to the database you want to clone and select Clone Database:
Figure 19 – The Clone Database menu item
You’ll be asked for a database name for the clone. The database name for the clone must begin with an upper- or lowercase letter or an underscore. It can contain letters, numbers, and underscores:
- weather2024 - Valid
- 2024weather - Not valid - names must start with a letter or underscore
- Weather_2024 - Valid
- WEATHER_2024 - Valid
- wEaThEr_2024 - Valid
- WEATHER-2024 - Not valid - only letters, numbers, and underscores are allowed
- _WEATHER_2024 - Valid - names can start with a letter or an underscore
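The naming rule can be expressed as a regular expression. The sketch below assumes that "letters" means ASCII letters; the text does not say whether other alphabets are accepted, and the function name is hypothetical.

```python
import re

# First character: a letter or underscore. Remaining characters: letters,
# digits, or underscores. ASCII-only letters are an assumption.
DATABASE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_clone_name(name: str) -> bool:
    """Return True if name satisfies the rule described above."""
    return DATABASE_NAME.fullmatch(name) is not None


for candidate in ["weather2024", "2024weather", "Weather_2024",
                  "WEATHER-2024", "_WEATHER_2024"]:
    print(candidate, is_valid_clone_name(candidate))
```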
If the name you enter isn’t valid, you’ll see this message:
Click CONFIRM to clone the database. The cloned database and its tables will show up in the display after a few minutes.
Converting a table’s engine to ReplicatedMergeTree
To support replication, you can convert a table’s engine from MergeTree to ReplicatedMergeTree. (A table with a MergeTree engine can’t be replicated.)
To get started, click the EXPLORE button in the Clusters view:
Figure 20 – The EXPLORE button
In the Explorer view, switch to the Schema tab:
Figure 21 – The Schema tab
Click the vertical dots icon next to the table whose engine you want to change and select Convert to ReplicatedMergeTree:
Figure 22 – The Convert to ReplicatedMergeTree menu item
As you would expect, this menu item is only available if a table’s engine is MergeTree.
When you click the menu item, you’ll see a confirmation message:
Click OK to start the conversion.