Configuring a Cluster

How to configure ClickHouse clusters

In this section we’ll look at all the configuration settings you can change for a ClickHouse cluster through the Altinity Cloud Manager (ACM), expanding on the basics we covered in the ACM introduction. The items we’ll go through are in the Configure menu:

The cluster configuration menu

You can skip ahead to a specific configuration task if you like:

Configuring cluster settings

Every cluster has a number of settings. The Settings view has a Server Settings tab that lets you view, add, edit, or delete the server’s current settings, or restore the settings to their defaults. There is also an Environment Variables tab that lets you work with environment variables. In addition to configuring settings and environment variables themselves, there are also a couple of global settings that control how the cluster starts when you apply your changes.

WARNING: FOR ADVANCED USERS ONLY. Many ClickHouse settings can be configured through the Altinity Cloud Manager UI. We strongly recommend using the UI wherever possible; it protects you from syntax errors and other common mistakes.

An important point before we go on: By default, the environment that contains your cluster is configured so that any changes to settings and environment variables are automatically published to your cluster. However, if automatic publishing is turned off for your cluster, you’ll need to use the Publish Configuration feature to make your changes permanent. See Publishing a cluster's configuration for more information.

Working with server settings

The Settings View looks like this:

The Server Settings tab in the Settings view

Figure 1 - The Server Settings tab in the Settings View

Settings with a lock icon next to their names can’t be changed through the Settings View. For any other setting, click the vertical dots icon next to the setting name, then click Edit or Delete.

Adding a setting

Clicking the button at the top of the panel lets you add a setting. There are three types of settings:

  • Attribute - A value in the config.xml file
  • config.d file - A value in a file stored in the config.d directory
  • users.d file - A value in a file stored in the users.d directory

Setting an attribute in the config.xml file

The most straightforward kind of setting is an attribute:

Setting an attribute

Figure 2 - The dialog for setting an attribute

The attribute is added to the config.xml file. The Name is the name of the XML element that contains the value. Note that if the XML element is contained in other elements, you need to specify the element’s entire path below the root <clickhouse> element. In the example above, the name logger/level stores the value in config.xml like this:

<clickhouse>
  <logger>
    <level>debug</level>
  </logger>
</clickhouse>

Setting a value in a file in the config.d directory

The ACM UI makes it easy to create a file with configuration data in the config.d directory. The values defined in that file become part of the ClickHouse system configuration. To set a value, enter a filename and its contents:

Setting a config.d value

Figure 3 - Setting a value in the config.d directory

This example writes the XML in the text box to the file config.d/query_log.xml. When applied to the ClickHouse cluster, logging settings for the query_log table in the system database are updated. Specifically, the retention period for the table is now 90 days (the default is 30).

The value of the Filename field must include an extension. If the contents of the text box are XML, it must be well-formed, and the root element must be either <yandex> or <clickhouse>.
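For reference, here is one way the contents of config.d/query_log.xml described above might look. This is a sketch, not necessarily the exact file in Figure 3; it uses ClickHouse’s <ttl> element for system log tables to express the 90-day retention period:

```xml
<clickhouse>
  <query_log>
    <!-- Keep query_log entries for 90 days instead of the default 30 -->
    <database>system</database>
    <table>query_log</table>
    <ttl>event_date + INTERVAL 90 DAY DELETE</ttl>
  </query_log>
</clickhouse>
```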

Setting a value in a file in the users.d directory

Similar to the config.d directory, the ACM UI makes it easy to create a file with configuration data in the users.d directory. The values defined in that file become part of the ClickHouse system configuration. To set a value, enter a filename and its contents:

Setting a users.d value

Figure 4 - Setting a value in the users.d directory

This example contains an XML document that defines new users. If you wanted to define multiple users for your ClickHouse cluster, you could go through the ACM UI and create each one individually. However, it might be simpler and faster to define all the users in XML and use this dialog to create the file users.d/myusers.xml. When the new setting is applied to the ClickHouse cluster, there will be new users based on the data in the XML file.

The value of the Filename field must include an extension. As with config.d values, if the contents are XML, it must be well-formed, and the root element must be either <yandex> or <clickhouse>.
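As a sketch of what a file like users.d/myusers.xml might contain (the user name and settings below are hypothetical), a file that defines one user could look like this:

```xml
<clickhouse>
  <users>
    <!-- "analyst" is a made-up user name for illustration -->
    <analyst>
      <password>change_me</password>
      <profile>default</profile>
      <quota>default</quota>
      <networks>
        <!-- Allow connections from any address -->
        <ip>::/0</ip>
      </networks>
    </analyst>
  </users>
</clickhouse>
```

When this setting is applied, the analyst user is created alongside any users defined elsewhere in the configuration.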

Editing a setting

When you click the vertical dots icon next to a setting name, click Edit to edit that setting. The dialogs to edit a setting are exactly the same as the dialogs in Figures 2, 3, and 4 above.

Deleting a setting

Clicking the Delete button in the menu next to a setting name brings up the Delete Setting confirmation dialog:

Deleting a setting

Figure 5 - The Delete Setting confirmation dialog

Click OK to delete the setting.

Example: Configuring protobuf schema

As an example, we’ll look at how to use protobuf in your ClickHouse cluster. You need to add two settings:

  1. The attribute format_schema_path
  2. The file events.proto in users.d

To get started, click the button, then create the format_schema_path attribute and set its value to /etc/clickhouse-server/users.d/:

Setting the format_schema_path attribute

Figure 6 - Adding the format_schema_path attribute

Click OK to create the attribute. Next, click the button again and create a file in users.d named events.proto with the contents syntax = "proto3":

Creating the events.proto file

Figure 7 - Defining the events.proto file

Click OK to create the setting.
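The single line syntax = "proto3" is enough to create the file, but a working schema needs message definitions. A hypothetical events.proto might look like this:

```protobuf
syntax = "proto3";

// A made-up message type for illustration
message Event {
  string user_id = 1;
  int64 timestamp = 2;
  string action = 3;
}
```

Once the schema file is in place, queries can reference it through the format_schema setting, typically in the form 'events:Event' (schema file name, then message type).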

Working with environment variables

To work with environment variables, click the Environment Variables tab in the Settings View:

Environment Variables tab in the Settings view

Figure 8 - The Environment Variables tab in the Settings View

We’ll look at the three environment variables listed in Figure 8 as an example. AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY define the credentials used to access AWS resources. S3_OBJECT_DISK_PATH is a simple string.

Adding an environment variable

Clicking the button at the top of the view opens the Environment Variable Details dialog:

Environment Variables Details dialog

Figure 9 - Using Value in the Environment Variable Details dialog

Field details

The fields in the dialog are:

Name

The name of the environment variable. The name can contain only letters [a-zA-Z], numbers [0-9], underscores [_], dots [.], and dashes [-]. It can’t start with a digit, and it can’t be more than 50 characters long.

Value

The value of the variable. This field is mutually exclusive with the Value From field.

Value From

The name of a key inside a Kubernetes secret whose value should be used as the value for this environment variable. This field is mutually exclusive with the Value field.

In Figure 9 above, we’re using the Value field to define the string object_disks/demo/github as the value of the S3_OBJECT_DISK_PATH variable. Using the Value From field is slightly more complicated:

Environment Variables Details dialog

Figure 10 - Using Value From in the Environment Variable Details dialog

The format for Value From is secret-name/secret-key. In the example above, the value clickhouse-data/clickhouse_data_s3_access_key refers to the first key in this secret:

apiVersion: v1
kind: Secret
metadata:
  name: clickhouse-data
type: Opaque
data:
  clickhouse_data_s3_access_key: YXNkZmtqaGJhc2Rma2poYXNrYmpoYXNkZmtqaA==
  clickhouse_data_s3_secret_key: YXNncXd0c2Zid2FlcmFzbGtkamZobGFrc2poZg==
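The values under data: in a Kubernetes secret are base64-encoded. If you’re building a secret like this yourself, standard shell tools handle the encoding and decoding (the key value below is made up):

```shell
# Encode a (made-up) credential for the secret's data section
printf 'my-access-key' | base64
# → bXktYWNjZXNzLWtleQ==

# Decode one of the placeholder values from the example secret above
printf 'YXNkZmtqaGJhc2Rma2poYXNrYmpoYXNkZmtqaA==' | base64 -d
```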

Continuing our example, when using S3 for data storage in ClickHouse, the configuration data must include the <access_key_id> and <secret_access_key> elements. With the contents of the keys inside the Kubernetes secret defined as environment variables, they can be used in ClickHouse configuration files via the from_env attribute:

<access_key_id from_env="AWS_ACCESS_KEY_ID"></access_key_id>
<secret_access_key from_env="AWS_SECRET_ACCESS_KEY"></secret_access_key>

With this technique, anyone with access to the cluster can use the values from the Kubernetes secret, but no one can get the actual values.
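To see those elements in context, here is a sketch of a complete S3 disk definition. The disk name and bucket endpoint are hypothetical; only the from_env attributes come from the example above:

```xml
<clickhouse>
  <storage_configuration>
    <disks>
      <!-- "s3_disk" and the endpoint URL are made up for illustration -->
      <s3_disk>
        <type>s3</type>
        <endpoint>https://my-bucket.s3.amazonaws.com/object_disks/demo/github/</endpoint>
        <!-- Credentials are pulled from the environment at startup -->
        <access_key_id from_env="AWS_ACCESS_KEY_ID"></access_key_id>
        <secret_access_key from_env="AWS_SECRET_ACCESS_KEY"></secret_access_key>
      </s3_disk>
    </disks>
  </storage_configuration>
</clickhouse>
```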

Editing an environment variable

To edit an environment variable, click the vertical dots icon next to the variable name and select Edit from the menu. That takes you to the Environment Variable Details dialog seen in Figures 9 and 10 above.

Deleting an environment variable

Clicking the vertical dots icon next to the variable name and selecting Delete takes you to the Delete Environment Setting dialog:

Delete Environment Settings dialog

Figure 11 - The Delete Environment Setting dialog

Setting the Startup Mode

The Startup Mode menu lets you set how the cluster responds to changes in its settings:

Startup mode

Figure 12 - The startup modes for a ClickHouse cluster

The startup modes affect how Altinity Cloud Manager starts your ClickHouse cluster when you apply your new settings. The two modes are:

  • Direct mode - If something goes wrong during startup, ACM will retry the startup several times before giving up.
  • Troubleshooting mode - If something goes wrong during startup, ACM will not retry the startup. This is useful for debugging any problems with your cluster’s updated configuration.

Setting the Startup Time

Depending on the number of tables and the amount of data in your ClickHouse cluster, it may take longer than normal to start. That means it’s possible that the Kubernetes cluster hosting your ClickHouse cluster will delete and restart the pods needed to run ClickHouse before ClickHouse can start. For that reason, you can define a startup time, which is the number of seconds the Kubernetes cluster should wait for your ClickHouse cluster to start.

Startup mode

Figure 13 - The startup time parameter

If your ClickHouse cluster fails to start, you can check the ClickHouse Logs tab in your cluster’s Logs view for details. See the Cluster logs documentation for more information.

Click the button to set the startup time parameter. Note: this button only applies to the startup time parameter. It does not apply any changes you’ve made to settings or environment variables.

Resetting everything

The button restores all the standard settings to their default values. Any additional settings you have configured are deleted.

Configuring user settings profiles

Profiles allow you to give a name to a group of user settings, then easily apply those settings to a user.

Configuring a profile

Figure 14 - The Profiles view

Clicking the button takes you to the Profile Details dialog:

Defining a profile

Figure 15 - The Profile Details dialog

To edit the settings in the profile, click Edit Settings on the right-hand side of the list of profiles as shown in Figure 14 above. You’ll see a list of all the settings in this profile:

Defining a profile

Figure 16 - The Profile Settings list

Clicking the button takes you to the Profile Setting Details dialog:

Details of a profile setting

Figure 17 - The Profile Setting Details dialog

Clicking the down arrow displays a drop-down menu that lists all the settings. Selecting any setting in the list updates the dialog with a hint for that setting:

Details of a profile setting

Figure 18 - A different hint for a different setting

In the Profile list, clicking the vertical dots icon lets you either Edit or Delete a profile. Clicking Edit on the vertical dots menu takes you back to the Profile Details dialog shown in Figure 15 above. As you would expect, the delete button takes you to the Delete Profile dialog:

Delete a profile

Figure 19 - The Delete Profile dialog

Managing users

Depending on your account’s privileges, you may be able to add, edit, or delete users for the cluster. The Users panel lists all the users, their access, where their accounts are defined, and their profiles:

Edit a setting

Figure 20 - The Users View

In the display here, the lock icons next to the altinity and datadog user accounts mean that their access and privileges can’t be changed.

Clicking the button lets you add a new user:

Edit a setting

Figure 21 - The User Details dialog

Field details

Login and Password

The username and password. The passwords have to be at least 12 characters long, and they have to match. The OK button is disabled until the password fields are correct.

Databases

A comma-separated list of databases this user is allowed to access. If left blank, the user has access to all databases.

Profile

The user’s profile. See Configuring user settings profiles above for details on working with profiles.

Access Management

If selected, this user will be able to create, delete, and modify user accounts via SQL statements. This is useful for users who may not have access to the ACM UI.
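For example, a user with Access Management enabled could manage accounts entirely through SQL. The user name, password, and database below are hypothetical:

```sql
-- Create a new account and grant it read access to one database
CREATE USER analyst IDENTIFIED BY 'a-strong-password';
GRANT SELECT ON reports.* TO analyst;

-- Later, remove the account
DROP USER analyst;
```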

As you would expect, clicking the vertical dots icon next to a username lets you edit or delete that user. The edit dialog is identical to the User Details dialog shown in Figure 21 above. The delete dialog is straightforward:

Delete a user

Figure 22 - The Delete User dialog

Configuring storage

Clicking the Storage menu item displays the list of volumes for your cluster:

Edit a setting

Figure 23 - The Volumes view

There are several buttons at the top of the display:

Edit a setting
  • MODIFY VOLUME - Lets you make changes to the selected volume.
  • ADD VOLUME - Lets you add another volume to your cluster.
  • FREE VOLUME - Moves all data from the selected volume to non-cordoned volumes in the cluster.
  • CORDON VOLUME - Changes the selected volume’s status to cordoned. A cordoned volume will not receive any new data; cordoning a volume is the first step towards removing it.
  • REMOVE VOLUME - Removes the selected volume from the cluster.
  • STORAGE POLICY - Lets you change the storage policy for the cluster. Available policies are JBOD (just a bunch of disks) or Tiered.

We’ll cover these options next.

Modifying a volume

Selecting a volume and clicking the button lets you change the properties of the selected volume. At a minimum, this allows you to change the type of disk and its size:

Edit a setting

Figure 24 - The Modify Volume dialog

Clicking the down arrow icon displays a menu of available disk types based on the cloud provider hosting your ClickHouse cluster. You can also change the size of the volume.

The dialog may have other options based on your cloud provider. For example, if your ClickHouse cluster is hosted on AWS, you can change the throughput of the volume:

Changing a volume's throughput

Figure 25 - Setting throughput for an AWS volume

Finally, if your storage policy is JBOD, you’ll get a warning message if the volume type you’ve selected is different from the other volumes in your cluster:

Warning message for JBOD volumes

Figure 26 - The JBOD warning message for mixed volume types

Using different types of volumes with a JBOD policy will give inconsistent performance, as the different volumes may not have the same capabilities.

Adding a volume

Clicking the button lets you add another volume to your cluster:

Adding a volume

Figure 27 - The Add New Volume dialog

Be aware that to use multiple volumes, each volume must be at least 350 GB. As with modifying a volume, if your storage policy is JBOD, you’ll get a warning message if the new volume is of a different type from the other volumes in your cluster:

Warning message for JBOD volumes

Figure 28 - The JBOD warning message for mixed volume types

Click SAVE to add the new volume. It will appear in the list of volumes.

Freeing a volume

Selecting a volume and clicking the button moves all data from the selected volume to non-cordoned volumes in the cluster. You must first cordon the volume for the FREE VOLUME button to be enabled. You’ll be asked to confirm that you want to free the volume:

Warning message for JBOD volumes

Figure 29 - The Free Volume dialog

When all data is moved off of this volume, the REMOVE VOLUME button will become active.

Cordoning a volume

Selecting a volume and clicking the button cordons the volume, which means no new data will be written to that volume. Clicking the button changes its text to UNCORDON VOLUME, which reverses the operation. A cordoned volume can be freed, which moves all data from the volume to non-cordoned volumes in the cluster.

Removing a volume

If the selected volume has no data, the button will be active. To remove a volume, you must cordon it, which means no new data will be written to it, then free the volume, which moves any data on the volume to non-cordoned volumes. As you would expect, clicking the button gives you a confirmation message:

Warning message for JBOD volumes

Figure 30 - The Remove Volume dialog

Click OK to remove the volume.

Setting the storage policy

Clicking the button lets you set the storage policy. There are two options: JBOD (just a bunch of disks) and Tiered (hot and cold data are stored separately). If the storage policy is JBOD, the dialog is simple:

Modifying the storage policy

Figure 31 - Modifying the storage policy - JBOD

On the other hand, Tiered storage is more complicated. Tiered storage is useful, for example, in a hot/cold architecture. You could keep your newest data in faster, more expensive storage, while moving older data to a slower, cheaper volume.

In ClickHouse you can define a Time-To-Live (TTL) period for data. The most common use of TTL is to delete old data that is no longer needed, but ClickHouse also lets you use TTL to move data from one volume to another when it reaches a certain age. To make the most of Tiered storage, you’ll need to define your own TTL properties.

See Manage Data with TTL in the ClickHouse documentation for all the details on TTL.
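As a sketch, a table that keeps 30 days of data on fast storage before moving it to a volume named cold might be defined like this. The table, column, volume, and policy names are all hypothetical, and the volume name must match one defined in your storage policy:

```sql
CREATE TABLE events
(
    event_date Date,
    user_id UInt64,
    action String
)
ENGINE = MergeTree
ORDER BY (event_date, user_id)
-- Parts older than 30 days are moved to the 'cold' volume
TTL event_date + INTERVAL 30 DAY TO VOLUME 'cold'
SETTINGS storage_policy = 'tiered';
```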

The dialog for Tiered storage has an additional field:

Modifying the storage policy - Tiered

Figure 32 - Modifying the storage policy - Tiered

The Move factor lets you define when the cluster should move data to alternate storage. The default value is 0.1, which means data is moved when the amount of available space on a volume is less than 10%. In the example here, the move factor is 0.2, which sets the threshold to 20%.
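Behind the scenes, this value corresponds to the move_factor element of a ClickHouse storage policy. A tiered policy with a 0.2 move factor might be rendered roughly like this; the policy, volume, and disk names are hypothetical:

```xml
<clickhouse>
  <storage_configuration>
    <policies>
      <tiered>
        <volumes>
          <hot>
            <disk>default</disk>
          </hot>
          <cold>
            <disk>cold_disk</disk>
          </cold>
        </volumes>
        <!-- Move data when free space on a volume drops below 20% -->
        <move_factor>0.2</move_factor>
      </tiered>
    </policies>
  </storage_configuration>
</clickhouse>
```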

Changing the storage policy may result in data being moved between volumes. That can impact performance of running applications. For that reason, you’re asked to confirm any changes to the policy:

Confirming a change to the storage policy

Figure 33 - Confirming a change to the storage policy

Click OK to change the storage policy.

Configuring connections

The Connection Configuration dialog makes it easy to configure your ClickHouse cluster’s connections to the world:

Configuring connections

Figure 34 - The Connection Configuration dialog

Field details

Endpoint

This is the endpoint of your cluster. The value here is your cluster name combined with an Altinity domain.

Alternate Endpoints

Defining an Alternate Endpoint

Figure 35 - Defining an Alternate Endpoint

You can define alternate endpoints for your cluster. The name of the alternate endpoint can contain lowercase letters, numbers, and hyphens. It must start with a letter, and it cannot end with a hyphen.

Contact Altinity support to set up an alternate endpoint.

VPC Endpoint Enabled

Contact Altinity support to set up a VPC endpoint.

IP restrictions

If enabled, only ClickHouse applications or clients coming from specific IP addresses are allowed to connect to the cluster. You can add other addresses to the list, including ranges of IP addresses in CIDR format.

Disabling IP restrictions means that any ClickHouse application or client can connect to your ClickHouse cluster from any IP address.

NOTE: This restriction only applies to ClickHouse applications or clients. Anyone with the proper credentials can access the Altinity Cloud Manager UI from any IP address.

Protocols

Port 9440 enables the ClickHouse native (binary) protocol, and port 8443 enables HTTPS connections.

Datadog Integration

You can use Datadog to monitor your ClickHouse cluster. The Datadog options are only enabled if your cluster’s environment is enabled for Datadog support. See the section Enabling Datadog at the environment level for the details. Be aware that you must have the appropriate privileges to edit an environment’s settings, so you may need to contact your administrator.

Zone Awareness

When Zone Awareness is enabled, Altinity.Cloud keeps traffic between client connections and your ClickHouse cluster in a single availability zone whenever possible. This allows you to avoid cross-zone hops.

However, if all of your client connections come from a single zone, this feature will route all requests to a single ClickHouse node. In that case, turning Zone Awareness off will ensure that your load balancer will distribute requests across all the nodes in the cluster.

Setting an activity schedule

The Activity Schedule settings let you control when a ClickHouse cluster will run as well as the kinds of nodes the ClickHouse cluster will use. Altinity.Cloud does not bill you for compute resources or support for non-running clusters, so you can cut your costs by stopping ClickHouse clusters that don’t need to run constantly. (Note that these cost savings do not apply to storage and backups.) You can also cut your costs by scaling your ClickHouse clusters down to smaller, cheaper nodes during non-peak hours.

There are four items at the top of the dialog:

Activity scheduling options
  • ALWAYS ON - the cluster is always running. This is the default.
  • STOP WHEN INACTIVE - the cluster stops running after some number of hours of inactivity.
  • STOP ON SCHEDULE - the cluster runs only on certain days of the week or at certain times of the day.
  • RESCALE ON SCHEDULE - the cluster is always running, but it scales up or down to different node types on certain days of the week or at certain times of the day.

We’ll look at the four schedule options now.

ALWAYS ON

Used for mission-critical ClickHouse clusters that must run 24/7.

Figure 36 – The ALWAYS ON Activity Schedule setting

Click ALWAYS ON and click CONFIRM to save. Note that with this setting, Altinity.Cloud will not trigger any Stop or Resume operations automatically; if you stop the cluster, you’ll have to resume or restart it yourself.

STOP WHEN INACTIVE

Used to stop ClickHouse clusters after a set number of hours of inactivity. For non-running clusters, Altinity.Cloud does not bill you for compute resources or support, although charges for storage and backups continue.

Figure 37 – The STOP WHEN INACTIVE Activity Schedule setting

Click STOP WHEN INACTIVE, then select the number of hours by typing in a value or by using the up and down arrows. Click CONFIRM to save. A clock icon will appear next to the cluster name in the Clusters Dashboard.

NOTE: If a cluster is stopped for more than 30 days, you’ll get a warning message suggesting that you delete the cluster to avoid storage and backup charges for the unused cluster:

Figure 38 - Unused cluster warning message

STOP ON SCHEDULE

Sets the days of the week your ClickHouse clusters will run, with the option of defining From and To times that your clusters will run for each day. (Times are expressed in GMT and are displayed in 12- or 24-hour format depending on your machine’s settings.)

The example schedule below defines the following settings:

  • On Tuesday and Thursday, the cluster runs from 08:00 to 17:00 GMT, and is stopped the rest of the day.
  • On Monday, Wednesday, and Friday, the cluster runs all day.
  • On Saturday and Sunday, the cluster is stopped.

Figure 39 - The STOP ON SCHEDULE Activity Schedule setting

Click CONFIRM to save. A clock icon will appear next to the cluster name in the Clusters Dashboard.

RESCALE ON SCHEDULE

With this option your ClickHouse cluster is always running, but you can define the days of the week your cluster will run on larger, more powerful nodes, with the cluster rescaling to smaller, cheaper nodes the rest of the time. You also have the option of defining From and To times when your clusters will use the larger nodes. (Times are expressed in GMT and are displayed in 12- or 24-hour format depending on your machine’s settings.) By default the cluster runs on the larger nodes all the time.

NOTE: These settings do not start, resume, or restart the cluster; they merely define peak days or hours (Active state) when the cluster should run on larger nodes. When the cluster is in Inactive state, it’s still running, just on smaller nodes.

In the example here, the Active node type is m6i.xlarge, and the Inactive Node Type is m5.large. The Active node type is set when you create the ClickHouse cluster. To change the Inactive node type, click the down arrow icon to see the list of available node types.

The example schedule below defines the following settings:

  • On Monday, the cluster starts the day in Inactive state (running on node type m5.large), scaling up to Active state (running on node type m6i.xlarge) at 08:00 GMT and continuing through the rest of the day.
  • On Tuesday, Wednesday, and Thursday, the cluster is in Active state (running on node type m6i.xlarge) all day.
  • On Friday, the cluster starts the day at midnight in Active state (running on node type m6i.xlarge), scaling down to Inactive state (running on node type m5.large) at 17:00 GMT and continuing through the rest of the day.
  • On Saturday, the cluster is in Active state (running on node type m6i.xlarge) from 09:00 GMT to 17:00 GMT, and is in Inactive state (running on node type m5.large) all other times of the day.
  • On Sunday, the cluster is in Inactive state (running on node type m5.large) all day.

Figure 40 – The RESCALE ON SCHEDULE Activity Schedule setting

NOTE: If you manually rescale the cluster, the Activity Schedule is reset to ALWAYS ON. Rescaling the cluster changes the node type it’s running on, so if you want to use the RESCALE ON SCHEDULE schedule you need to redefine the Active and Inactive node types and the times they should be used. See Rescaling a cluster for more details.

Click CONFIRM to save.

Configuring backups

You can define a schedule for creating backups of your cluster. The Backup Settings dialog lists the current schedule:

Configuring Backup Settings

Figure 41 - The Backup Settings dialog

There are five options to define the Period when backups should occur:

  • Monthly - Define the day of the month
  • Weekly - Define the day of the week
  • Daily - Define the time of day
  • Every 6 hours - Backups occur every six hours
  • Every hour - Backups occur every hour

NOTE: All times are expressed in GMT and are displayed in 12- or 24-hour format depending on your machine’s settings.

In addition to defining the period, you can also define the number of backups to keep. The default is seven.

The button lets you define multiple schedules. For example, if you only want backups to occur on Friday and Saturday, create two Weekly schedules, one for Friday and one for Saturday. You can define up to three schedules.