Working with Object Disks

Using object storage and block storage

ClickHouse® works best with block storage, but in some cases it makes sense to add object storage as well. (Keeping rarely accessed data forever, for example.) ClickHouse itself supports S3, GCS, and Azure Blob storage.

You can use the Altinity Cloud Manager to enable object disks in Altinity.Cloud ClickHouse clusters.

Configuring an S3 bucket for Altinity.Cloud

Once you’ve defined an S3 bucket, there are several configuration tasks you may need to do, including defining policies, setting up credentials, and configuring versioning and soft deletes. We’ll cover those here.

To get started, go to the Cluster view and click the Settings item on the CONFIGURE menu:

Defining AWS credentials in environment variables

To work with S3 buckets in your AWS account, you’ll need to define your credentials for those buckets. On the Settings page in the ACM, click the Environment Variables tab at the top of the page and define the variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY:

Figure 1 - Environment variables for S3

See the documentation on setting environment variables for the details.

Defining the storage configuration

With your AWS credentials defined, you need to add a configuration file to the config.d directory. Click the Server Settings tab at the top of the page and add a new Setting. You’ll see this dialog:

Figure 2 - Configuring an S3 bucket for object disks

In Figure 2 above, the configuration file is named s3_disk.xml, the region is us-east-1 and the bucket name is object-disk-01. In general, the contents should look something like this:

<clickhouse>
  <storage_configuration>
    <disks>
      <s3>
        <type>s3</type>
        <endpoint>http://s3.REGION.amazonaws.com/BUCKET/clickhouse/{cluster}/{replica}</endpoint>
        <region>REGION</region>
        <access_key_id  
          from_env="AWS_ACCESS_KEY_ID"/>
        <secret_access_key 
          from_env="AWS_SECRET_ACCESS_KEY"/>
        <skip_access_check>true</skip_access_check>
      </s3>
      <s3_cache>
        <type>cache</type>
        <disk>s3</disk>
        <path>/var/lib/clickhouse/disks/s3_cache/</path>
        <max_size>20Gi</max_size>
        <enable_filesystem_cache_log>1</enable_filesystem_cache_log>
        <cache_on_write_operations>1</cache_on_write_operations> 
      </s3_cache>
    </disks>
    <policies>
      <s3>
        <volumes>
          <s3>
            <disk>s3_cache</disk>
          </s3>
        </volumes>
      </s3>
      <tiered>
        <volumes>
          <default>
            <disk>default</disk>
          </default>
          <s3>
            <disk>s3_cache</disk>
            <perform_ttl_move_on_insert>0</perform_ttl_move_on_insert>
            <volume_priority>2</volume_priority>
          </s3>
        </volumes>
        <move_factor>0.001</move_factor>
      </tiered>
    </policies>
  </storage_configuration>
  <!-- From 24.3 onwards -->
  <merge_tree>
    <force_read_through_cache_for_merges>1</force_read_through_cache_for_merges>
  </merge_tree>
</clickhouse>

Obviously you’ll need to replace REGION with the appropriate region in the endpoint and region elements and specify the BUCKET name in the endpoint element. ClickHouse will insert the correct values for {cluster} and {replica}.

Other notes:

Setting enable_filesystem_cache_log to 1 creates a log system table for all cache operations. This is a good practice to monitor cache performance, but it does not enable write-through caching. (Read on…)
The feature flag for write-through caching is turned on with cache_on_write_operations set to 1, but this does not enable write-through caching, it merely makes it possible. To actually enable write-through caching, set the enable_filesystem_cache_on_write_operations and enable_filesystem_cache_log flags in the default profile:

<profiles>
  <default>
    <!-- To enable write-through caching for INSERTs/MERGEs, 
         put these elements in the default profile: -->
    <!--
    <enable_filesystem_cache_on_write_operations>1</enable_filesystem_cache_on_write_operations>
    <enable_filesystem_cache_log>1<enable_filesystem_cache_log> 
    -->
  </default>
</profiles>

You should always explicitly set the volume_priority in your storage policies. Without this value, ClickHouse assigns priorities based on lexicographical order. In the example above, default will come before s3_cache. That’s what we want, but if we used ssd instead of default, the system’s priorities would be the opposite of what we want.
Starting with version 24.3, the force_read_through_cache_for_merges setting has been moved to the <merge_tree> element. You need to enable this setting to enable write-through cache for merges.

See the documentation on setting a value in a file in the config.d directory for all the details.

Checking your configuration

After defining the environment variables and the configuration file, the ACM will create the following in ClickHouse:

A disk named s3 in a particular S3 bucket
A disk named s3_cache that is used as a write-through cache for the s3 disk
A storage policy named s3 that says the disk named s3 uses s3_cache
A storage policy named tiered that uses the default disk and s3_cache. The perform_ttl_move_on_insert and move_factor elements define how tiered storage works. See the ClickHouse documentation on storage policies for all the details on those values.

Note that in the example above, the tiered storage policy has to be applied to each table explicitly. As an alternative, you can make this the default storage policy:

<policies>
  <default>
    <volumes>
      <!-- 'default' volume contains 1 or more disk and will be merged from a standard config –>
      <s3>
        <disk>s3_cached</disk>
        <perform_ttl_move_on_insert>0</perform_ttl_move_on_insert>
      </s3>
    </volumes>
    <move_factor>0.001</move_factor>
  </default>
</policies>

After your ClickHouse server is restarted, you can validate your configuration with these queries against the system database:

SELECT * FROM system.disks
SELECT * FROM system.storage_policies

Using an object disk

Once an object storage disk is configured, you may use it to set up TTLs.

First, make sure that table storage policy contains an S3 disk. If the tiered policy has been added, then existing tables that need to be on S3 need to be modified:

ALTER TABLE my_table MODIFY SETTING storage_policy='tiered'

After that, TTL rules can be added to the table.

ALTER TABLE my_table ADD TTL toDate(event_date) + 30 day TO DISK 's3'

If the table is big, it may trigger a lot of work since ClickHouse will start to evaluate what needs to be moved and perform actual moves. The Altinity Knowledge Base has an article on MODIFY / ADD TTL logic and how to control this behavior.

You may also move a single table partition to S3 without adding a TTL:

ALTER TABLE my_table MOVE PARTITION my_partition TO DISK 's3'

You can validate how table data is stored with this query:

SELECT disk_name, sum(bytes) FROM system.parts WHERE active GROUP BY 1

If you have a separately managed S3 bucket, you can backup and restore object disks to S3 directly. See the Altinity clickhouse-backup repo for examples on backing up object disks to s3 with s3:copyobject and how to restore object disks to s3 with s3:copyobject.

Finally, see the ClickHouse documentation on the S3 Backed MergeTree for more details.