Storage Hardening
ClickHouse data is ultimately stored on file systems. Keeping that data protected when it is being used or “at rest” is necessary to prevent unauthorized entities from accessing your organization’s private information.
Hardening stored ClickHouse data is split into the following catagories:
-
Host-Level Security
-
Volume Level Encryption
-
Column Level Encryption
-
Log File Protection
-
IMPORTANT NOTE: Configuration settings can be stored in the default
/etc/clickhouse-server/config.xml
file. However, this file can be overwritten during vendor upgrades. To preserve configuration settings it is recommended to store them in/etc/clickhouse-server/config.d
as separate XML files with the same root element, typically<yandex>
. For this guide, we will only refer to the configuration files in/etc/clickhouse-server/config.d
for configuration settings.
Host-Level Security
The file level security for the files that ClickHouse uses to run should be restricted as much as possible.
- ClickHouse does not require
root
access to the file system, and runs by default as the userclickhouse
. - The following directories should be restricted to the minimum number of users:
/etc/clickhouse-server
: Used for ClickHouse settings and account credentials created by default./var/lib/clickhouse
: Used for ClickHouse data and new credentials./var/log/clickhouse-server
: Log files that may display privileged information through queries. See Log File Protection for more information.
Volume Level Encryption
Encrypting data on the file system prevents unauthorized users who may have gained access to the file system that your ClickHouse database is stored on from being able to access the data itself. Depending on your environment, different encryption options may be required.
Cloud Storage
If your ClickHouse database is stored in a cloud service such as AWS or Azure, verify that the cloud supports encrypting the volume. For example, Amazon AWS provides a method to encrypt new Amazon EBS volumes by default.
The Altinity.Cloud service provides the ability to set the Volume Type to gp2-encrypted. For more details, see the Altinity.Cloud Cluster Settings
Local Storage
Organizations that host ClickHouse clusters on their own managed systems, LUKS is a recommended solution. Instructions for Linux distributions including Red Hat and Ubuntu are available. Check with the distribution your organization for instructions on how to encrypt those volumes.
Kubernetes Encryption
If your ClickHouse cluster is managed by Kubernetes, the StorageClass used may be encrypted. For more information, see the Kubernetes Storage Class documentation.
Column Level Encryption
Organizations running ClickHouse 20.11 or later can encrypt individual columns with AES functions. For full information, see the ClickHouse.tech Encryption functions documentation.
Applications are responsible for their own keys. Before enabling column level encryption, test to verify that encryption does not negatively impact performance.
The following functions are available:
Function | MySQL AES Compatible |
---|---|
encrypt(mode, plaintext, key, [iv, aad]) | |
decrypt(mode, ciphertext, key, [iv, aad]) | |
aes_encrypt_mysql(mode, plaintext, key, [iv]) | * |
aes_decrypt_mysql(mode, ciphertext, key, [iv]) | * |
Encryption function arguments:
Argument | Description | Type |
---|---|---|
mode | Encryption mode. | String |
plaintext | Text thats need to be encrypted. | String |
key | Encryption key. | String |
iv | Initialization vector. Required for -gcm modes, optional for others. | String |
aad | Additional authenticated data. It isn’t encrypted, but it affects decryption. Works only in -gcm modes, for others would throw an exception | String |
Column Encryption Examples
This example displays how to encrypt information using a hashed key.
- Takes a hex value, unhexes it and stores it as
key
. - Select the value and encrypt it with the
key
, then displays the encrypted value.
WITH unhex('658bb26de6f8a069a3520293a572078f') AS key
SELECT hex(encrypt('aes-128-cbc', 'Hello world', key)) AS encrypted
┌─encrypted────────────────────────┐
│ 46924AC12F4915F2EEF3170B81A1167E │
└──────────────────────────────────┘
This shows how to decrypt encrypted data:
- Takes a hex value, unhexes it and stores it as
key
. - Decrypts the selected value with the
key
as text.
WITH unhex('658bb26de6f8a069a3520293a572078f') AS key SELECT decrypt('aes-128-cbc',
unhex('46924AC12F4915F2EEF3170B81A1167E'), key) AS plaintext
┌─plaintext───┐
│ Hello world │
└─────────────┘
Log File Protection
The great thing about log files is they show what happened. The problem is when they show what happened, like the encryption key used to encrypt or decrypt data:
2021.01.26 19:11:23.526691 [ 1652 ] {4e196dfa-dd65-4cba-983b-d6bb2c3df7c8}
<Debug> executeQuery: (from [::ffff:127.0.0.1]:54536, using production
parser) WITH unhex('658bb26de6f8a069a3520293a572078f') AS key SELECT
decrypt(???), key) AS plaintext
These queries can be hidden through query masking rules, applying regular expressions to replace commands as required. For more information, see the ClickHouse.tech Server Settings documentation.
To prevent certain queries from appearing in log files or to hide sensitive information:
- Update the configuration files, located by default in
/etc/clickhouse-server/config.d
. - Add the element
query_masking_rules
. - Set each
rule
with the following:name
: The name of the rule.regexp
: The regular expression to search for.replace
: The replacement value that matches the rule’s regular expression.
For example, the following will hide encryption and decryption functions in the log file:
<query_masking_rules>
<rule>
<name>hide encrypt/decrypt arguments</name>
<regexp>
((?:aes_)?(?:encrypt|decrypt)(?:_mysql)?)\s*\(\s*(?:'(?:\\'|.)+'|.*?)\s*\)
</regexp>
<!-- or more secure, but also more invasive:
(aes_\w+)\s*\(.*\)
-->
<replace>\1(???)</replace>
</rule>
</query_masking_rules>
Feedback
Was this page helpful?
Glad to hear it!
Sorry to hear that. We'll track this issue and see how we can improve.