Altinity Stable Build for ClickHouse 21.1.7.1 Release Notes
Several months ago we certified ClickHouse 20.8 as Altinity Stable. Since then we have worked on newer releases and run them in-house. We completed quite a few new features, and even more have been added by community contributors. It is always difficult to stop and pick a community release for certification. Newer releases promise new features, but older ones are easier to upgrade to. We have been running ClickHouse 21.1 to power our public datasets instance in Altinity.Cloud for two months, and testing it in our environments. We are now confident in certifying 21.1 as an Altinity Stable release.
This release is a significant step forward since the previous Altinity Stable release. It includes 1462 pull requests from 208 contributors with many new features and performance improvements! Please look below for detailed release notes.
Major new features since the previous stable release 20.8.x
The new release introduces a lot of changes and new features. These are organized by feature in the Appendix, so refer to that section for more detail. The following new features are worth mentioning on the front page:
- Security features:
  - LDAP external users directory, see an article in our blog a)
  - AES encryption functions, see an article in our blog for more detail a)
  - Migration from OpenSSL to the BoringSSL library
  - Support for SNI in https connections to remote resources
  - Support for SNI in the ClickHouse server TLS endpoint a)
  - Security context propagation in distributed queries
- MergeTree features:
  - New TTL extension: TTL RECOMPRESS
  - ALTER UPDATE/DELETE IN PARTITION for replicated MergeTree tables a)
  - DETACH TABLE/VIEW PERMANENTLY a)
  - [OPTIMIZE DEDUPLICATE BY](https://clickhouse.tech/docs/en/sql-reference/statements/optimize/#by-expression) – deduplicate MergeTree tables by a subset of columns a) (illustrated in the sketch after this list)
  - SimpleAggregateFunction in SummingMergeTree
  - Option to disable merges for cold storage in a tiered storage configuration a)
- Integrations:
  - gRPC protocol
  - zstd and xz compression for file-based engines
  - EmbeddedRocksDB engine
- SQL compatibility:
  - UNION DISTINCT (previously only UNION ALL was supported). The default can be altered by the union_default_mode setting.
  - Improved CTE compatibility
  - REPLACE TABLE and CREATE OR REPLACE TABLE DDL statements for the Atomic database engine
- Other:
  - New Map data type (experimental)
  - OpenTelemetry support (experimental)
  - Column transformers in SELECT statements
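To make a few of these concrete, here is a hedged sketch that combines the new TTL RECOMPRESS clause, OPTIMIZE … DEDUPLICATE BY, and a column transformer (the table and column names are hypothetical):

```sql
-- Recompress data older than one month with a stronger codec
CREATE TABLE events
(
    event_date Date,
    user_id UInt64,
    payload String
)
ENGINE = MergeTree
ORDER BY (event_date, user_id)
TTL event_date + INTERVAL 1 MONTH RECOMPRESS CODEC(ZSTD(17));

-- Deduplicate by a subset of columns instead of comparing whole rows
OPTIMIZE TABLE events FINAL DEDUPLICATE BY event_date, user_id;

-- Column transformers: select everything except payload and apply max() to the rest
SELECT * EXCEPT (payload) APPLY (max) FROM events;
```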
As usual with ClickHouse, there are many performance and operational improvements in different server components.
a) - contributed by Altinity developers.
Backward Incompatible Changes
The following changes are backward incompatible and require user attention during an upgrade:
- Atomic database engine is enabled by default. It does not affect existing databases, but new databases will be created with Engine = Atomic. The engine cannot be changed once the database is created. The Atomic engine has been used for system tables since 20.5, and it is a good feature in the long term. However, we recommend disabling it for now, especially if you use backup tools, including ClickHouse Backup 0.6.4 or earlier, because the data layout on storage has changed. To disable it by default, add the following configuration section to the default profile:

```xml
<yandex>
    <profiles>
        <default>
            <default_database_engine>Ordinary</default_database_engine>
        </default>
    </profiles>
</yandex>
```
- toUUID(N) is no longer supported. If there is a DEFAULT column with this expression, ClickHouse won’t start.
- The following functions were removed: sumburConsistentHash, timeSeriesGroupSum, timeSeriesGroupRateSum.
- avg and avgWeighted functions now always return Float64. In previous versions they returned Decimal for Decimal arguments.
- User settings related to file formats (e.g. format_csv_delimiter) are now accepted in the SETTINGS clause when creating a table that uses the File engine, and applied to all INSERTs and SELECTs on that table. Session-level settings are ignored in this case. See the sketch below.
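A minimal sketch of the new File engine behaviour (the table name and structure are hypothetical):

```sql
-- Per-table format settings are stored with the table definition
CREATE TABLE csv_import (id UInt64, name String)
ENGINE = File(CSV)
SETTINGS format_csv_delimiter = '|';

-- Every INSERT and SELECT on this table now uses '|' as the delimiter,
-- even if the session sets format_csv_delimiter to something else
INSERT INTO csv_import VALUES (1, 'test');
SELECT * FROM csv_import;
```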
Upgrade Notes
There were several changes between versions that may affect the rolling upgrade of big clusters. Upgrading only part of the cluster is not recommended.
- Replication protocol has been changed in 20.10 in order to improve reliability of TTL merges. Replication between versions prior to 20.10 and 20.10+ is incompatible if ReplicatedMergeTree tables with TTL are used. See https://github.com/ClickHouse/ClickHouse/pull/14490 for more information.
  - For a safe upgrade, all replicas should be upgraded at once.
  - Alternatively, SYSTEM STOP TTL MERGES should be used during the upgrade, as sketched below.
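A sketch of the second option:

```sql
-- On each replica, before starting the rolling upgrade
SYSTEM STOP TTL MERGES;

-- ... upgrade the replicas one by one ...

-- After all replicas are on the new version
SYSTEM START TTL MERGES;
```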
Other Important Changes
Some new ClickHouse features are now enabled by default. It may lead to a change in behaviour, so review those carefully and disable features that may affect your system:
- Insert quorum behaviour has been changed: insert_quorum_parallel is enabled by default. It breaks sequential consistency and may have other side effects. We recommend disabling this feature if you are using quorum inserts on previous versions.
- optimize_on_insert is enabled by default. This new feature applies the logic of Replacing, Summing, Collapsing and AggregatingMergeTree to every inserted block. Unfortunately, it still has some issues, so we recommend disabling it after upgrading (see the sketch after this list).
- use_compact_format_in_distributed_parts_names is enabled by default.
- input_format_null_as_default is enabled by default.
- Background fetches are now limited by the background_fetches_pool_size setting. The default value of 3 may be low in some cases. In previous versions, a common background pool with a default size of 16 was used for both merges and fetches.
- Compact MergeTree parts are enabled by default for parts below 10MB uncompressed. See the min_bytes_for_wide_part setting.
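As a sketch, the first two defaults can be reverted at the session level (or in a user profile):

```sql
SET insert_quorum_parallel = 0;  -- restore sequential consistency for quorum inserts
SET optimize_on_insert = 0;      -- do not pre-merge inserted blocks
```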
Known issues in 21.1.7.1
The development team continues to improve the quality of the 21.1 release. The following issues still exist in the 21.1.7.1 version and may affect ClickHouse operation. Please inspect them carefully to decide whether they apply to your applications:
- “Unexpected end of stream” errors may happen when loading text files (TSV and CSV formats) without a newline at the end of the file: https://github.com/ClickHouse/ClickHouse/issues/20244
- ‘/etc/init.d/clickhouse-server restart’ cannot restart the ClickHouse server if shutdown does not complete within the 3-second timeout: https://github.com/ClickHouse/ClickHouse/issues/20214. The timeout can be increased with the shutdown_wait_unfinished server setting.
- Distributed queries may sometimes fail with an ‘Unknown packet n from server’ error message. Disable the async_socket_for_remote setting if you experience this issue: https://github.com/ClickHouse/ClickHouse/issues/21588
The ClickHouse Altinity Stable release is based on the community version. It can be downloaded from repo.clickhouse.tech, and RPM packages are available from the Altinity Stable Repository.
Please contact us at info@altinity.com if you experience any issues with the upgrade.
Appendix
New data types
- DateTime32 (alias to DateTime)
- Map (experimental)
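A minimal sketch of the experimental Map type (the table name is hypothetical):

```sql
SET allow_experimental_map_type = 1;

CREATE TABLE kv_demo (k String, v Map(String, UInt64)) ENGINE = Memory;
INSERT INTO kv_demo VALUES ('row1', map('x', 1, 'y', 2));
SELECT v['x'] FROM kv_demo;  -- returns 1
```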
New formats
- JSONCompactStrings
- JSONCompactStringsEachRow
- JSONCompactStringsEachRowWithNamesAndTypes
- JSONStrings
- JSONStringsEachRow
- JSONStringsEachRowWithProgress
- LineAsString
- ProtobufSingle
- RawBLOB
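For example, the new JSONStringsEachRow format renders every value as a string:

```sql
SELECT number, number * 2 AS doubled
FROM numbers(2)
FORMAT JSONStringsEachRow;

-- {"number":"0","doubled":"0"}
-- {"number":"1","doubled":"2"}
```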
New functions
- DateTime functions:
  - fromModifiedJulianDay, toModifiedJulianDay
  - fromModifiedJulianDayOrNull, toModifiedJulianDayOrNull
  - parseDateTime32BestEffort, parseDateTime32BestEffortOrNull, parseDateTime32BestEffortOrZero, toDateTime32
- String functions:
  - countMatches, countMatchesCaseInsensitive
  - countSubstrings, countSubstringsCaseInsensitive, countSubstringsCaseInsensitiveUTF8
  - isIPv4String, isIPv6String
- Functions for semi-duplicate search and string proximity:
  - bitHammingDistance, tupleHammingDistance
  - ngramMinHash*, ngramSimHash*
- Encoding/formatting:
  - encodeXMLComponent
  - formatReadableQuantity
  - formatReadableTimeDelta
- Array functions:
  - arrayAvg, arrayMax, arrayMin
  - mapPopulateSeries
- Machine learning and statistics:
  - mannWhitneyUTest
  - rankCorr
  - hypot
  - studentTTest, welchTTest
- Encryption:
  - aes_decrypt_mysql, aes_encrypt_mysql
  - decrypt, encrypt
- URL functions:
  - cutToFirstSignificantSubdomainCustom, cutToFirstSignificantSubdomainCustomWithWWW, cutToFirstSignificantSubdomainWithWWW
  - firstSignificantSubdomainCustom
- Aggregate functions:
  - New combinator -SimpleState
- Functions to work with the new Map data type:
  - map
- Math/statistics:
  - acosh, asinh, atan2, atanh, cosh, sinh
  - log1p
- Type related:
  - accurateCast
  - accurateCastOrNull
  - byteSize
  - reinterpretAsUUID
  - toUUIDOrNull
  - toUUIDOrZero
- Other:
  - farmFingerprint64
- Internal ClickHouse:
  - errorCodeToName
  - logTrace
  - tid
  - tcpPort
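A few quick examples of the new functions:

```sql
SELECT countSubstrings('aaaa', 'aa');      -- 2 (non-overlapping occurrences)
SELECT formatReadableTimeDelta(86465);     -- e.g. '1 day, 1 minute, 5 seconds'
SELECT accurateCastOrNull(-1, 'UInt8');    -- NULL: -1 does not fit into UInt8
SELECT isIPv4String('127.0.0.1');          -- 1
```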
New table functions
- null
- view
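Hedged examples of both, based on their documented use cases:

```sql
-- null() creates a throwaway table of the given structure; inserted data is
-- discarded. Useful for benchmarking INSERT pipelines.
INSERT INTO FUNCTION null('x UInt64') SELECT number FROM numbers(1000);

-- view() turns a subquery into a table expression that can be passed to
-- table functions such as remote()
SELECT * FROM view(SELECT 1 AS x);
```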
New table engines
- EmbeddedRocksDB
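A minimal sketch (the table name is hypothetical):

```sql
-- EmbeddedRocksDB stores rows in an embedded RocksDB instance, keyed by PRIMARY KEY
CREATE TABLE rocks_kv (key String, value String)
ENGINE = EmbeddedRocksDB
PRIMARY KEY(key);

INSERT INTO rocks_kv VALUES ('k1', 'v1');
SELECT value FROM rocks_kv WHERE key = 'k1';
```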
New metrics and events
system.metrics:
- BackgroundFetchesPoolTask
- BackgroundMessageBrokerSchedulePoolTask
- MaxDDLEntryID
- TablesToDropQueueSize
- REMOVED: MemoryTrackingForMerges
- REMOVED: MemoryTrackingInBackgroundBufferFlushSchedulePool
- REMOVED: MemoryTrackingInBackgroundDistributedSchedulePool
- REMOVED: MemoryTrackingInBackgroundMoveProcessingPool
- REMOVED: MemoryTrackingInBackgroundProcessingPool
- REMOVED: MemoryTrackingInBackgroundSchedulePool
system.asynchronous_metrics:
- HTTPThreads
- InterserverThreads
- MySQLThreads
- TCPThreads
- TotalBytesOfMergeTreeTables
- TotalPartsOfMergeTreeTables
- TotalRowsOfMergeTreeTables
system.events:
- RWLockReadersWaitMilliseconds
- REMOVED: CreatedWriteBufferOrdinary
New system tables
- errors
- replicated_fetches
- replicated_merge_tree_settings
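For instance, the new tables can be queried directly (a sketch, assuming the 21.1 column set):

```sql
-- Which errors have occurred since server start, most frequent first
SELECT name, code, value FROM system.errors ORDER BY value DESC LIMIT 5;

-- Progress of in-flight fetches on a replica
SELECT table, elapsed, progress FROM system.replicated_fetches;
```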
New columns in system tables
- grants.access_type
- merges.merge_type, merge_algorithm
- parts.uuid, default_compression_codec, recompression_ttl_info.expression, recompression_ttl_info.min, recompression_ttl_info.max
- privileges.privilege, parent_group
- processes.forwarded_for
- query_log.event_time_microseconds, query_start_time_microseconds, normalized_query_hash, query_kind, databases, tables, columns, forwarded_for
- query_thread_log.event_time_microseconds, query_start_time_microseconds, normalized_query_hash, current_database, forwarded_for
- quotas.keys
- replication_queue.merge_type
- storage_policies.prefer_not_to_merge
- table_engines.supports_parallel_insert
- trace_log.event_time_microseconds
- user_directories.path, readonly, params
system.merge_tree_settings added/changed
An empty Old value means the setting is new in this release; an empty New value means the setting was removed.

Name | Old value | New value | Description |
---|---|---|---|
assign_part_uuids | | 0 | Generate UUIDs for parts. Before enabling check that all replicas support the new format. |
execute_merges_on_single_replica_time_threshold | | 0 | When greater than zero only a single replica starts the merge immediately, others wait up to that amount of time to download the result instead of doing merges locally. If the chosen replica doesn’t finish the merge during that amount of time, fallback to standard behavior happens. |
fsync_after_insert | | 0 | Do fsync for every inserted part. Significantly decreases performance of inserts, not recommended to use with wide parts. |
fsync_part_directory | | 0 | Do fsync for part directory after all part operations (writes, renames, etc.). |
in_memory_parts_insert_sync | | 0 | If true, insert of part with in-memory format will wait for fsync of WAL |
max_compress_block_size | | 0 | Compress the pending uncompressed data in a buffer if its size is larger or equal than the specified threshold. Block of data will be compressed even if the current granule is not finished. If this setting is not set, the corresponding global setting is used. |
max_number_of_merges_with_ttl_in_pool | | 2 | When there is more than a specified number of merges with TTL entries in the pool, do not assign a new merge with TTL. This is to leave free threads for regular merges and avoid “Too many parts” |
max_partitions_to_read | | -1 | Limit the max number of partitions that can be accessed in one query. <= 0 means unlimited. This setting is the default that can be overridden by the query-level setting with the same name. |
max_replicated_merges_with_ttl_in_queue | | 1 | How many tasks of merging parts with TTL are allowed simultaneously in the ReplicatedMergeTree queue. |
merge_with_recompression_ttl_timeout | | 14400 | Minimal time in seconds, when merge with recompression TTL can be repeated. |
merge_with_ttl_timeout | 86400 | 14400 | Minimal time in seconds, when merge with delete TTL can be repeated. |
min_bytes_for_wide_part | 0 | 10485760 | Minimal uncompressed size in bytes to create part in wide format instead of compact |
min_compress_block_size | | 0 | When granule is written, compress the data in a buffer if the size of pending uncompressed data is larger or equal than the specified threshold. If this setting is not set, the corresponding global setting is used. |
min_compressed_bytes_to_fsync_after_fetch | | 0 | Minimal number of compressed bytes to do fsync for part after fetch (0 - disabled) |
min_compressed_bytes_to_fsync_after_merge | | 0 | Minimal number of compressed bytes to do fsync for part after merge (0 - disabled) |
min_rows_to_fsync_after_merge | | 0 | Minimal number of rows to do fsync for part after merge (0 - disabled) |
remove_empty_parts | | 1 | Remove empty parts after they were pruned by TTL, mutation, or collapsing merge algorithm |
try_fetch_recompressed_part_timeout | | 7200 | Recompression works slowly in most cases. We don’t start a merge with recompression until this timeout and try to fetch the recompressed part from the replica which assigned this merge with recompression. |
write_ahead_log_bytes_to_fsync | | 104857600 | Amount of bytes, accumulated in WAL to do fsync. |
write_ahead_log_interval_ms_to_fsync | | 100 | Interval in milliseconds after which fsync for WAL is being done. |
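As a sketch, a MergeTree setting such as min_bytes_for_wide_part can be overridden per table if the new compact-parts default is undesirable (the table name is hypothetical):

```sql
CREATE TABLE wide_only (d Date, x UInt64)
ENGINE = MergeTree
ORDER BY d
SETTINGS min_bytes_for_wide_part = 0;  -- always create wide parts, as before this release
```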
system.settings added/changed
An empty Old value means the setting is new in this release; an empty New value means the setting was removed or became obsolete.

Name | Old value | New value | Description |
---|---|---|---|
aggregate_functions_null_for_empty | | 0 | Rewrite all aggregate functions in a query, adding -OrNull suffix to them |
allow_experimental_cross_to_join_conversion | 1 | | |
allow_experimental_data_skipping_indices | 1 | | |
allow_experimental_low_cardinality_type | 1 | | |
allow_experimental_map_type | | 0 | Allow data type Map |
allow_experimental_multiple_joins_emulation | 1 | | |
allow_experimental_window_functions | | 0 | Allow experimental window functions |
asterisk_include_alias_columns | | 0 | Include ALIAS columns for wildcard query |
asterisk_include_materialized_columns | | 0 | Include MATERIALIZED columns for wildcard query |
async_socket_for_remote | | 1 | Asynchronously read from socket executing remote query |
background_fetches_pool_size | | 3 | Number of threads performing background fetches for replicated tables. Only has meaning at server startup. |
background_message_broker_schedule_pool_size | | 16 | Number of threads performing background tasks for message streaming. Only has meaning at server startup. |
compile | 0 | | |
database_atomic_wait_for_drop_and_detach_synchronously | | 0 | When executing DROP or DETACH TABLE in Atomic database, wait for table data to be finally dropped or detached. |
date_time_output_format | | simple | Method to write DateTime to text output. Possible values: ‘simple’, ‘iso’, ‘unix_timestamp’. |
default_database_engine | Ordinary | Atomic | Default database engine. |
do_not_merge_across_partitions_select_final | | 0 | Merge parts only in one partition in select final |
enable_global_with_statement | | 0 | Propagate WITH statements to UNION queries and all subqueries |
experimental_use_processors | 1 | | |
force_data_skipping_indices | | | Comma separated list of strings or literals with the name of the data skipping indices that should be used during query execution, otherwise an exception will be thrown. |
force_optimize_skip_unused_shards_no_nested | 0 | | |
format_regexp_escaping_rule | Escaped | Raw | Field escaping rule (for Regexp format) |
input_format_csv_arrays_as_nested_csv | | 0 | When reading Array from CSV, expect that its elements were serialized in nested CSV and then put into string. Example: "[""Hello"", ""world"", ""42"""" TV""]". Braces around an array can be omitted. |
input_format_csv_enum_as_number | | 0 | Treat inserted enum values in CSV formats as enum indices |
input_format_null_as_default | 0 | 1 | For text input formats initialize null fields with default values if data type of this field is not nullable |
input_format_tsv_enum_as_number | | 0 | Treat inserted enum values in TSV formats as enum indices |
insert_distributed_one_random_shard | | 0 | If setting is enabled, inserting into distributed table will choose a random shard to write when there is no sharding key |
insert_quorum_parallel | | 1 | For quorum INSERT queries - enable to make parallel inserts without linearizability |
limit | | 0 | Limit on read rows from the most ’end’ result for select query, default 0 means no limit length |
load_balancing_first_offset | | 0 | Which replica to preferably send a query when FIRST_OR_RANDOM load balancing strategy is used. |
log_queries_min_query_duration_ms | | 0 | Minimal time for the query to run, to get to the query_log/query_thread_log. |
mark_cache_min_lifetime | 0 | | |
max_bytes_to_read_leaf | | 0 | Limit on read bytes (after decompression) on the leaf nodes for distributed queries. Limit is applied for local reads only, excluding the final merge stage on the root node. |
max_concurrent_queries_for_all_users | | 0 | The maximum number of concurrent requests for all users. |
max_partitions_to_read | | -1 | Limit the max number of partitions that can be accessed in one query. <= 0 means unlimited. |
max_rows_to_read_leaf | | 0 | Limit on read rows on the leaf nodes for distributed queries. Limit is applied for local reads only, excluding the final merge stage on the root node. |
merge_tree_uniform_read_distribution | 1 | | |
min_count_to_compile | 0 | | |
multiple_joins_rewriter_version | 2 | 0 | Obsolete setting, does nothing. Will be removed after 2021-03-31 |
mysql_datatypes_support_level | | | Which MySQL types should be converted to corresponding ClickHouse types (rather than being represented as String). Can be empty or any combination of ‘decimal’ or ‘datetime64’. When empty, MySQL’s DECIMAL and DATETIME/TIMESTAMP with non-zero precision are seen as String on ClickHouse’s side. |
offset | | 0 | Offset on read rows from the most ’end’ result for select query |
opentelemetry_start_trace_probability | | 0 | Probability to start an OpenTelemetry trace for an incoming query. |
optimize_move_functions_out_of_any | 1 | 0 | Move functions out of aggregate functions ‘any’, ‘anyLast’. |
optimize_on_insert | | 1 | Do the same transformation for inserted block of data as if merge was done on this block. |
optimize_skip_merged_partitions | | 0 | Skip partitions with one part with level > 0 in optimize final |
output_format_json_array_of_rows | | 0 | Output a JSON array of all rows in JSONEachRow(Compact) format. |
output_format_json_named_tuples_as_objects | | 0 | Serialize named tuple columns as JSON objects. |
output_format_parallel_formatting | | 1 | Enable parallel formatting for some data formats. |
output_format_pretty_row_numbers | | 0 | Add row numbers before each row for pretty output format |
output_format_tsv_null_representation | | | Custom NULL representation in TSV format |
partial_merge_join | 0 | | |
read_backoff_min_concurrency | | 1 | Settings to try keeping the minimal number of threads in case of slow reads. |
read_overflow_mode_leaf | | throw | What to do when the leaf limit is exceeded. |
remerge_sort_lowered_memory_bytes_ratio | | 2 | If memory usage after remerge is not reduced by this ratio, remerge will be disabled. |
s3_max_redirects | | 10 | Max number of S3 redirect hops allowed. |
s3_max_single_part_upload_size | | 67108864 | The maximum size of object to upload using single-part upload to S3. |
special_sort | not_specified | | |
system_events_show_zero_values | | 0 | Include all metrics, even with zero values |
union_default_mode | | | Set default Union Mode in SelectWithUnion query. Possible values: empty string, ‘ALL’, ‘DISTINCT’. If empty, a query without Union Mode will throw an exception. |
use_antlr_parser | | 0 | Parse incoming queries using ANTLR-generated experimental parser |
use_compact_format_in_distributed_parts_names | 0 | 1 | Changes format of directories names for distributed table insert parts. |
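For example, the new union_default_mode setting resolves the ambiguity of a bare UNION:

```sql
SET union_default_mode = 'DISTINCT';

-- Now treated as UNION DISTINCT; with an empty union_default_mode a bare UNION throws
SELECT 1 UNION SELECT 1;  -- returns a single row
```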
Also, please refer to the release notes from the development team available at the following URLs:
- 20.9.2 release notes: https://clickhouse.tech/docs/en/whats-new/changelog/2020/#clickhouse-release-v20-9-2-20-2020-09-22
- 20.10.3 release notes: https://clickhouse.tech/docs/en/whats-new/changelog/2020/#clickhouse-release-v20-10-3-30-2020-10-28
- 20.11.2 release notes: https://clickhouse.tech/docs/en/whats-new/changelog/2020/#clickhouse-release-v20-11-2-1-2020-11-11
- 20.12.3 release notes: https://clickhouse.tech/docs/en/whats-new/changelog/2020/#clickhouse-release-v20-12-3-3-stable-2020-12-13
- 21.1.2 release notes: https://clickhouse.tech/docs/en/whats-new/changelog/#clickhouse-release-v21-1-2-15-stable-2021-01-18