Altinity Stable® Builds and Releases for ClickHouse® V21 and older
Here are the release notes from V21 and older. You can choose any of the following links to go directly to a particular release:
- V21.8 Build
- V21.3 Build
- V21.1 Build
- V20.8 Release
- V20.3 Release
- V19.16 Release
- V19.13 Release
- V19.11 Release
- V18.14 Release
V21.8 Build
Point releases:
- Detailed release notes and upgrade instructions for version 21.8.8 - Read these first!
- 21.8.15 - This is the latest release of version 21.8. We recommend that you upgrade to this release. (But read the detailed release notes and upgrade instructions first.)
- 21.8.13
- 21.8.12
- 21.8.11
- 21.8.10
Major new features since the previous stable release 21.3
Released 2021-10-11
Build report - 21.8.15.15 | Release page
A new release introduces a lot of changes and new functions. The full list is available in the Appendix, so refer to this section for more detail. The following new features are worth mentioning on the front page:
- SQL features:
- DISTINCT ON a subset of columns (see the sketch after this feature overview)
- Partial support of the SQL/JSON standard
- Arrays in dictionaries are now supported
- Arrays and nested data types are now supported for Parquet and Arrow formats
- DateTime64 extended range; dates between 1925 and 2283 are now supported a)
- Security features:
- Disk level encryption
- Kerberos authentication for HTTP protocol a)
- Active Directory groups mapping for LDAP user directory a)
- Replication and Cluster improvements:
- ClickHouse Keeper (experimental) – in-process ZooKeeper replacement
- SYSTEM RESTORE REPLICA – a handy tool that makes life easier in the unfortunate event that ZooKeeper metadata is lost a)
- Support for MySQL/PostgreSQL clusters when using mysql/postgresql table functions
- Zero-copy replication for S3 tables (see Altinity blog post for details)
- Parts movement between shards (experimental)
- MergeTree features:
- Projections (experimental)
- _partition_id and _partition_value virtual columns were added
- Deduplication for non-replicated tables. See the non_replicated_deduplication_window merge tree setting
- Integrations:
- MaterializedPostgreSQL database engine for replication from PostgreSQL
- HDFS disk support (experimental)
- Kafka errors can now be caught into a separate stream (see the KB article on this)
- Other:
- YAML configuration format as an alternative to XML
As usual with ClickHouse, there are many performance and operational improvements in different server components.
a) Contributed by Altinity developers.
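As an illustration of the new DISTINCT ON syntax, here is a minimal sketch; the table and column names are hypothetical and used only for demonstration:

```sql
CREATE TABLE visits (user_id UInt32, page String, ts DateTime)
ENGINE = MergeTree ORDER BY (user_id, ts);

-- DISTINCT applies only to the columns in parentheses:
-- one row is kept per distinct user_id.
SELECT DISTINCT ON (user_id) user_id, page, ts
FROM visits;
```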
Backward Incompatible Changes
The following changes are backward incompatible and require user attention during an upgrade:
- Users of LowCardinality(Nullable(...)) columns can not safely downgrade to versions older than 21.4. Data in tables with columns of those types inserted / merged by 21.8 will be lost after the downgrade.
- Values of UUID type cannot be compared with integers. For example, instead of uuid != 0, write uuid != '00000000-0000-0000-0000-000000000000'.
- The toStartOfInterval function will align hour intervals to midnight (in previous versions they were aligned to the start of the unix epoch). For example, toStartOfInterval(x, INTERVAL 11 HOUR) will split every day into three intervals: 00:00:00..10:59:59, 11:00:00..21:59:59 and 22:00:00..23:59:59.
- It is not possible to roll back to an older ClickHouse version after executing an ALTER ... ATTACH query, as the old servers would fail to process the new command entry ATTACH_PART in the replicated log.
- The behaviour of remote_url_allow_hosts has changed. In previous versions an empty section did nothing; in 21.8 it blocks access to all external hosts. Remove this section from the configuration files after an upgrade if you experience issues with the url() or s3() functions.
- If you downgrade to a version before 21.1, ClickHouse will not be able to start automatically – you will need to remove the system.*_log tables manually to downgrade.
- There is an issue with uniqState(UUID) in AggregatingMergeTree tables. It can be corrected by replacing uniqState(uuid) in MATERIALIZED VIEWs with uniqState(sipHash64(uuid)) and changing the data type for already saved data from AggregateFunction(uniq, UUID) to AggregateFunction(uniq, UInt64). For more information see the following:
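Below is a minimal sketch of the workaround described in the last item, using hypothetical table and view names:

```sql
-- Hypothetical source table; only the UUID column matters here.
CREATE TABLE events (ts DateTime, uuid UUID) ENGINE = MergeTree ORDER BY ts;

-- Store the uniq state over a 64-bit hash of the UUID instead of the UUID itself.
CREATE TABLE daily_uniques
(
    d Date,
    u AggregateFunction(uniq, UInt64)
)
ENGINE = AggregatingMergeTree
ORDER BY d;

CREATE MATERIALIZED VIEW daily_uniques_mv TO daily_uniques AS
SELECT
    toDate(ts) AS d,
    uniqState(sipHash64(uuid)) AS u  -- instead of uniqState(uuid)
FROM events
GROUP BY d;

-- Reading the estimate back.
SELECT d, uniqMerge(u) AS uniques FROM daily_uniques GROUP BY d;
```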
Upgrade Notes
There were several changes between versions that may affect the rolling upgrade of big clusters. Upgrading only part of the cluster is not recommended.
- Distributed queries with explicitly defined large sets are now executed differently. The compatibility setting legacy_column_name_of_tuple_literal may be enabled during the rolling upgrade of the cluster (see the sketch after this list). Otherwise distributed queries with explicitly defined sets in the IN clause may fail during the upgrade.
- ATTACH PART[ITION] queries may not work during the cluster upgrade.
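A minimal sketch of enabling the compatibility setting mentioned above for a single session; in practice it would usually go into the default user profile instead, and be removed once every node runs 21.8:

```sql
-- Only needed while some nodes in the cluster still run a version older than 21.7.
SET legacy_column_name_of_tuple_literal = 1;
```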
Other Important Changes
ClickHouse embedded monitoring has become a bit more aggressive. It now collects several system stats and stores them in the table system.asynchronous_metric_log. This can be visible as an increase of background writes, storage usage, etc. To return to the old rate of metrics refresh / flush, adjust those settings in config.xml:
<asynchronous_metrics_update_period_s>60</asynchronous_metrics_update_period_s>
<asynchronous_metric_log>
    <flush_interval_milliseconds>60000</flush_interval_milliseconds>
</asynchronous_metric_log>
Alternatively, metric_log and asynchronous_metric_log tables can be completely disabled:
<yandex>
<asynchronous_metric_log remove="1"/>
<metric_log remove="1"/>
</yandex>
Some new ClickHouse features are now enabled by default. It may lead to a change in behaviour, so review those carefully and disable features that may affect your system:
async_socket_for_remote
compile_aggregate_expressions
compile_expressions
cross_to_inner_join_rewrite
insert_null_as_default
optimize_skip_unused_shards_rewrite_in
query_plan_enable_optimizations
query_plan_filter_push_down
In the previous releases we recommended disabling optimize_on_insert. This recommendation stays for 21.8 as well, since inserts into Summing and AggregatingMergeTree tables can slow down.
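A minimal sketch of applying that recommendation for a session; if you prefer it off everywhere, the same setting can be placed in the default user profile:

```sql
SET optimize_on_insert = 0;
```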
Known issues in 21.8.8
The development team continues to improve the quality of the 21.8 release. The following issues still exist in the 21.8.8 version and may affect ClickHouse operation. Please inspect them carefully to decide if those are applicable to your applications:
- Values in system.events for event = 'Merge' are overstated. ClickHouse incorrectly increments this counter.
- A Timeout exceeded: elapsed 18446744073.709553 seconds error might happen in extremely rare cases, presumably due to some bug in the kernel.
You may also look into GitHub issues using the special v21.8-affected label.
ClickHouse Altinity Stable Releases are based on the community versions. For more information on installing ClickHouse from either the Altinity Stable builds or the community builds, see the ClickHouse Altinity Stable Builds Install Guide.
Please contact us at info@altinity.com if you experience any issues with the upgrade.
Appendix
New functions
- DateTime functions:
dateName
timeZone, timeZoneOf, timeZoneOffset, timezoneOf, toTimezone
- JSON processing functions:
JSON_EXISTS, JSON_QUERY, JSON_VALUE
simpleJSONExtractBool, simpleJSONExtractFloat, simpleJSONExtractInt, simpleJSONExtractRaw, simpleJSONExtractString, simpleJSONExtractUInt, simpleJSONHas
toJSONString
- String functions:
bin/unbin
leftPad, lpad, leftPadUTF8
rightPad, rpad, rightPadUTF8
splitByRegexp
- Array functions:
arrayProduct
bitPositionsToArray
validateNestedArraySizes
- Dictionary:
dictGetChildren
dictGetDescendants
dictGetOrNull
- Aggregate functions
deltaSumTimestamp
intervalLengthSum
lagInFrame/leadInFrame
sequenceNextNode
sumCount
uniqTheta
- Geo:
polygonAreaCartesian, polygonAreaSpherical
polygonConvexHullCartesian
polygonPerimeterCartesian, polygonPerimeterSpherical
polygonsDistanceCartesian, polygonsDistanceSpherical
polygonsEqualsCartesian
polygonsIntersectionCartesian, polygonsIntersectionSpherical
polygonsSymDifferenceCartesian, polygonsSymDifferenceSpherical
polygonsUnionCartesian, polygonsUnionSpherical
polygonsWithinCartesian, polygonsWithinSpherical
readWktMultiPolygon, readWktPoint, readWktPolygon, readWktRing
wkt
- Math/Statistics:
quantileBFloat16, quantilesBFloat16
- Type related:
reinterpretAsUInt128
toUInt128, toUInt128OrNull, toUInt128OrZero
- Other:
indexHint
isIPAddressInRange
partitionId – corresponds to the _partition_id virtual column added to MergeTree tables
New table functions
dictionary
s3Cluster – see our blog article that highlights this one
New table engines
ExternalDistributed – allows querying MySQL or PostgreSQL clusters
MaterializedPostgreSQL
New metrics and events
system.asynchronous_metrics
AsynchronousMetricsCalculationTimeSpent
The table also has many new metrics for host-level monitoring rather than ClickHouse itself.
system.metrics
BrokenDistributedFilesToInsert
MMappedFileBytes
NetworkReceive
NetworkSend
system.events
MMappedFileCacheHits
MMappedFileCacheMisses
MergeTreeDataProjectionWriterBlocks
MergeTreeDataProjectionWriterBlocksAlreadySorted
MergeTreeDataProjectionWriterCompressedBytes
MergeTreeDataProjectionWriterRows
MergeTreeDataProjectionWriterUncompressedBytes
NetworkReceiveBytes
NetworkSendBytes
StorageBufferLayerLockReadersWaitMilliseconds
StorageBufferLayerLockWritersWaitMilliseconds
StorageBufferPassedBytesFlushThreshold
StorageBufferPassedRowsFlushThreshold
StorageBufferPassedTimeFlushThreshold
system.errors
CANNOT_CREATE_FILE
CANNOT_PARSE_YAML
CANNOT_SET_ROUNDING_MODE
CANNOT_SYSCONF
CONCURRENT_ACCESS_NOT_SUPPORTED
DISTRIBUTED_BROKEN_BATCH_FILES
DISTRIBUTED_BROKEN_BATCH_INFO
ILLEGAL_PROJECTION
INCORRECT_PART_TYPE
INVALID_FORMAT_INSERT_QUERY_WITH_DATA
KERBEROS_ERROR
NO_SUCH_PROJECTION_IN_TABLE
POSTGRESQL_CONNECTION_FAILURE
PROJECTION_NOT_USED
TOO_LARGE_DISTRIBUTED_DEPTH
UNKNOWN_SNAPSHOT
New system tables
data_skipping_indices: database, table, name, type, expr, granularity
part_moves_between_shards: database, table, task_name, task_uuid, create_time, part_name, part_uuid, to_shard, update_time, state, num_tries, last_exception
projection_parts – same as parts but with extra ‘parent_*’ columns
projection_parts_columns – same as parts_columns but with extra ‘parent_*’ columns
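For instance, the new introspection table can be queried directly with the columns listed above (a usage sketch only):

```sql
SELECT database, table, name, type, expr, granularity
FROM system.data_skipping_indices;
```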
New columns in system tables
- clusters
shutdown_count
- dictionaries:
key – removed; key.names, key.types, found_rate – added
- distribution_queue:
broken_data_files, broken_data_compressed_bytes
- errors:
last_error_time, last_error_message, last_error_trace, remote
- metric_log:
- multiple columns
- parts_columns:
uuid, min_time, max_time
- processes:
- Settings (Map replaces Settings.Names/Values arrays), ProfileEvents (Map replaces ProfileEvents.Names/Values arrays), current_database
- query_log:
projections, initial_query_start_time, initial_query_start_time_microseconds
, Settings (Map replaces Settings.Names/Values arrays), ProfileEvents (Map replaces ProfileEvents.Names/Values arrays)
- query_thread_log:
initial_query_start_time_microseconds
, Settings (Map replaces Settings.Names/Values arrays), ProfileEvents (Map replaces ProfileEvents.Names/Values arrays)
- stack_trace:
thread_name
- table_engines:
supports_projections
- tables:
- comment
- users
grantees_any, grantees_list, grantees_except
System.merge_tree_settings added/changed
Name | Old value | New value | Description |
---|---|---|---|
max_parts_to_merge_at_once | 100 | Max amount of parts which can be merged at once (0 - disabled). Doesn’t affect OPTIMIZE FINAL query. | |
max_replicated_fetches_network_bandwidth | 0 | The maximum speed of data exchange over the network in bytes per second for replicated fetches. Zero means unlimited. | |
max_replicated_sends_network_bandwidth | 0 | The maximum speed of data exchange over the network in bytes per second for replicated sends. Zero means unlimited. | |
min_bytes_to_rebalance_partition_over_jbod | 0 | Minimal amount of bytes to enable part rebalance over JBOD array (0 - disabled). | |
non_replicated_deduplication_window | 0 | How many last blocks of hashes should be kept on disk (0 - disabled). | |
part_moves_between_shards_delay_seconds | 30 | Time to wait before/after moving parts between shards. | |
part_moves_between_shards_enable | 0 | Experimental/Incomplete feature to move parts between shards. Does not take into account sharding expressions. | |
replicated_fetches_http_connection_timeout | 0 | HTTP connection timeout for part fetch requests. Inherited from default profile http_connection_timeout if not set explicitly. | |
replicated_fetches_http_receive_timeout | 0 | HTTP receive timeout for fetch part requests. Inherited from default profile http_receive_timeout if not set explicitly. | |
replicated_fetches_http_send_timeout | 0 | HTTP send timeout for part fetch requests. Inherited from default profile http_send_timeout if not set explicitly. | |
remote_fs_execute_merges_on_single_replica_time_threshold | 10800 | When greater than zero only a single replica starts the merge immediately when merged part on shared storage and ‘allow_remote_fs_zero_copy_replication’ is enabled. |
system.settings added/changed
Type | Name | Old value | New value | Description |
---|---|---|---|---|
settings | allow_experimental_bigint_types | 0 | 1 | Obsolete setting, does nothing. |
settings | allow_experimental_codecs | 0 | If it is set to true, allow to specify experimental compression codecs (but we don't have those yet and this option does nothing). | |
settings | allow_experimental_database_materialized_postgresql | 0 | Allow to create database with Engine=MaterializedPostgreSQL(…). | |
settings | allow_experimental_funnel_functions | 0 | Enable experimental functions for funnel analysis. | |
settings | allow_experimental_map_type | 0 | 1 | Obsolete setting, does nothing. |
settings | allow_experimental_projection_optimization | 0 | Enable projection optimization when processing SELECT queries | |
settings | async_socket_for_remote | 0 | 1 | Asynchronously read from socket executing remote query |
settings | background_schedule_pool_size | 16 | 128 | Number of threads performing background tasks for replicated tables, dns cache updates. Only has meaning at server startup. |
settings | compile_aggregate_expressions | 1 | Compile aggregate functions to native code. | |
settings | compile_expressions | 0 | 1 | Compile some scalar functions and operators to native code. |
settings | cross_to_inner_join_rewrite | 1 | Use inner join instead of comma/cross join if possible | |
settings | database_replicated_always_detach_permanently | 0 | Execute DETACH TABLE as DETACH TABLE PERMANENTLY if database engine is Replicated | |
settings | distributed_ddl_entry_format_version | 1 | Version of DDL entry to write into ZooKeeper | |
settings | distributed_ddl_output_mode | throw | Format of distributed DDL query result | |
settings | distributed_directory_monitor_split_batch_on_failure | 0 | Should StorageDistributed DirectoryMonitors try to split batch into smaller in case of failures. | |
settings | distributed_push_down_limit | 0 | If 1, LIMIT will be applied on each shard separately. Usually you don't need to use it, since this will be done automatically if it is possible, i.e. for simple query SELECT FROM LIMIT. | |
settings | experimental_query_deduplication_send_all_part_uuids | 0 | If false only part UUIDs for currently moving parts are sent. If true all read part UUIDs are sent (useful only for testing). | |
settings | external_storage_max_read_bytes | 0 | Limit maximum number of bytes when table with external engine should flush history data. Now supported only for MySQL table engine, database engine, dictionary and MaterializeMySQL. If equal to 0, this setting is disabled | |
settings | external_storage_max_read_rows | 0 | Limit maximum number of rows when table with external engine should flush history data. Now supported only for MySQL table engine, database engine, dictionary and MaterializeMySQL. If equal to 0, this setting is disabled | |
settings | force_optimize_projection | 0 | If projection optimization is enabled, SELECT queries need to use projection | |
settings | glob_expansion_max_elements | 1000 | Maximum number of allowed addresses (For external storages, table functions, etc). | |
settings | group_by_two_level_threshold_bytes | 100000000 | 50000000 | From what size of the aggregation state in bytes, a two-level aggregation begins to be used. 0 - the threshold is not set. Two-level aggregation is used when at least one of the thresholds is triggered. |
settings | handle_kafka_error_mode | default | Obsolete setting, does nothing. | |
settings | http_max_field_name_size | 1048576 | Maximum length of field name in HTTP header | |
settings | http_max_field_value_size | 1048576 | Maximum length of field value in HTTP header | |
settings | http_max_fields | 1000000 | Maximum number of fields in HTTP header | |
settings | http_max_uri_size | 1048576 | Maximum URI length of HTTP request | |
settings | insert_null_as_default | 1 | Insert DEFAULT values instead of NULL in INSERT SELECT (UNION ALL) | |
settings | legacy_column_name_of_tuple_literal | 0 | List all names of element of large tuple literals in their column names instead of hash. This settings exists only for compatibility reasons. It makes sense to set to 'true', while doing rolling update of cluster from version lower than 21.7 to higher. | |
settings | max_distributed_depth | 5 | Maximum distributed query depth | |
settings | max_replicated_fetches_network_bandwidth_for_server | 0 | The maximum speed of data exchange over the network in bytes per second for replicated fetches. Zero means unlimited. Only has meaning at server startup. | |
settings | max_replicated_sends_network_bandwidth_for_server | 0 | The maximum speed of data exchange over the network in bytes per second for replicated sends. Zero means unlimited. Only has meaning at server startup. | |
settings | min_count_to_compile_aggregate_expression | 3 | The number of identical aggregate expressions before they are JIT-compiled | |
settings | normalize_function_names | 1 | 0 | Normalize function names to their canonical names |
settings | odbc_bridge_connection_pool_size | 16 | Connection pool size for each connection settings string in ODBC bridge. | |
settings | optimize_functions_to_subcolumns | 0 | Transform functions to subcolumns, if possible, to reduce amount of read data. E.g. 'length(arr)' -> 'arr.size0', 'col IS NULL' -> 'col.null' | |
settings | optimize_fuse_sum_count_avg | 0 | Fuse aggregate functions sum(), avg(), count() with identical arguments into one sumCount() call, if the query has at least two different functions | |
settings | optimize_move_to_prewhere_if_final | 0 | If query has FINAL , the optimization move_to_prewhere is not always correct and it is enabled only if both settings optimize_move_to_prewhere and optimize_move_to_prewhere_if_final are turned on |
|
settings | optimize_skip_unused_shards_limit | 1000 | Limit for number of sharding key values, turns off optimize_skip_unused_shards if the limit is reached | |
settings | optimize_skip_unused_shards_rewrite_in | 1 | Rewrite IN in query for remote shards to exclude values that does not belong to the shard (requires optimize_skip_unused_shards) | |
settings | output_format_arrow_low_cardinality_as_dictionary | 0 | Enable output LowCardinality type as Dictionary Arrow type | |
settings | postgresql_connection_pool_size | 16 | Connection pool size for PostgreSQL table engine and database engine. | |
settings | postgresql_connection_pool_wait_timeout | 5000 | Connection pool push/pop timeout on empty pool for PostgreSQL table engine and database engine. By default it will block on empty pool. | |
settings | prefer_column_name_to_alias | 0 | Prefer using column names instead of aliases if possible. | |
settings | prefer_global_in_and_join | 0 | If enabled, all IN/JOIN operators will be rewritten as GLOBAL IN/JOIN. It's useful when the to-be-joined tables are only available on the initiator and we need to always scatter their data on-the-fly during distributed processing with the GLOBAL keyword. It's also useful to reduce the need to access the external sources joining external tables. | |
settings | query_plan_enable_optimizations | 1 | Apply optimizations to query plan | |
settings | query_plan_filter_push_down | 1 | Allow to push down filter by predicate query plan step | |
settings | s3_max_single_read_retries | 4 | The maximum number of retries during single S3 read. | |
settings | sleep_in_send_data | 0 | ||
settings | sleep_in_send_data_ms | 0 | Time to sleep in sending data in TCPHandler | |
settings | sleep_in_send_tables_status | 0 | ||
settings | sleep_in_send_tables_status_ms | 0 | Time to sleep in sending tables status response in TCPHandler | |
settings | use_antlr_parser | 0 |
Also, please refer to the release notes from the development team available at the following URLs:
- 21.4 release notes: https://clickhouse.com/docs/en/whats-new/changelog/2021#clickhouse-release-214
- 21.5 release notes: https://clickhouse.com/docs/en/whats-new/changelog/2021#clickhouse-release-215-2021-05-20
- 21.6 release notes: https://clickhouse.com/docs/en/whats-new/changelog/2021#clickhouse-release-216-2021-06-05
- 21.7 release notes: https://clickhouse.com/docs/en/whats-new/changelog/2021#clickhouse-release-v217-2021-07-09
- 21.8 release notes: https://clickhouse.com/docs/en/whats-new/changelog/2021#clickhouse-release-v218-2021-08-12
21.8.15
Released 2022-04-15
Release Notes
Based on upstream/v21.8.15.7-lts.
ClickHouse release v21.8.15.7-altinitystable as compared to community ClickHouse v21.8.15.7:
Changes compared to the Community Build
Bug Fixes
- Bug Fix: Fixed issue with merging configs that have <yandex> and <clickhouse> root XML tags. (backport of ClickHouse@213ecae) (via #114).
- Bug Fix: Backport of ClickHouse#31823 to 21.8: Fix invalid cast of nullable type when nullable primary key is used (via #110)
Features
- Build/Testing/Packaging Improvement: Various macOS compilation and packaging fixes/backporting (via #108)
- Build/Testing/Packaging Improvement: Minor test improvements (via afb4a0a, 4c35386)
Changes compared to Altinity Stable 21.8.13.1
Bug Fixes
- Fix segfault while parsing ORC file with corrupted footer. Closes ClickHouse#33797 via ClickHouse#33984 Kruglov Pavel.
- Fix usage of functions array and tuple with literal arguments in distributed queries. Previously it could lead to Not found columns exception. ClickHouse#33938 (Anton Popov).
- Fix s3 table function reading empty file. Closes ClickHouse#33008 via ClickHouse#33037 (Kseniia Sumarokova).
- Fix hdfs url check that didn’t allow using HA namenode address. Bug was introduced in ClickHouse#31042. ClickHouse#32976 (Kruglov Pavel).
- Server might fail to start if database with MySQL engine cannot connect to MySQL server, it’s fixed. Fixes ClickHouse#14441 via ClickHouse#32802 (tavplubix).
Test Results
Packages
- Deb/RPMs: https://builds.altinity.cloud/
- Docker image:
altinity/clickhouse-server:21.8.15.15.altinitystable
- Homebrew formula:
clickhouse@21.8-altinity-stable-21.8.15.15
21.8.13
Released 2022-01-05
Release Notes
ClickHouse release v21.8.13.1-altinitystable as compared to v21.8.12.29-altinitystable:
Bug Fixes
-
Fixed Apache Avro Union type index out of boundary issue in Apache Avro binary format. #33022 (Harry Lee).
-
Quota limit was not reached, but the limit was exceeded. This PR fixes #31174. #31656 (sunny).
-
NO CL ENTRY: ‘fix json error after downgrade’. #33166 (bullet1337).
-
Integer overflow to resize the arrays causes heap corrupt. #33024 (varadarajkumar).
-
fix crash when used fuzzBits with multiply same FixedString, Close #32737. #32755 (SuperDJY).
-
Fix possible exception at RabbitMQ storage startup by delaying channel creation. #32584 (Kseniia Sumarokova).
-
Fixed crash with SIGFPE in aggregate function
avgWeighted
with Decimal
argument. Fixes #32053. #32303 (tavplubix). -
Some replication queue entries might hang for
temporary_directories_lifetime
(1 day by default) with Directory tmp_merge_<part_name>
or Part ... (state Deleting) already exists, but it will be deleted soon
or similar error. It’s fixed. Fixes #29616. #32201 (tavplubix). -
XML dictionaries identifiers, used in table create query, can be qualified to
default_database
during upgrade to newer version. Closes #31963. #32187 (Maksim Kita). -
Number of active replicas might be determined incorrectly when inserting with quorum if setting
replicated_can_become_leader
is disabled on some replicas. It’s fixed. #32157 (tavplubix). -
Fixed
Directory ... already exists and is not empty
error when detaching part. #32063 (tavplubix). -
Some
GET_PART
entry might hang in replication queue if part is lost on all replicas and there are no other parts in the same partition. It’s fixed in cases when partition key contains only columns of integer types or Date[Time]
. Fixes #31485. #31887 (tavplubix). -
Change configuration path from
keeper_server.session_timeout_ms
to keeper_server.coordination_settings.session_timeout_ms
when constructing a KeeperTCPHandler
- Same with operation_timeout
. #31859 (JackyWoo). -
Fix a bug about function transform with decimal args. #31839 (李帅).
-
Fix crash when function
dictGet
with type is used for dictionary attribute when type is Nullable
. Fixes #30980. #31800 (Maksim Kita). -
Fix crash with empty result on odbc query. Closes #31465. #31766 (Kseniia Sumarokova).
-
Fix possible crash (or incorrect result) in case of
LowCardinality
arguments of window function. Fixes #31114. #31888 (Nikolai Kochetov).
Changes compared to the Community Build
- Fix invalid cast of Nullable type when nullable primary key is used. (Nullable primary key is a discouraged feature - please do not use). This fixes #31075 #31823 Amos Bird).
Test Reports
Test Report for Altinity Stable Build v21.8.13.1.
21.8.12
Released 2021-10-21
Release Notes
ClickHouse release v21.8.12.29-altinitystable as compared to v21.8.11.1-altinitystable:
Performance Improvements
- Improved performance of JSON and XML output formats. #31673 (alexey-milovidov).
Bug Fixes
Build/Testing/Packaging Improvements
Bug Fixes (user-visible misbehaviour in official stable or prestable release)
- Fixed functions
empty
and notEmpty
with the arguments of UUID
type. Fixes #31819. #31883 (Anton Popov). - Fixed possible assertion
../src/IO/ReadBuffer.h:58: bool DB::ReadBuffer::next(): Assertion '!hasPendingData()' failed.
in TSKV format. #31804 (Kruglov Pavel). - Fixed usage of
Buffer
table engine with type Map
. Fixes #30546. #31742 (Anton Popov). - Fixed race in JSONEachRowWithProgress output format when data and lines with progress are mixed in output. #31736 (Kruglov Pavel).
- Fixed
there are no such cluster here
error on execution of ON CLUSTER
query if specified cluster name is name ofReplicated
database. #31723 (tavplubix). - Settings
input_format_allow_errors_num
and input_format_allow_errors_ratio
did not work for parsing of domain types, such as IPv4
, it’s fixed. Fixes #31686. #31697 (tavplubix). RENAME TABLE
query worked incorrectly on attempt to rename a DDL dictionary in Ordinary
database, it’s fixed. #31638 (tavplubix).- Fix invalid generated JSON when only column names contain invalid UTF-8 sequences. #31534 (Kevin Michel).
- Resolve
nullptr
in STS credentials provider for S3. #31409 (Vladimir Chebotarev). - Remove not like function into RPNElement. #31169 (sundyli).
- Fixed bug in Keeper which can lead to inability to start when some coordination logs was lost and we have more fresh snapshot than our latest log. #31150 (alesapin).
- Fixed abort in debug server and
DB::Exception: std::out_of_range: basic_string
error in release server in case of bad hdfs url by adding additional check of hdfs url structure. #31042 (Kruglov Pavel).
21.8.11
Released 2021-11-19
ClickHouse release v21.8.11.1-altinitystable as compared to v21.8.10.1-altinitystable
New Features
CompiledExpressionCache
limit elements size using compiled_expression_cache_elements_size
setting. #30667 (Maksim Kita).
Improvements
- Made query which fetched table structure for PostgreSQL database more reliable. #30477 (Kseniia Sumarokova).
Bug Fixes
- Fixed
SHOW GRANTS
when partial revokes are used. This PR fixes #31138. #31249 (Vitaly Baranov).
Bug Fixes (user-visible misbehaviour in official stable or prestable release)
- Fixed
StorageMerge
with aliases and where (it did not work before at all). Closes #28802. #31044 (Kseniia Sumarokova). - Fixed
JSONValue
/Query
with quoted identifiers. This allows to have spaces in json path. Closes #30971. #31003 (Kseniia Sumarokova). - Using
formatRow
function with not row formats led to segfault. Don’t allow to use this function with such formats (because it doesn’t make sense). #31001 (Kruglov Pavel). - Skip
max_partition_size_to_drop
check in case ofATTACH PARTITION ... FROM
andMOVE PARTITION ...
#30995 (Amr Alaa). - Fixed set index not used in
AND/OR
expressions when there are more than two operands. This fixes #30416 . #30887 (Amos Bird). - Fixed ambiguity when extracting auxiliary ZooKeeper name from ZooKeeper path in
ReplicatedMergeTree
. Previously server might fail to start withUnknown auxiliary ZooKeeper name
if ZooKeeper path contains a colon. Fixes #29052. Also it was allowed to specify ZooKeeper path that does not start with slash, but now it’s deprecated and creation of new tables with such path is not allowed. Slashes and colons in auxiliary ZooKeeper names are not allowed too. #30822 (tavplubix). - Fixed a race condition between
REPLACE/MOVE PARTITION
and background merge in non-replicatedMergeTree
that might cause a part of moved/replaced data to remain in partition. Fixes #29327. #30717 (tavplubix). - Fixed
PREWHERE
withWHERE
in case of always truePREWHERE
. #30668 (Azat Khuzhin). - Functions for case-insensitive search in UTF8 strings like
positionCaseInsensitiveUTF8
and countSubstringsCaseInsensitiveUTF8
might find substrings that actually do not match; it’s fixed. #30663 (tavplubix). - Limit push down optimization could cause an error
Cannot find column
. Fixes #30438. #30562 (Nikolai Kochetov). - Fixed exception handling in
parallel_view_processing
. This resolves issues / prevents crashes in some rare corner cases when that feature is enabled and exception (likeMemory limit exceeded ...
) happened in the middle of materialized view processing. #30472 (filimonov). - Fixed segfault which might happen if session expired during execution of
REPLACE PARTITION
. #30432 (tavplubix). - Fixed
ComplexKeyHashedDictionary
,ComplexKeySparseHashedDictionary
parsing preallocate
option from layout config. #30246 (Maksim Kita). - Fixed
[I]LIKE
function. Closes #28661. #30244 (Nikolay Degterinsky). - Support nullable arguments in function
initializeAggregation
. #30177 (Anton Popov). - Fixed data-race between
LogSink::writeMarks()
andLogSource
inStorageLog
. #29946 (Azat Khuzhin). - Fixed hanging DDL queries on Replicated database while adding a new replica. #29328 (Kevin Michel).
- Fixed bad optimizations of
ORDER BY
if it containsWITH FILL
. This closes #28908. This closes #26049. #28910 (alexey-milovidov). - Fixed queries to external databases (i.e. MySQL) with multiple columns in
IN
( i.e.(k,v) IN ((1, 2))
) (but note that this has some backward incompatibility for theclickhouse-copier
since it uses alias for tuple element). #28888 (Azat Khuzhin). - Fixed “Column is not under aggregate function and not in GROUP BY” with
PREWHERE
(Fixes: #28461). #28502 (Azat Khuzhin). - Fixed
NOT-IN
index optimization when not all key columns are used. This fixes #28120. #28315 (Amos Bird).
Bug Fixes (user-visible misbehaviour in official stable or prestable release)
- Fixed
ORDER BY ... WITH FILL
with setTO
andFROM
and no rows in result set. #30888 (Anton Popov).
21.8.10
Released 2021-10-23
Release notes for Altinity Stable Build 21.8.10
ClickHouse release 21.8.10 as compared to v21.8.8
Improvements
- Update zoneinfo files to 2021c. #29925 (alexey-milovidov).
- Allow symlinks to files in user_files directory for file table function. #30309 (Kseniia Sumarokova).
Bug Fixes
- Fix shutdown of
AccessControlManager
. Now there can’t be reloading of the configuration after AccessControlManager has been destroyed. This PR fixes the flaky test test_user_directories/test.py::test_relative_path. #29951 (Vitaly Baranov). - Allow using a materialized column as the sharding key in a distributed table even if
insert_allow_materialized_columns=0
. #28637 (Vitaly Baranov).
- FlatDictionary, HashedDictionary fix bytes_allocated calculation for nullable attributes. #30238 (Maksim Kita).
- Dropped
Memory
database might reappear after server restart, it’s fixed (#29795). Also addedforce_remove_data_recursively_on_drop
setting as a workaround forDirectory not empty
error when droppingOrdinary
database (because it’s not possible to remove data leftovers manually in cloud environment). #30054 (tavplubix). - Fix crash of sample by
tuple()
, closes #30004. #30016 (Flynn). - Fix possible data-race between
FileChecker
andStorageLog
/StorageStripeLog
. #29959 (Azat Khuzhin). - Fix system tables recreation check (fails to detect changes in enum values). #29857 (Azat Khuzhin).
- Avoid
Timeout exceeded: elapsed 18446744073.709553 seconds
error that might happen in extremely rare cases, presumably due to some bug in kernel. Fixes #29154. #29811 (tavplubix). - Fix bad cast in
ATTACH TABLE ... FROM 'path'
query when non-string literal is used instead of path. It may lead to reading of uninitialized memory. #29790 (alexey-milovidov). - Fix concurrent access to
LowCardinality
duringGROUP BY
(leads to SIGSEGV). #29782 (Azat Khuzhin). - Fixed incorrect behaviour of setting
materialized_postgresql_tables_list
at server restart. Found in #28529. #29686 (Kseniia Sumarokova). - Condition in filter predicate could be lost after push-down optimisation. #29625 (Nikolai Kochetov).
- Fix rare segfault in
ALTER MODIFY
query when using incorrect table identifier inDEFAULT
expression likex.y.z...
Fixes #29184. #29573 (alesapin). - Fix bug in check
pathStartsWith
because there was bug with the usage ofstd::mismatch
:The behavior is undefined if the second range is shorter than the first range.
. #29531 (Kseniia Sumarokova). - In ODBC bridge add retries for error Invalid cursor state. It is a retriable error. Closes #29473. #29518 (Kseniia Sumarokova).
- Fix possible
Block structure mismatch
for subqueries with pushed-downHAVING
predicate. Fixes #29010. #29475 (Nikolai Kochetov). - Avoid deadlocks when reading and writing on JOIN Engine tables at the same time. #30187 (Raúl Marín).
- Fix INSERT SELECT incorrectly fills MATERIALIZED column based of Nullable column. #30189 (Azat Khuzhin).
- Fix null dereference for
GROUP BY WITH TOTALS HAVING
(when the column from HAVING
wasn’t selected). #29553 (Azat Khuzhin).
V21.3 Build
Point releases:
- Detailed release notes and upgrade instructions for version 21.3.13.9 - Read these first!
- 21.3.20.2 - This is the latest release of version 21.3. We recommend that you upgrade to this release. (But read the detailed release notes and upgrade instructions first.)
- 21.3.17.3
Major new features since the previous stable release 21.1.x
Released 2021-06-29
A new release introduces a lot of changes and new features. These are organized by feature in the Appendix, so refer to this section for more detail. The following new features are worth mentioning on the front page:
- SQL features:
- Window functions! See our blog article for an introduction.
- Support multiple levels of nesting in the Nested datatype.
- Security features:
- LDAP roles mapping a).
- Apply row-level security as a separate PREWHERE step a).
- Server side keys support for S3 table function a).
- Cluster improvements:
- Hedged requests! These allow reducing tail latencies on large clusters by running the same query on different replicas. This is controlled by the use_hedged_requests and max_parallel_replicas settings.
- Allow inserts into a specific shard via a distributed table. This is controlled by the insert_shard_id setting (see the sketch after this feature overview).
- Allow inserts into the cluster() table function and specify a sharding key.
- Replicated database engine (experimental). Allows to replicate DDL statements across the cluster. It is not production ready in this release.
- MergeTree features:
- Table level concurrency control. Can be controlled by the max_concurrent_queries merge tree setting.
- Integrations:
- PostgreSQL table engine, table function and dictionary source.
As usual with ClickHouse, there are many performance and operational improvements in different server components.
a) Contributed by Altinity developers.
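For illustration, a minimal sketch of inserting into one specific shard through a distributed table using the new insert_shard_id setting; the cluster and table names are hypothetical:

```sql
-- Hypothetical local and distributed tables, for illustration only.
CREATE TABLE events_local (id UInt64, v String) ENGINE = MergeTree ORDER BY id;
CREATE TABLE events_dist AS events_local
ENGINE = Distributed('my_cluster', currentDatabase(), 'events_local');

-- Send this insert to shard 1 of the corresponding distributed table.
SET insert_shard_id = 1;
INSERT INTO events_dist VALUES (1, 'a');
```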
Backward Incompatible Changes
The following changes are backward incompatible and require user attention during an upgrade:
- It is no longer allowed to create
MergeTree
tables in the old syntax with tableTTL
; this syntax is now ignored. Attaching old tables is still possible. - Floating point columns and values are disallowed in:
- Partitioning key – these can be turned back on with the
allow_floating_point_partition_key
merge tree setting. - Bitwise functions.
- Partitioning key – these can be turned back on with the
- Excessive parenthesis in type definitions are no longer supported, example:
Array((UInt8))
.
Upgrade Notes
There were several changes between versions that may affect the rolling upgrade of big clusters. Upgrading only part of the cluster is not recommended.
- Now replicas that are processing the
ALTER TABLE ATTACH PART[ITION]
command search theirdetached/
folders before fetching the data from other replicas. As an implementation detail, a new commandATTACH_PART
is introduced in the replicated log. Parts are searched and compared by their checksums.ATTACH PART[ITION]
queries may not work during cluster upgrade.
Notice
It’s not possible to rollback to an older ClickHouse version after executing ALTER ... ATTACH
query in the new version as the old servers would fail to pass the ATTACH_PART
entry in the replicated log.
- If you are upgrading RPM packages from https://packagecloud.io/Altinity/clickhouse-altinity-stable you will have to switch the repository since the packaging structure has been changed. Please refer to these instructions: https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-packaging-compatibility-greater-than-21.x-and-earlier.
- In the release notes for 21.1 we recommended disabling optimize_on_insert. This recommendation stays for 21.3 as well, since inserts into Replacing, Summing, Collapsing and AggregatingMergeTree tables can produce empty parts.
Other Important Changes
Some new ClickHouse features are now enabled by default. It may lead to a change in behaviour, so review those carefully and disable features that may affect your system:
distributed_aggregation_memory_efficient
enable_global_with_statement
optimize_normalize_count_variants
optimize_respect_aliases
optimize_rewrite_sum_if_to_count_if
Known issues in 21.3.13.9
The development team continues to improve the quality of the 21.3 release. The following issues still exist in the 21.3.13.9 version and may affect ClickHouse operation. Please inspect them carefully to decide if those are applicable to your applications:
- Replication may get stuck due to clear empty parts in rare cases. This bug existed in 20.8 and 21.1 versions as well: https://github.com/ClickHouse/ClickHouse/issues/23292 (Fixed in 21.7).
- Queries to a Merge engine table with a
JOIN
may be very slow because partition and primary keys are not respected. The bug existed in the 20.8 and 21.1 versions as well: https://github.com/ClickHouse/ClickHouse/issues/22226 (fixed in 21.5).
- Queries with multiple
WITH
and JOIN
clauses may run significantly slower than in previous releases: https://github.com/ClickHouse/ClickHouse/issues/21067.
- Skipping indices are not used in rare cases leading to increased query time: https://github.com/ClickHouse/ClickHouse/issues/20346.
ClickHouse Altinity Stable releases are based on the community versions. Community builds can be downloaded from repo.clickhouse.tech.
ClickHouse Altinity builds may be found at:
- Deb packages: https://packagecloud.io/Altinity/altinity-stable
- RPM packages: https://packagecloud.io/Altinity/altinity-stable
- Docker image: altinity/clickhouse-server:21.3.13.9.altinity_stable
Please contact us at info@altinity.com if you experience any issues with the upgrade.
Appendix
New functions
- DateTime functions:
- Compatibility with MySQL:
DATE, DAY, DAYOFMONTH, DAYOFWEEK, DAYOFYEAR, HOUR, MINUTE, MONTH, QUARTER, SECOND, YEAR
parseDateTimeBestEffortUSOrNull
parseDateTimeBestEffortUSOrZero
timezoneOffset
- Compatibility with MySQL:
- String functions:
decodeXMLComponent, extractTextFromHTML
- Map functions:
mapContains, mapKeys, mapValues
- Aggregate functions
- New combinator
-SimpleState
deltaSum
runningConcurrency
- New combinator
- Functions to work with new Map data type:
map
- Math/Statistics:
sign
- Type related:
reinterpret, reinterpretAsInt128, reinterpretAsInt256, reinterpretAsUInt256
- Internal ClickHouse:
connectionID, connection_id
normalizeQueryKeepNames, normalizedQueryHashKeepNames
New table functions
postgresql
file
New table engines
PostgreSQL
New metrics and events
system.metrics.
- PartsCommitted
- PartsCompact
- PartsDeleteOnDestroy
- PartsDeleting
- PartsInMemory
- PartsOutdated
- PartsPreCommitted
- PartsTemporary
- PartsWide
system.events:
- DistributedDelayedInserts
- DistributedDelayedInsertsMilliseconds
- DistributedRejectedInserts
- HedgedRequestsChangeReplica
New system tables
- distributed_ddl_queue
New columns in system tables
- metric_log.
- ProfileEvent_DistributedDelayedInserts, ProfileEvent_DistributedRejectedInserts, ProfileEvent_DistributedDelayedInsertsMilliseconds, ProfileEvent_HedgedRequestsChangeReplica, CurrentMetric_PartsTemporary, CurrentMetric_PartsPreCommitted, CurrentMetric_PartsCommitted, CurrentMetric_PartsOutdated, CurrentMetric_PartsDeleting, CurrentMetric_PartsDeleteOnDestroy, CurrentMetric_PartsWide, CurrentMetric_PartsCompact, CurrentMetric_PartsInMemory
- Parts:
- group_by_ttl_info.expression, group_by_ttl_info.min, group_by_ttl_info.max, rows_where_ttl_info.expression, rows_where_ttl_info.min, rows_where_ttl_info.max
- processes.http_referer
- query_log:
- http_referer, log_comment, used_aggregate_functions, used_aggregate_function_combinators, used_database_engines, used_data_type_families, used_dictionaries, used_formats, used_functions, used_storages, used_table_functions
- query_thread_log.http_referer
- quota_limits:
- max_query_selects, max_query_inserts
- quota_usage:
- query_selects, max_query_selects, query_inserts, max_query_inserts
- quotas_usage:
- query_selects, max_query_selects, query_inserts, max_query_inserts
System.merge_tree_settings added/changed
Name | Old value | New value | Description |
---|---|---|---|
allow_floating_point_partition_key | 0 | Allow floating point as partition key. | |
inactive_parts_to_delay_insert | 0 | If table contains at least that many inactive parts in single partition, artificially slow down insert into table. | |
inactive_parts_to_throw_insert | 0 | If more than this number of inactive parts are in a single partition, throw the ‘Too many inactive parts … ’ exception. |
|
max_concurrent_queries | 0 | Max number of concurrently executed queries related to the MergeTree table (0 - disabled). Queries will still be limited by other max_concurrent_queries settings. |
|
min_marks_to_honor_max_concurrent_queries | 0 | Minimal number of marks to honor the MergeTree-level’s max_concurrent_queries (0 - disabled). Queries will still be limited by other max_concurrent_queries settings. |
system.settings added/changed
Name | Old value | New value | Description |
---|---|---|---|
allow_changing_replica_until_first_data_packet | 0 | Allow HedgedConnections to change the replica until receiving the first data packet. | |
allow_experimental_database_replicated | 0 | Allows creating databases with the Replicated engine. | |
allow_experimental_query_deduplication | 0 | Allows sending parts’ UUIDs for a query in order to deduplicate data parts, if any. | |
async_socket_for_remote | 1 | 0 | Asynchronously read from socket executing remote query. |
background_fetches_pool_size | 3 | 8 | Number of threads performing background fetches for replicated tables. Only has meaning at server startup. |
checksum_on_read | 1 | Validates checksums on reading. It is enabled by default and should be always enabled in production. Please do not expect any benefits in disabling this setting. It may only be used for experiments and benchmarks. These settings are only applicable for tables of the MergeTree family. Checksums are always validated for other table engines and when receiving data over the network. | |
database_replicated_ddl_output | 1 | Returns table with query execution status as a result of a DDL query. | |
database_replicated_initial_query_timeout_sec | 300 | How long an initial DDL query should wait for a Replicated database to process previous DDL queue entries. | |
distributed_aggregation_memory_efficient | 0 | 1 | The memory-saving mode of distributed aggregation is enabled. |
enable_global_with_statement | 0 | 1 | Propagate WITH statements to UNION queries and all subqueries. |
engine_file_empty_if_not_exists | 0 | Allows selecting data from a file engine table without a file. | |
engine_file_truncate_on_insert | 0 | Enables or disables truncate before insert in file engine tables. | |
flatten_nested | 1 | If true, columns of type Nested will be flatten to separate array columns instead of one array of tuples. |
|
hedged_connection_timeout_ms | 100 | Connection timeout for establishing connection with replica for Hedged requests. | |
insert_in_memory_parts_timeout | 600000 | REMOVED. | |
insert_shard_id | 0 | If non zero, when inserting into a distributed table, the data will be inserted into the shard insert_shard_id synchronously. Possible values range from 1 to shards_number of the corresponding distributed table. |
|
log_comment | Log comment into system.query_log table and server log. It can be set to an arbitrary string no longer than max_query_size . |
||
normalize_function_names | 0 | Normalize function names to their canonical names. | |
optimize_normalize_count_variants | 1 | Rewrite aggregate functions that semantically equals count() as count() . |
|
optimize_respect_aliases | 1 | If it is set to true, it will respect aliases in WHERE/GROUP BY/ORDER BY , that will help with partition pruning/secondary indexes/optimize_aggregation_in_order/optimize_read_in_order/optimize_trivial_count . |
|
optimize_rewrite_sum_if_to_count_if | 1 | Rewrite sumIf() and sum(if()) function countIf() function when logically equivalent. |
|
periodic_live_view_refresh | 60 | Interval after which a periodically refreshed live view is forced to refresh. | |
query_plan_max_optimizations_to_apply | 10000 | Limit the total number of optimizations applied to the query plan. If zero, ignored. If the limit is reached, throw an exception. | |
receive_data_timeout_ms | 2000 | Connection timeout for receiving the first packet of data or packet with positive progress from replica. | |
s3_max_connections | 1024 | The maximum number of connections per server. | |
sleep_in_send_data | 0 | Time to sleep in sending data in TCPHandler . |
|
sleep_in_send_tables_status | 0 | Time to sleep in sending tables the status response in TCPHandler . |
|
unknown_packet_in_send_data | 0 | Sends unknown packet instead of N-th data packet. |
|
use_hedged_requests | 0 | Use hedged requests for distributed queries. |
Also, please refer to the release notes from the development team available at the following URLs:
- 21.2 release notes: https://clickhouse.com/docs/en/whats-new/changelog/2021#clickhouse-release-212
- 21.3 release notes: https://clickhouse.com/docs/en/whats-new/changelog/2021#clickhouse-release-213-lts
21.3.20.2
Released 2022-02-10
Build report - 21.3.20.2 | Release page
Changes Compared to Altinity Stable Build 21.3.17.3
Bug Fixes
- Integer overflow to resize the arrays causes heap corrupt. #33024 (Rajkumar Varada).
- fix crash when used fuzzBits with multiply same FixedString, Close #32737. #32755 (SuperDJY).
- Number of active replicas might be determined incorrectly when inserting with quorum if setting
replicated_can_become_leader
is disabled on some replicas. It’s fixed. #32157 (tavplubix). - Fix possible assertion
../src/IO/ReadBuffer.h:58: bool DB::ReadBuffer::next(): Assertion '!hasPendingData()' failed.
in TSKV format. #31804 (Kruglov Pavel). - Fix crash when function
dictGet
with type is used for dictionary attribute when type is Nullable
. Fixes #30980. #31800 (Maksim Kita). - Fix usage of
Buffer
table engine with type Map
. Fixes #30546. #31742 (Anton Popov). - Fix race in JSONEachRowWithProgress output format when data and lines with progress are mixed in output. #31736 (Kruglov Pavel).
- Settings
input_format_allow_errors_num
and input_format_allow_errors_ratio
did not work for parsing of domain types, such as IPv4
, it’s fixed. Fixes #31686. #31697 (tavplubix). - Remove not like function into RPNElement. #31169 (sundyli).
- Using
formatRow
function with not row formats led to segfault. Don’t allow to use this function with such formats (because it doesn’t make sense). #31001 (Kruglov Pavel). - Functions for case-insensitive search in UTF8 strings like
positionCaseInsensitiveUTF8
and countSubstringsCaseInsensitiveUTF8
might find substrings that actually do not match, it’s fixed. #30663 (tavplubix).
- Fixed segfault which might happen if session expired during execution of REPLACE PARTITION. #30432 (tavplubix).
- Fix
[I]LIKE
function. Closes #28661. #30244 (Nikolay Degterinsky). - FlatDictionary, HashedDictionary fix bytes_allocated calculation for nullable attributes. #30238 (Maksim Kita).
- Support nullable arguments in function
initializeAggregation
. #30177 (Anton Popov). - Fix crash of sample by
tuple()
, closes #30004. #30016 (flynn). - Fix concurrent access to
LowCardinality
duringGROUP BY
(leads to SIGSEGV). #29782 (Azat Khuzhin). - Condition in filter predicate could be lost after push-down optimisation. #29625 (Nikolai Kochetov).
- Fix null dereference for
GROUP BY WITH TOTALS HAVING
(when the column from HAVING
wasn’t selected). #29553 (Azat Khuzhin). - Fix connection timeouts (
send_timeout
/receive_timeout
). #29282 (Azat Khuzhin). - Fix possible
Table columns structure in ZooKeeper is different from local table structure
exception while recreating or creating new replicas ofReplicatedMergeTree
, when one of table columns have default expressions with case-insensitive functions. #29266 (Anton Popov). - Fix segfault while inserting into column with type LowCardinality(Nullable) in Avro input format. #29132 (Kruglov Pavel).
- Fix the number of threads used in
GLOBAL IN
subquery (it was executed in single threads since #19414 bugfix). #28997 (Nikolai Kochetov). - Fix invalid constant type conversion when nullable or lowcardinality primary key is used. #28636 (Amos Bird).
- Fix
ORDER BY ... WITH FILL
with setTO
andFROM
and no rows in result set. #30888 (Anton Popov). - Fixed Apache Avro Union type index out of boundary issue in Apache Avro binary format. #33022 (Harry Lee).
- Fix null pointer dereference in low cardinality data when deserializing LowCardinality data in the Native format. #33021 (Harry Lee).
- Quota limit was not reached, but the limit was exceeded. This PR fixes #31174. #31656 (sunny).
- Quota limit was not reached, but the limit was exceeded. This PR fixes #31174. #31337 (sunny).
- Fix shutdown of
AccessControlManager
. Now there can’t be reloading of the configuration after AccessControlManager has been destroyed. This PR fixes the flaky test test_user_directories/test.py::test_relative_path. #29951 (Vitaly Baranov).
Performance Improvements
- Improved performance of JSON and XML output formats. #31673 (alexey-milovidov).
Other Improvements
- Updated zoneinfo files to 2021c. #29925 (alexey-milovidov).
- Use real tmp file instead of predefined “rows_sources” for vertical merges. This avoids generating garbage directories in tmp disks. #28299 (Amos Bird).
- Use separate
clickhouse-bridge
group and user for bridge processes. Set oom_score_adj so the bridges will be first subjects for OOM killer. Set set maximum RSS to 1 GiB. Closes #23861. #25280 (Kseniia Sumarokova).
Changes compared to Community Build 21.3.20.1-lts
- Applying 21.3 customizations for 21.3.20 release #113 (Vasily Nemkov).
- Improvement: Retries on HTTP connection drops in S3 (ClickHouse#22988)
- Bug Fix: Fixed possible error ‘Cannot read from istream at offset 0’ when reading a file from DiskS3 (https://github.com/ClickHouse/ClickHouse/pull/24885)
- Improvement: Don’t silently ignore errors and don’t count delays in ReadBufferFromS3 (https://github.com/ClickHouse/ClickHouse/pull/27484)
- Bug Fix: Fixed a bug with accessing S3 buckets when the object URL contained ‘+’ sign (ClickHouse#23822)
- Not for changelog: fixed tests (ClickHouse #27176, #23976 #26249).
Test Results
21.3.17.3
Released 2021-10-14
Changes Since v21.3.17.2-lts
Changes since the Long Term Support release of ClickHouse v21.3.17.2-lts to Altinity Stable 21.3.17.3:
- Retries on HTTP connection drops in S3 (ClickHouse#22988).
- Added sleep with backoff between read retries from S3. (ClickHouse#24311).
- Fixed possible error
Cannot read from istream at offset 0
when reading a file from DiskS3 (ClickHouse#24885). - Don’t silently ignore errors and don’t count delays in ReadBufferFromS3 (ClickHouse#27484).
- Fixed a bug with accessing S3 buckets when the object URL contained
+
sign (ClickHouse#23822). - Other minor things/tests fixed:
V21.1 Build
Point release:
- Detailed release notes and upgrade instructions for version 21.1.7.41 - Read these first!
- 21.1.11.3 - This is the latest release of version 21.1. We recommend that you upgrade to this release. (But read the detailed release notes and upgrade instructions first.)
- 21.1.10.3
- 21.1.9.41
Major new features since the previous stable release 20.8.x
Released 2021-04-13
A new release introduces a lot of changes and new features. These are organized by feature in the Appendix, so refer to this section for more detail. The following new features are worth mentioning on the front page:
- Security features:
- LDAP external users directory, see an article in our blog a)
- AES Encryption functions, see an article in our blog for more detail a)
- Migration from OpenSSL to BoringSSL library
- Support SNI in https connections to remote resources
- Support SNI in ClickHouse server TLS endpoint a)
- Security context propagation in distributed queries
- MergeTree features:
- New TTL extension: TTL RECOMPRESS.
- ALTER UPDATE/DELETE IN PARTITION for replicated MergeTree tables a)
- DETACH TABLE/VIEW PERMANENTLY a)
- [OPTIMIZE DEDUPLICATE BY](https://clickhouse.tech/docs/en/sql-reference/statements/optimize/#by-expression) – deduplicate MergeTree tables by a subset of columns a) (see the sketch after this feature overview)
- SimpleAggregateFunction in SummingMergeTree
- Option to disable merges for a cold storage in tiered storage configuration a)
- Integrations:
- gRPC protocol
- zstd and xz compression for file-based engines
- EmbeddedRocksDB engine
- SQL compatibility:
- UNION DISTINCT (previously only UNION ALL was supported). The default can be altered by the union_default_mode setting.
- Improved CTE compatibility
- REPLACE TABLE and CREATE OR REPLACE TABLE DDL statements for the Atomic database engine.
- Other:
- New Map data type (experimental)
- OpenTelemetry support (experimental)
- Column transformers in SELECT statements
As usual with ClickHouse, there are many performance and operational improvements in different server components.
a) Contributed by Altinity developers.
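As an illustration of the new OPTIMIZE ... DEDUPLICATE BY statement, a minimal sketch with hypothetical table and column names:

```sql
CREATE TABLE events (id UInt64, ts DateTime, payload String)
ENGINE = MergeTree ORDER BY (id, ts);

-- Deduplicate rows that are identical in the listed subset of columns only.
OPTIMIZE TABLE events FINAL DEDUPLICATE BY id, ts;
```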
Backward Incompatible Changes
The following changes are backward incompatible and require user attention during an upgrade:
-
Atomic database engine is enabled by default. It does not affect existing databases, but new databases will be created with Engine = Atomic. The engine can not be modified for a database once created. Database Atomic has been used for system tables since 20.5, and it is a good feature in the long term. We recommend disabling it for now, however, especially if you use some backup tools, including ClickHouse Backup 0.6.4 or earlier. The data layout on the storage has been changed. In order to disable it by default, add the following configuration section to the default profile:
<yandex>
    <profiles>
        <default>
            <default_database_engine>Ordinary</default_database_engine>
        </default>
    </profiles>
</yandex>
-
toUUID(N) is no longer supported. If there is a DEFAULT column with this expression ClickHouse won’t start.
-
Following functions where removed:
sumburConsistentHash, timeSeriesGroupSum, timeSeriesGroupRateSum
. -
avg
andavgWeighted
functions now always return Float64. In previous versions they returned Decimal for Decimal arguments. -
Accept user settings related to file formats (e.g.
format_csv_delimiter
) in the SETTINGS clause when creating a table that uses File engine, and use these settings in all INSERTs and SELECTs. Session level settings are ignored in this case.
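As an illustration of the last item above, here is a minimal sketch (the table name and schema are illustrative only) of pinning a format setting to a File engine table:

```sql
-- The delimiter set here is used for every INSERT and SELECT on this table,
-- regardless of the session-level format_csv_delimiter value.
CREATE TABLE csv_export
(
    id UInt64,
    name String
)
ENGINE = File(CSV)
SETTINGS format_csv_delimiter = ';';

INSERT INTO csv_export VALUES (1, 'a'), (2, 'b');
SELECT * FROM csv_export;
```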
Upgrade Notes
There were several changes between versions that may affect the rolling upgrade of big clusters. Upgrading only part of the cluster is not recommended.
- Replication protocol has been changed in 20.10 in order to improve reliability of TTL merges. Replication between versions prior to 20.10 and 20.10+ is incompatible if ReplicatedMergeTree tables with TTL are used. See https://github.com/ClickHouse/ClickHouse/pull/14490 for more information.
- For a safe upgrade all replicas should be upgraded at once.
- Alternatively, `SYSTEM STOP TTL MERGES` should be used during the upgrade.
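A minimal sketch of that alternative, run on each replica around the upgrade:

```sql
-- Pause TTL merges for the duration of the rolling upgrade.
SYSTEM STOP TTL MERGES;
-- ... upgrade the server ...
-- Resume TTL merges once all replicas run the new version.
SYSTEM START TTL MERGES;
```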
Other Important Changes
Some new ClickHouse features are now enabled by default. It may lead to a change in behaviour, so review those carefully and disable features that may affect your system:
- Insert quorum behaviour has been changed. `insert_quorum_parallel` is enabled by default. It breaks sequential consistency and may have other side effects. We recommend disabling this feature if you are using quorum inserts on previous versions.
- `optimize_on_insert` is enabled by default. This is a new feature that applies the logic of Replacing, Summing, Collapsing and AggregatingMergeTree to the inserted block. Unfortunately, it still has some issues, so we recommend disabling it after upgrading.
- `use_compact_format_in_distributed_parts_names` is enabled by default.
- `input_format_null_as_default` is enabled by default.
- Background fetches are now limited by the `background_fetches_pool_size` setting. The default value is 3, which may be low in some cases. In previous versions the common background pool was used for merges and fetches, with a default size of 16.
- Compact MergeTree parts are enabled by default for parts below 10MB uncompressed. See the `min_bytes_for_wide_part` setting.
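A sketch of reverting the new defaults discussed above for a session; the same values can be placed in a user profile to make them permanent:

```sql
SET insert_quorum_parallel = 0;       -- restore sequentially consistent quorum inserts
SET optimize_on_insert = 0;           -- skip merge-like transformation of inserted blocks
SET input_format_null_as_default = 0; -- keep the pre-21.1 NULL handling for text formats
```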
Known issues in 21.1.7.1
The development team continues to improve the quality of the 21.1 release. The following issues still exist in version 21.1.7.1 and may affect ClickHouse operation. Please inspect them carefully to decide whether they apply to your applications:
- “Unexpected end of stream” errors may happen when loading text files (TSV and CSV formats) without line end at the end of file: https://github.com/ClickHouse/ClickHouse/issues/20244
- ‘/etc/init.d/clickhouse-server restart’ cannot restart the ClickHouse server if shutdown does not complete within the 3-second timeout: https://github.com/ClickHouse/ClickHouse/issues/20214. The timeout can be increased with the `shutdown_wait_unfinished` server setting.
- Distributed queries may sometimes fail with an ‘Unknown packet n from server’ error message. Disable the `async_socket_for_remote` setting if you experience this issue: https://github.com/ClickHouse/ClickHouse/issues/21588
The ClickHouse Altinity Stable release is based on the community version. It can be downloaded from repo.clickhouse.tech, and RPM packages are available from the Altinity Stable Repository.
Please contact us at info@altinity.com if you experience any issues with the upgrade.
Appendix
New data types
DateTime32 (alias to DateTime)
Map (Experimental)
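A minimal sketch of the experimental Map type (the table and column names are illustrative only; the feature must be enabled explicitly in 21.1):

```sql
SET allow_experimental_map_type = 1;

CREATE TABLE page_labels
(
    url    String,
    labels Map(String, UInt64)
)
ENGINE = MergeTree
ORDER BY url;

-- The new map() function builds Map values; elements are accessed by key.
INSERT INTO page_labels VALUES ('/home', map('visits', 10, 'errors', 1));

SELECT labels['visits'] FROM page_labels;
```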
New formats
JSONCompactStrings
JSONCompactStringsEachRow
JSONCompactStringsEachRowWithNamesAndTypes
JSONStrings
JSONStringsEachRow
JSONStringsEachRowWithProgress
LineAsString
ProtobufSingle
RawBLOB
New functions
- DateTime functions:
fromModifiedJulianDay, toModifiedJulianDay
fromModifiedJulianDayOrNull, toModifiedJulianDayOrNull
parseDateTime32BestEffort, parseDateTime32BestEffortOrNull, parseDateTime32BestEffortOrZero, toDateTime32
- String functions:
countMatches, countMatchesCaseInsensitive
countSubstrings, countSubstringsCaseInsensitive, countSubstringsCaseInsensitiveUTF8
isIPv4String, isIPv6String
- Functions for semi-duplicate search and strings proximity:
bitHammingDistance, tupleHammingDistance
ngramMinHash*, ngramSimHash*
- Encoding/Formatting:
encodeXMLComponent
formatReadableQuantity
formatReadableTimeDelta
- Array functions:
arrayAvg, arrayMax, arrayMin
mapPopulateSeries
- Machine-learning and statistics:
mannWhitneyUTest
rankCorr
hypot
studentTTest, welchTTest
- Encryption:
aes_decrypt_mysql, aes_encrypt_mysql
decrypt, encrypt
- URL functions
cutToFirstSignificantSubdomainCustom, cutToFirstSignificantSubdomainCustomWithWWW, cutToFirstSignificantSubdomainWithWWW
firstSignificantSubdomainCustom
- Aggregate functions
- New combinator `-SimpleState`
- Functions to work with new Map data type:
map
- Math/Statistics:
acosh, asinh, atan2, atanh, cosh, sinh
log1p
- Type related:
accurateCast
accurateCastOrNull
byteSize
reinterpretAsUUID
toUUIDOrNull
toUUIDOrZero
- Other:
farmFingerprint64
- Internal ClickHouse:
errorCodeToName
logTrace
tid
tcpPort
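A quick, illustrative tour of a few of the functions listed above; the results noted in comments are what these calls are expected to return:

```sql
SELECT
    countMatches('aaa bbb aaa', 'aaa')           AS matches,   -- 2
    isIPv4String('192.168.0.1')                  AS is_ipv4,   -- 1
    accurateCastOrNull('not a number', 'UInt32') AS safe_cast, -- NULL instead of an exception
    formatReadableTimeDelta(3700)                AS delta;     -- '1 hour, 1 minute, 40 seconds'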
New table functions
null
view
New table engines
EmbeddedRocksDB
New metrics and events
system.metrics:
BackgroundFetchesPoolTask
BackgroundMessageBrokerSchedulePoolTask
MaxDDLEntryID
TablesToDropQueueSize
- REMOVED: MemoryTrackingForMerges
- REMOVED: MemoryTrackingInBackgroundBufferFlushSchedulePool
- REMOVED: MemoryTrackingInBackgroundDistributedSchedulePool
- REMOVED: MemoryTrackingInBackgroundMoveProcessingPool
- REMOVED: MemoryTrackingInBackgroundProcessingPool
- REMOVED: MemoryTrackingInBackgroundSchedulePool
system.asynchronous_metrics:
HTTPThreads
InterserverThreads
MySQLThreads
TCPThreads
TotalBytesOfMergeTreeTables
TotalPartsOfMergeTreeTables
TotalRowsOfMergeTreeTables
system.events:
RWLockReadersWaitMilliseconds
- REMOVED: CreatedWriteBufferOrdinary
New system tables
errors
replicated_fetches
replicated_merge_tree_settings
New columns in system tables
grants.access_type
merges.merge_type, merge_algorithm
parts.uuid, default_compression_codec, recompression_ttl_info.expression, recompression_ttl_info.min, recompression_ttl_info.max
privileges.privilege, parent_group
processes.forwarded_for
query_log.event_time_microseconds, query_start_time_microseconds, normalized_query_hash, query_kind, databases, tables, columns, forwarded_for
query_thread_log.event_time_microseconds, query_start_time_microseconds, normalized_query_hash, current_database, forwarded_for
quotas.keys
replication_queue.merge_type
storage_policies.prefer_not_to_merge
table_engines.supports_parallel_insert
trace_log.event_time_microseconds
user_directories.path, readonly, params
system.merge_tree_settings added/changed
Name | Old value | New value | Description |
---|---|---|---|
assign_part_uuids | 0 | Generate UUIDs for parts. Before enabling check that all replicas support the new format. | |
execute_merges_on_single_replica_time_threshold | 0 | When greater than zero only a single replica starts the merge immediately, others wait up to that amount of time to download the result instead of doing merges locally. If the chosen replica doesn’t finish the merge during that amount of time, fallback to standard behavior happens. | |
fsync_after_insert | 0 | Do fsync for every inserted part. Significantly decreases performance of inserts, not recommended to use with wide parts. | |
fsync_part_directory | 0 | Do fsync for part directory after all part operations (writes, renames, etc.). | |
in_memory_parts_insert_sync | 0 | If true insert of part with in-memory format will wait for fsync of WAL | |
max_compress_block_size | 0 | Compress the pending uncompressed data in a buffer if its size is larger or equal than the specified threshold. Block of data will be compressed even if the current granule is not finished. If this setting is not set, the corresponding global setting is used. | |
max_number_of_merges_with_ttl_in_pool | 2 | When there is more than a specified number of merges with TTL entries in the pool, do not assign a new merge with TTL. This is to leave free threads for regular merges and avoid “Too many parts” | |
max_partitions_to_read | -1 | Limit the max number of partitions that can be accessed in one query. <= 0 means unlimited. This setting is the default that can be overridden by the query-level setting with the same name. | |
max_replicated_merges_with_ttl_in_queue | 1 | How many tasks of merging parts with TTL are allowed simultaneously in the ReplicatedMergeTree queue. | |
merge_with_recompression_ttl_timeout | 14400 | Minimal time in seconds, when merge with recompression TTL can be repeated. | |
merge_with_ttl_timeout | 86400 | 14400 | Minimal time in seconds, when merge with delete TTL can be repeated. |
min_bytes_for_wide_part | 0 | 10485760 | Minimal uncompressed size in bytes to create part in wide format instead of compact |
min_compress_block_size | 0 | When granule is written, compress the data in a buffer if the size of pending uncompressed data is larger or equal than the specified threshold. If this setting is not set, the corresponding global setting is used. | |
min_compressed_bytes_to_fsync_after_fetch | 0 | Minimal number of compressed bytes to do fsync for part after fetch (0 - disabled) | |
min_compressed_bytes_to_fsync_after_merge | 0 | Minimal number of compressed bytes to do fsync for part after merge (0 - disabled) | |
min_rows_to_fsync_after_merge | 0 | Minimal number of rows to do fsync for part after merge (0 - disabled) | |
remove_empty_parts | 1 | Remove empty parts after they were pruned by TTL, mutation, or collapsing merge algorithm | |
try_fetch_recompressed_part_timeout | 7200 | Recompression works slowly in most cases. We don’t start a merge with recompression until this timeout and try to fetch the recompressed part from the replica which assigned this merge with recompression. | |
write_ahead_log_bytes_to_fsync | 104857600 | Amount of bytes, accumulated in WAL to do fsync. | |
write_ahead_log_interval_ms_to_fsync | 100 | Interval in milliseconds after which fsync for WAL is being done. |
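To make the table above concrete, here is a minimal sketch (the table name and schema are illustrative, not part of the release) of applying one of the new MergeTree settings at table level:

```sql
CREATE TABLE events
(
    d     Date,
    value UInt64
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(d)
ORDER BY d
SETTINGS max_partitions_to_read = 12;   -- fail queries that touch more than 12 partitions

-- Existing tables can be adjusted the same way:
ALTER TABLE events MODIFY SETTING max_partitions_to_read = 24;
```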
system.settings added/changed
Name | Old value | New value | Description |
---|---|---|---|
aggregate_functions_null_for_empty | 0 | Rewrite all aggregate functions in a query, adding -OrNull suffix to them | |
allow_experimental_cross_to_join_conversion | 1 | ||
allow_experimental_data_skipping_indices | 1 | ||
allow_experimental_low_cardinality_type | 1 | ||
allow_experimental_map_type | 0 | Allow data type Map | |
allow_experimental_multiple_joins_emulation | 1 | ||
allow_experimental_window_functions | 0 | Allow experimental window functions | |
asterisk_include_alias_columns | 0 | Include ALIAS columns for wildcard query | |
asterisk_include_materialized_columns | 0 | Include MATERIALIZED columns for wildcard query | |
async_socket_for_remote | 1 | Asynchronously read from socket executing remote query | |
background_fetches_pool_size | 3 | Number of threads performing background fetches for replicated tables. Only has meaning at server startup. | |
background_message_broker_schedule_pool_size | 16 | Number of threads performing background tasks for message streaming. Only has meaning at server startup. | |
compile | 0 | ||
database_atomic_wait_for_drop_and_detach_synchronously | 0 | When executing DROP or DETACH TABLE in Atomic database, wait for table data to be finally dropped or detached. | |
date_time_output_format | simple | Method to write DateTime to text output. Possible values: ‘simple’, ‘iso’, ‘unix_timestamp’. | |
default_database_engine | Ordinary | Atomic | Default database engine. |
do_not_merge_across_partitions_select_final | 0 | Merge parts only in one partition in select final | |
enable_global_with_statement | 0 | Propagate WITH statements to UNION queries and all subqueries | |
experimental_use_processors | 1 | ||
force_data_skipping_indices | Comma separated list of strings or literals with the name of the data skipping indices that should be used during query execution, otherwise an exception will be thrown. | ||
force_optimize_skip_unused_shards_no_nested | 0 | ||
format_regexp_escaping_rule | Escaped | Raw | Field escaping rule (for Regexp format) |
input_format_csv_arrays_as_nested_csv | 0 | When reading Array from CSV, expect that its elements were serialized in nested CSV and then put into string. Example: “[““Hello””, ““world””, ““42"””” TV""]". Braces around an array can be omitted. | |
input_format_csv_enum_as_number | 0 | Treat inserted enum values in CSV formats as enum indices | |
input_format_null_as_default | 0 | 1 | For text input formats initialize null fields with default values if data type of this field is not nullable |
input_format_tsv_enum_as_number | 0 | Treat inserted enum values in TSV formats as enum indices | |
insert_distributed_one_random_shard | 0 | If setting is enabled, inserting into distributed table will choose a random shard to write when there is no sharding key | |
insert_quorum_parallel | 1 | For quorum INSERT queries - enable to make parallel inserts without linearizability | |
limit | 0 | Limit on read rows from the most ’end’ result for select query, default 0 means no limit length | |
load_balancing_first_offset | 0 | Which replica to preferably send a query when FIRST_OR_RANDOM load balancing strategy is used. | |
log_queries_min_query_duration_ms | 0 | Minimal time for the query to run, to get to the query_log/query_thread_log. | |
mark_cache_min_lifetime | 0 | ||
max_bytes_to_read_leaf | 0 | Limit on read bytes (after decompression) on the leaf nodes for distributed queries. Limit is applied for local reads only excluding the final merge stage on the root node. | |
max_concurrent_queries_for_all_users | 0 | The maximum number of concurrent requests for all users. | |
max_partitions_to_read | -1 | Limit the max number of partitions that can be accessed in one query. <= 0 means unlimited. | |
max_rows_to_read_leaf | 0 | Limit on read rows on the leaf nodes for distributed queries. Limit is applied for local reads only excluding the final merge stage on the root node. | |
merge_tree_uniform_read_distribution | 1 | ||
min_count_to_compile | 0 | ||
multiple_joins_rewriter_version | 2 | 0 | Obsolete setting, does nothing. Will be removed after 2021-03-31 |
mysql_datatypes_support_level | Which MySQL types should be converted to corresponding ClickHouse types (rather than being represented as String). Can be empty or any combination of ‘decimal’ or ‘datetime64’. When empty MySQL’s DECIMAL and DATETIME/TIMESTAMP with non-zero precision are seen as String on ClickHouse’s side. | ||
offset | 0 | Offset on read rows from the most ’end’ result for select query | |
opentelemetry_start_trace_probability | 0 | Probability to start an OpenTelemetry trace for an incoming query. | |
optimize_move_functions_out_of_any | 1 | 0 | Move functions out of aggregate functions ‘any’, ‘anyLast’. |
optimize_on_insert | 1 | Do the same transformation for inserted block of data as if merge was done on this block. | |
optimize_skip_merged_partitions | 0 | Skip partitions with one part with level > 0 in optimize final | |
output_format_json_array_of_rows | 0 | Output a JSON array of all rows in JSONEachRow(Compact) format. | |
output_format_json_named_tuples_as_objects | 0 | Serialize named tuple columns as JSON objects. | |
output_format_parallel_formatting | 1 | Enable parallel formatting for some data formats. | |
output_format_pretty_row_numbers | 0 | Add row numbers before each row for pretty output format | |
output_format_tsv_null_representation | Custom NULL representation in TSV format | ||
partial_merge_join | 0 | ||
read_backoff_min_concurrency | 1 | Settings to try keeping the minimal number of threads in case of slow reads. | |
read_overflow_mode_leaf | throw | What to do when the leaf limit is exceeded. | |
remerge_sort_lowered_memory_bytes_ratio | 2 | If memory usage after remerge does not reduced by this ratio, remerge will be disabled. | |
s3_max_redirects | 10 | Max number of S3 redirects hops allowed. | |
s3_max_single_part_upload_size | 67108864 | The maximum size of object to upload using singlepart upload to S3. | |
special_sort | not_specified | ||
system_events_show_zero_values | 0 | Include all metrics, even with zero values | |
union_default_mode | Set default Union Mode in SelectWithUnion query. Possible values: empty string, ‘ALL’, ‘DISTINCT’. If empty, query without Union Mode will throw exception. | ||
use_antlr_parser | 0 | Parse incoming queries using ANTLR-generated experimental parser | |
use_compact_format_in_distributed_parts_names | 0 | 1 | Changes format of directories names for distributed table insert parts. |
Also, please refer to the release notes from the development team available at the following URLs:
- 20.9.2 release notes: https://clickhouse.com/docs/en/whats-new/changelog/2020#clickhouse-release-v209220-2020-09-22
- 20.10.3 release notes: https://clickhouse.com/docs/en/whats-new/changelog/2020#clickhouse-release-v2010330-2020-10-28
- 20.11.2 release notes: https://clickhouse.com/docs/en/whats-new/changelog/2020#clickhouse-release-v201121-2020-11-11
- 20.12.3 release notes: https://clickhouse.com/docs/en/whats-new/changelog/2020#clickhouse-release-v201233-stable-2020-12-13
- 21.1.2 release notes: https://clickhouse.com/docs/en/whats-new/changelog/2021#clickhouse-release-v211215-stable-2021-01-18
21.1.11.3
Released 2022-06-01
This is a security update to 21.1.10.3 release. It backports fixes for several CVEs into 21.1 version:
- CVE-2021-43304
- CVE-2021-43305
- CVE-2021-42387
- CVE-2021-42388
- CVE-2021-42389
- CVE-2021-42390
- CVE-2021-42391
As we discussed in a recent Altinity Blog article, JFrog discovered these CVEs and reported them to the Yandex team in 2021. Fixes for these CVEs were already included in the 21.3.19 and 21.8.11 Altinity Stable builds and in all ClickHouse builds after 21.10.
We have now backported the fixes to 21.1 as part of our long-term maintenance policy for Altinity Stable builds. We recommend all users running ClickHouse 21.1 upgrade to this new release.
- Deb/RPMs: https://builds.altinity.cloud
- Docker Image: `altinity/clickhouse-server:21.1.11.3.altinitystable`
21.1.10.3
Released 2021-06-16
Changes between 21.1.10.3 and 21.1.9.41
- `background_fetches_pool_size` is set to 8. This is better for production usage with frequent small insertions or slow ZooKeeper clusters.
- Fixed the error ‘Cannot find column in ActionsDAG result’, which may happen if a subquery uses `untuple`: https://github.com/ClickHouse/ClickHouse/issues/22290
- Fixed `quantile(s)TDigest` that could produce inaccurate results in some edge cases: https://github.com/ClickHouse/ClickHouse/issues/22408
- Fixed a bug with accessing S3 buckets when the object URL contained a ‘+’ sign: https://github.com/ClickHouse/ClickHouse/pull/23822
The complete diff between the Altinity and community versions can also be viewed on GitHub.
Changes between ClickHouse Altinity Stable Release 21.1.9.41 and ClickHouse Altinity Stable Release 21.1.7.1
- Fixed a bug with distributed queries that could sometimes fail with an ‘Unknown packet n from server’ error message: https://github.com/ClickHouse/ClickHouse/issues/21588
- Fixed several issues with mutations handling (e.g. https://github.com/ClickHouse/ClickHouse/issues/22013)
- Fixed a rare bug when a quorum insert with `insert_quorum_parallel=1` is not really a “quorum” because of deduplication
- Fixed a possible buffer overflow in the token bloom filter index: https://github.com/ClickHouse/ClickHouse/issues/19233
- Fixed a bug when nullable types sometimes could not be retrieved: https://github.com/ClickHouse/ClickHouse/issues/21116
- Fixed a bug with replicated table metadata not being properly cleaned from the non-primary ZooKeeper instance: https://github.com/ClickHouse/ClickHouse/issues/21054
- Added `parseDateTimeBestEffortOrZero` to allow for empty values, while `parseDateTimeBestEffort` will not work with empty values. For example:

      SELECT parseDateTimeBestEffort('')
      Query id: 1c60717e-116b-445f-b675-7b9cac35c262
      0 rows in set. Elapsed: 0.003 sec.
      Received exception from server (version 21.11.1):
      Code: 41. DB::Exception: Received from localhost:9000. DB::Exception: Cannot read DateTime: neither Date nor Time was parsed successfully: While processing parseDateTimeBestEffort(''). (CANNOT_PARSE_DATETIME)

      SELECT parseDateTimeBestEffortOrZero('')
      Query id: 484baec9-6d00-4c66-a792-85a270d6f2f2
      ┌─parseDateTimeBestEffortOrZero('')─┐
      │ 1970-01-01 03:00:00 │
      └───────────────────────────────────┘
ClickHouse Altinity Stable packages for this release can be found at:
- Deb packages: https://packagecloud.io/Altinity/altinity-stable
- RPM packages: https://packagecloud.io/Altinity/altinity-stable
- Docker image: altinity/clickhouse-server:21.1.10.3.altinity_stable
21.1.9.41
Released 2021-04-13
Changes between 21.1.9.41 and 21.1.7.1
- Fixed a bug with distributed queries that could sometimes fail with an ‘Unknown packet n from server’ error message: https://github.com/ClickHouse/ClickHouse/issues/21588
- Fixed several issues with mutations handling (e.g. https://github.com/ClickHouse/ClickHouse/issues/22013)
- Fixed a rare bug when a quorum insert with `insert_quorum_parallel=1` is not really a “quorum” because of deduplication
- Fixed a possible buffer overflow in the token bloom filter index: https://github.com/ClickHouse/ClickHouse/issues/19233
- Fixed a bug when nullable types sometimes could not be retrieved: https://github.com/ClickHouse/ClickHouse/issues/21116
- Fixed a bug with replicated tables metadata not properly cleaned from the non-primary ZooKeeper instance: https://github.com/ClickHouse/ClickHouse/issues/21054
V20.8 Release
Point releases:
- V20.8.7.15 - 2020-12-02 - Main release notes - Read these first!
- V20.8.12.2 - 2021-02-03 - Latest release
- V20.8.11.17 - 2020-12-25
Altinity Stable Release for ClickHouse 20.8.7.15
Released 2020-12-02
Description
This release provides several changes and new features from the previous release.
Major New Features
- Security features:
- RBAC and SQL management for users, roles, grants etc.
- LDAP authentication
- Kerberos authentication in Kafka Engine
- Dictionary features:
- `direct` and `ssd_cache` layouts
- Redis, MongoDB, Cassandra sources
- Automatic query rewrite for joins (see an example here)
- Experimental polygon dictionaries
- MergeTree features:
- S3 disk for MergeTree tables! It is still an experimental feature, see our article for more detail.
- New TTL types: `TTL DELETE WHERE` and `TTL GROUP BY` (a sketch follows this feature list).
- Integrations:
- PostgreSQL wire protocol
- RabbitMQ and MongoDB storage engines
- Tencent Cloud Object Storage (COS)
- Highly experimental MaterializeMySQL engine that implements MySQL replica in ClickHouse
- SQL compatibility:
- New Int128, (U)Int256, Decimal256 extended precision data types
- Aliases for standard SQL types
- `EXPLAIN` statement!
- Merge join improvements
- Custom HTTP handlers
- `clickhouse-copier` underwent extensive updates and improvements
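The following is a minimal sketch of the new TTL types mentioned in the list above; the table name and schema are illustrative only. Rows flagged as debug are deleted after a week, and older raw rows are rolled up in place by name:

```sql
CREATE TABLE metrics
(
    d        Date,
    name     String,
    is_debug UInt8,
    value    UInt64
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(d)
ORDER BY (name, d)
TTL d + INTERVAL 7 DAY DELETE WHERE is_debug = 1,
    d + INTERVAL 1 MONTH GROUP BY name SET value = sum(value);
```

Note that the `GROUP BY` expression in a TTL rule must be a prefix of the table's sorting key, which is why the sketch orders by `(name, d)`.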
Upgrade Notes
Backward Incompatible Changes
The following changes are backward incompatible and require user attention during an upgrade:
- Aggregate functions states with Nullable arguments may produce different / incompatible types.
- Gorilla, Delta and DoubleDelta codecs can not be used anymore on data types of variable size (like strings).
- System tables (e.g. `system.query_log`, `system.trace_log`, `system.metric_log`) are using the compact data part format for parts smaller than 10 MiB in size (this is almost always the case). The compact data part format has been supported since version 20.3.
- WARNING: If you have to downgrade to a version prior to 20.3, you should manually delete table data for system logs in `/var/lib/clickhouse/data/system/`.
- The setting `input_format_with_names_use_header` is enabled by default. It will affect parsing of the -WithNames and -WithNamesAndTypes input formats.
- Special printing of zero Date/DateTime values as ‘0000-00-00’ and ‘0000-00-00 00:00:00’ is deprecated. They are now printed as ‘1970-01-01’ and ‘1970-01-01 00:00:00’ respectively.
There were several changes between versions that may affect the rolling upgrade of big clusters. Upgrading only part of the cluster is not recommended. Note the following:
- 20.3 two-level aggregation is not compatible with 20.4+. Data will not be fully aggregated for queries that are processed using the two-level aggregation algorithm. This algorithm should be disabled before upgrading if the risk is high in your environment. See the `group_by_two_level_threshold` and `group_by_two_level_threshold_bytes` settings.
- The `zstd` library has been updated in 20.5. While it is not a problem for ClickHouse in general, it may result in inconsistent parts when several replicas merge parts independently, and will force ClickHouse to download merged parts to make sure they are byte-identical (which will lead to extra traffic between nodes). The first node to complete the merge will register the part in ZooKeeper, and the other nodes will download the part if their checksums are different. There will be no data loss; conflicts will disappear once all replicas are upgraded.
- The following settings lead to incompatibility in distributed queries when only a subset of shards are upgraded and others are not:
  - `optimize_move_functions_out_of_any`
  - `optimize_arithmetic_operations_in_aggregate_functions`
  - `optimize_injective_functions_inside_uniq`
  - `optimize_if_transform_strings_to_enum`
- When pre-20.5 and 20.5+ versions run as replicas, “Part ... intersects previous part” errors are possible due to a change in the leadership selection protocol. If you need to run pre-20.5 and 20.5+ versions in the same cluster, make sure the old version can not become a leader. This can be configured via the `replicated_can_become_leader` merge tree setting, globally or on a table level.
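A sketch of the table-level variant of that workaround (the table name is illustrative; the global variant is set in the server configuration instead):

```sql
-- Run on the old-version replica so it never assigns merges while versions are mixed.
ALTER TABLE db.events_replicated
    MODIFY SETTING replicated_can_become_leader = 0;
```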
Changes
- All replicas are now ’leaders’. This allows multiple replicas to assign merges, mutations, partition drop, move and replace concurrently. Now `system.replicas.is_leader` is 1 for all tables on all nodes. If you rely on this value for some operations, your processes must be revised. The LeaderElection and LeaderReplica metrics were removed.
- The new setting `max_server_memory_usage` limits total memory usage of the server. The setting `max_memory_usage_for_all_queries` is now obsolete and does nothing. You might see an exception ‘Memory limit (total) exceeded’. Increasing the limit requires a restart.
- The `log_queries` setting is now enabled by default. You might want to disable this setting for some profiles if you don’t want their queries logged into the `system.query_log` table.
- Several new optimizations are enabled by default. While they typically improve performance, regressions are possible in corner cases:
  - `optimize_aggregators_of_group_by_keys`
  - `optimize_arithmetic_operations_in_aggregate_functions`
  - `optimize_duplicate_order_by_and_distinct`
  - `optimize_group_by_function_keys`
  - `optimize_injective_functions_inside_uniq`
  - `optimize_move_functions_out_of_any`
  - `optimize_monotonous_functions_in_order_by`
  - `optimize_redundant_functions_in_order_by`
  - `optimize_trivial_insert_select`
  - `partial_merge_join_optimizations`
New data types
- All standard SQL data types, like `BIGINT`, `VARCHAR`, etc. are mapped to ClickHouse data types. See the `system.data_type_families` table.
- Experimental data types `Point, Ring, Polygon, MultiPolygon`
- `Int128, Int256, UInt256, Decimal256` extended precision data types
New formats
Arrow
ArrowStream
JSONAsString – allows storing the full JSON row unparsed and processing it later using ClickHouse JSONExtract functions
Markdown
MsgPack
PostgreSQLWire
Regexp – allows parsing of non-standard text formats (for example, logs) by applying a regular expression to every line.
New functions
- DateTime functions:
fromUnixTimestamp / FROM_UNIXTIME
dateTrunc / date_trunc
fromUnixTimestamp64Micro
fromUnixTimestamp64Milli
fromUnixTimestamp64Nano
toUnixTimestamp64Micro
toUnixTimestamp64Milli
toUnixTimestamp64Nano
- String functions:
extractGroups, extractAllGroupsHorizontal, extractAllGroupsVertical (alias to extractAllGroups)
Ilike, notILike (also the new SQL operator ILIKE has been added)
- Array functions:
arrayReduceInRanges
hasSubstr
- Machine-learning and statistics:
arrayAUC -- Area Under the ROC Curve
bayesAB -- Bayesian A/B Testing Calculator
medianExactHigh, medianExactLow
quantileExactHigh, quantileExactLow, quantilesExactHigh, quantilesExactLow
- JSON
JSONExtractKeysAndValuesRaw
- URL functions
port
netloc
- Aggregate functions
-Distinct – new aggregate function combinator
initializeAggregation – allows initialization of AggregateFunction state
- Functions to work with key-value pairs (aka maps):
mapAdd, mapSubtract, maxMap, minMap (extending the API of sumMap). SimpleAggregateFunction now supports minMap, maxMap, sumMap.
- The new Map data type is currently in development and should make working with maps even more intuitive.
- MySQL integration
DATABASE (alias for currentDatabase)
globalVariable (stub)
- New types related:
toDecimal256, toDecimal256OrNull, toDecimal256OrZero
toInt128, toInt128OrNull, toInt128OrZero
toInt256, toInt256OrNull, toInt256OrZero
toUInt256, toUInt256OrNull, toUInt256OrZero
- engine=Join
joinGetOrNull
- Random:
fuzzBits (used for CI)
rand32 (alias for rand)
randomFixedString, randomString, randomStringUTF8
- Serialize columns to some text format
formatRow, formatRowNoNewline
- Settings & custom settings:
getSetting
- Check types
isDecimalOverflow
defaultValueOfTypeName
isZeroOrNull
- Helper functions to analyze `query_log`:
normalizeQuery, normalizedQueryHash
- Other:
countDigits
mod (alias for modulo)
- Special (used in CI / by ClickHouse developers):
hasThreadFuzzer
buildId
- Related to internals of execution of the IN / GLOBAL IN operator (also see the transform_null_in setting)
notNullIn, notNullInIgnoreSet, nullIn, nullInIgnoreSet, inIgnoreSet, notInIgnoreSet
globalNotNullIn, globalNotNullInIgnoreSet, globalNullIn, globalNullInIgnoreSet, globalInIgnoreSet, globalNotInIgnoreSet
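A few of the functions listed above in action; the outputs noted in comments are what these calls are expected to return:

```sql
SELECT
    fromUnixTimestamp(0)                                  AS epoch_start, -- 1970-01-01 00:00:00 (server time zone)
    dateTrunc('month', toDateTime('2020-08-15 10:30:00')) AS month_start, -- 2020-08-01 00:00:00
    hasSubstr([1, 2, 3, 4], [2, 3])                       AS has_sub,     -- 1
    port('https://clickhouse.tech:8443/docs')             AS url_port;    -- 8443
```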
New table functions
cosn – integration with Tencent Cloud Object Storage (COS)
New table engines
MongoDB
RabbitMQ
COSN
New metrics and events
system.metrics:
- BackgroundBufferFlushSchedulePoolTask
- BackgroundDistributedSchedulePoolTask
- MemoryTrackingInBackgroundBufferFlushSchedulePool
- MemoryTrackingInBackgroundDistributedSchedulePool
- PostgreSQLConnection
system.events:
- OSCPUVirtualTimeMicroseconds
- OSCPUWaitMicroseconds
- OSReadBytes, OSReadChars
- OSWriteBytes, OSWriteChars
- QueryTimeMicroseconds
New system tables (mostly for RBAC introspection)
- current_roles
- distribution_queue
- enabled_roles
- grants
- licenses
- privileges
- quota_limits
- quotas_usage
- row_policies
- role_grants
- roles
- settings_profile_elements
- settings_profiles
- time_zones
- user_directories
- users
New columns in system tables
- columns.position, parts_columns.position
- databases.uuid, dictionaries.uuid, tables.uuid
- disks.type
- merge_tree_settings.type, settings.type
- parts: delete_ttl_info_min, delete_ttl_info_max, move_ttl_info_min, move_ttl_info_max, move_ttl_info.expression
- query_log.current_database
- storage_policies.volume_type
- tables: total_rows, total_bytes, lifetime_rows, lifetime_bytes
- trace_log: trace_type, size, timestamp_ns
system.merge_tree_settings added/changed
Setting | Old value | New value | Description |
---|---|---|---|
allow_nullable_key | 0 | Allow Nullable types as primary keys. | |
always_fetch_merged_part | 0 | If true, replicas never merge parts and always download merged parts from other replicas. | |
disable_background_merges | 0 | REMOVED | |
enable_mixed_granularity_parts | 0 | 1 | Enable parts with adaptive and non-adaptive granularity at the same time |
in_memory_parts_enable_wal | 1 | Whether to write blocks in Native format to write-ahead-log before creation in-memory part | |
lock_acquire_timeout_for_background_operations | 120 | For background operations like merges, mutations etc. How many seconds before failing to acquire table locks. | |
max_part_loading_threads | auto | max_part_loading_threads | |
max_part_removal_threads | auto | The number of threads for concurrent removal of inactive data parts. One is usually enough, but in ‘Google Compute Environment SSD Persistent Disks’ file removal (unlink) operation is extraordinarily slow and you probably have to increase this number (recommended is up to 16). | |
max_replicated_logs_to_keep | 10000 | 1000 | How many records may be in log, if there is an inactive replica. |
min_replicated_logs_to_keep | 100 | 10 | Keep about this number of last records in ZooKeeper log, even if they are obsolete. It doesn’t affect work of tables: used only to diagnose ZooKeeper log before cleaning. |
min_bytes_for_compact_part | 0 | Minimal uncompressed size in bytes to create part in compact format instead of saving it in RAM. If non-zero enables in-memory parts. | |
min_rows_for_compact_part | 0 | Minimal number of rows to create part in compact format instead of saving it in RAM. If non-zero enables in-memory parts. | |
min_index_granularity_bytes | 1024 | Minimum amount of bytes in single granule. | |
min_relative_delay_to_measure | 120 | Calculate relative replica delay only if absolute delay is not less than this value. | |
write_ahead_log_max_bytes | 1073741824 | Rotate WAL, if it exceeds that amount of bytes |
system.settings added/changed
Setting | Old value | New value | Description |
---|---|---|---|
allow_experimental_bigint_types | 0 | Allow Int128, Int256, UInt256 and Decimal256 types | |
allow_experimental_database_materialize_mysql | 1 | Allow database creation with Engine=MaterializeMySQL(…) (Highly experimental yet) | |
allow_experimental_geo_types | 0 | Allow geo data types such as Point, Ring, Polygon, MultiPolygon | |
allow_non_metadata_alters | 1 | Allow to execute alters which affects table’s metadata and data on disk. | |
allow_push_predicate_when_subquery_contains_with | 1 | Allows push predicate when subquery contains WITH clause | |
allow_suspicious_codecs | 0 | If it is set to true, allow specifying meaningless compression codecs. | |
alter_partition_verbose_result | 0 | Output information about affected parts. Currently works only for FREEZE and ATTACH commands. | |
background_buffer_flush_schedule_pool_size | 16 | Number of threads performing background flush for tables with Buffer engine. Only has meaning at server startup. | |
background_distributed_schedule_pool_size | 16 | Number of threads performing background tasks for distributed sends. Only has meaning at server startup. | |
cast_keep_nullable | 0 | CAST operator keep Nullable for result data type | |
data_type_default_nullable | 0 | Data types without NULL or NOT NULL will make Nullable | |
default_database_engine | Ordinary | Default database engine | |
distributed_replica_max_ignored_errors | 0 | Number of errors that will be ignored while choosing replicas | |
force_optimize_skip_unused_shards_nesting | 0 | Same as force_optimize_skip_unused_shards, but accept nesting level until which it will work | |
format_regexp | Regular expression (for Regexp format) | ||
format_regexp_escaping_rule | Escaped | Field escaping rule (for Regexp format) | |
format_regexp_skip_unmatched | 0 | Skip lines unmatched by regular expression (for Regexp format) | |
function_implementation | Choose function implementation for specific target or variant (experimental). If empty, enable all of them. | ||
input_format_avro_allow_missing_fields | 0 | For Avro/AvroConfluent format: when field is not found in schema use default value instead of error | |
input_format_with_names_use_header | 0 | 1 | For TSVWithNames and CSVWithNames input formats this controls whether the format parser is to assume that column data appear in the input exactly as they are specified in the header. |
insert_in_memory_parts_timeout | 600000 | ||
join_on_disk_max_files_to_merge | 64 | For MergeJoin on disk, set how many files are allowed to sort simultaneously. The larger the value the more memory is used and less disk I/O needed. Minimum is 2. | |
lock_acquire_timeout | 120 | How long locking request should wait before failing | |
log_queries_min_type | QUERY_START | Minimal type in query_log to log, possible values (from low to high): QUERY_START, QUERY_FINISH, EXCEPTION_BEFORE_START, EXCEPTION_WHILE_PROCESSING. | |
materialize_ttl_after_modify | 1 | Apply TTL for old data, after ALTER MODIFY TTL query | |
max_block_size | 65536 | 65505 | Maximum block size for reading |
max_final_threads | 16 | The maximum number of threads to read from the table with FINAL. | |
max_insert_block_size | 1048576 | 1048545 | The maximum block size for insertion, if we control the creation of blocks for insertion |
max_joined_block_size_rows | 65536 | 65505 | Maximum block size for JOIN result (if join algorithm supports it). 0 means unlimited. |
max_untracked_memory | 4194304 | Small allocations and deallocations are grouped in thread local variables and tracked or profiled only when the amount (in absolute value) becomes larger than specified value. If the value is higher than ‘memory_profiler_step’ it will be effectively lowered to ‘memory_profiler_step’. | |
memory_profiler_sample_probability | 0 | Collect random allocations and deallocations and write them into system.trace_log with 'MemorySample' trace_type. The probability is for every alloc/free regardless to the size of the allocation. Note that sampling happens only when the amount of untracked memory exceeds ‘max_untracked_memory’. You may want to set ‘max_untracked_memory’ to 0 for extra fine grained sampling. | |
metrics_perf_events_enabled | 0 | If enabled, some of the perf events will be measured throughout queries’ execution. | |
metrics_perf_events_list | Comma separated list of perf metrics that will be measured throughout queries’ execution. Empty means all events. | ||
min_chunk_bytes_for_parallel_parsing | 1048576 | 10485760 | The minimum chunk size in bytes, which each thread will parse in parallel. |
min_insert_block_size_bytes | 268435456 | 268427520 | Squash blocks passed to INSERT query to specified size in bytes, if blocks are not big enough. |
min_insert_block_size_bytes_for_materialized_views | 0 | Like min_insert_block_size_bytes, but applied only during pushing to MATERIALIZED VIEW (default: min_insert_block_size_bytes) | |
min_insert_block_size_rows | 1048576 | 1048545 | Squash blocks passed to INSERT query to specified size in rows, if blocks are not big enough. |
min_insert_block_size_rows_for_materialized_views | 0 | Like min_insert_block_size_rows, but applied only during pushing to MATERIALIZED VIEW (default: min_insert_block_size_rows) | |
multiple_joins_rewriter_version | 2 | 1 or 2. Second rewriter version knows about table columns and keeps not clashing names as is. | |
optimize_aggregation_in_order | 0 | Enable GROUP BY optimization for aggregating data in corresponding order in MergeTree tables. | |
optimize_aggregators_of_group_by_keys | 1 | Eliminates min/max/any/anyLast aggregators of GROUP BY keys in SELECT section | |
optimize_arithmetic_operations_in_aggregate_functions | 1 | Move arithmetic operations out of aggregation functions | |
optimize_distributed_group_by_sharding_key | 0 | Optimize GROUP BY sharding_key queries (by avoiding costly aggregation on the initiator server) | |
optimize_duplicate_order_by_and_distinct | 1 | Remove duplicate ORDER BY and DISTINCT if it’s possible | |
optimize_group_by_function_keys | 1 | Eliminates functions of other keys in GROUP BY section | |
optimize_if_chain_to_multiif | 0 | Replace if(cond1, then1, if(cond2, …)) chains to multiIf. Currently it’s not beneficial for numeric types. | |
optimize_if_transform_strings_to_enum | 0 | Replaces string-type arguments in If and Transform to enum. Disabled by default cause it could make inconsistent change in distributed query that would lead to its fail | |
optimize_injective_functions_inside_uniq | 1 | Delete injective functions of one argument inside uniq*() functions | |
optimize_monotonous_functions_in_order_by | 1 | Replace monotonous function with its argument in ORDER BY | |
optimize_move_functions_out_of_any | 1 | Move functions out of aggregate functions ‘any’, ‘anyLast’ | |
optimize_redundant_functions_in_order_by | 1 | Remove functions from ORDER BY if its argument is also in ORDER BY | |
optimize_skip_unused_shards_nesting | 0 | Same as optimize_skip_unused_shards, but accept nesting level until which it will work | |
optimize_trivial_insert_select | 1 | Optimize trivial ‘INSERT INTO table SELECT … FROM TABLES’ query | |
output_format_enable_streaming | 0 | Enable streaming in output formats that supports it | |
output_format_pretty_grid_charset | UTF-8 | Charset for printing grid borders. Available charsets: ASCII, UTF-8 (default) | |
output_format_pretty_max_value_width | 10000 | Maximum width of value to display in Pretty formats. If greater - it will be cut. | |
parallel_distributed_insert_select | 0 | Process distributed INSERT SELECT query in the same cluster on local tables on every shard, if 1 SELECT is executed on each shard, if 2 SELECT and INSERT is executed on each shard | |
partial_merge_join_left_table_buffer_bytes | 32000000 | If not 0, group left table blocks in bigger ones for the left-side table in partial merge join. It uses up to 2x of specified memory per joining thread. The current version works only with ‘partial_merge_join_optimizations = 1’. | |
partial_merge_join_optimizations | 0 | 1 | Enable optimizations in partial merge join |
partial_merge_join_rows_in_right_blocks | 10000 | 65536 | Split right-hand joining data in blocks of specified size. It’s a portion of data indexed by min-max values and possibly unloaded on disk. |
rabbitmq_max_wait_ms | 5000 | The wait time for reading from RabbitMQ before retry. | |
read_in_order_two_level_merge_threshold | 100 | Minimal number of parts to read to run preliminary merge step during multithread reading in order of primary key. | |
send_logs_level | none | fatal | Send server text logs with specified minimum level to client. Valid values: ’trace’, ‘debug’, ‘information’, ‘warning’, ’error’, ‘fatal’, ’none’ |
show_table_uuid_in_table_create_query_if_not_nil | 0 | For tables in databases with Engine=Atomic show UUID of the table in its CREATE query. | |
temporary_files_codec | LZ4 | Set compression codec for temporary files (sort and join on disk). I.e. LZ4, NONE. | |
transform_null_in | 0 | If enabled, NULL values will be matched with ‘IN’ operator as if they are considered equal. | |
validate_polygons | 1 | Throw exception if polygon is invalid in function pointInPolygon (e.g. self-tangent, self-intersecting). If the setting is false, the function will accept invalid polygons but may silently return wrong result. |
References
The ClickHouse Altinity Stable release is based on the community version. It can be downloaded from repo.clickhouse.tech, and RPM packages are available from the Altinity Stable Repository.
- https://clickhouse.com/docs/en/whats-new/changelog/2020#clickhouse-release-v20429-2020-05-12
- https://clickhouse.com/docs/en/whats-new/changelog/2020#clickhouse-release-v20527-stable-2020-07-02
- https://clickhouse.com/docs/en/whats-new/changelog/2020#clickhouse-release-v206328-stable
- https://clickhouse.com/docs/en/whats-new/changelog/2020#clickhouse-release-v207230-stable-2020-08-31
- https://clickhouse.com/docs/en/whats-new/changelog/2020#clickhouse-release-v20823-stable-2020-09-08
Altinity Stable Release for ClickHouse 20.8.12.2
Released 2021-02-03
Description
Bug fixes.
Bug Fixes
- Fixed a rare bug that may result in corrupted data when merging from wide to compact parts. We recommend that all 20.8 users who are using compact parts upgrade to this release.
Altinity Stable Release for ClickHouse 20.8.11.17
Released 2020-12-25
Description
This release provides bug fixes and general stability improvements.
V20.3 Release
Point releases:
- V20.3.12.112 - 2020-06-24 - Main release notes - Read these first!
- V20.3.19.4 - 2020-09-23 - Latest release
Altinity Stable Release for ClickHouse 20.3.12.112
Released 2020-06-24
Description
This release is a significant step forward since the previous Altinity Stable release 19.16.19.85. It includes 1203 pull requests from 171 contributors.
Major New Features
- TTL moves. See the Putting Things Where They Belong Using New TTL Moves article. While we consider this feature to be production ready, we have a roadmap to further improve it. TTL moves have been developed by Altinity.
- Dictionary DDLs that many ClickHouse users dreamed of! See the ClickHouse Dictionaries: Reloaded! article that highlights this feature. A short DDL sketch follows this list.
- New DateTime64 datatype with configurable precision up to nanoseconds. The feature is in beta state and more functions for usability will follow in next releases. It has also been developed by Altinity.
- Joins have been improved a lot, including SEMI/ANTI JOIN, experimental merge join and other changes. See the join_algorithm setting below for more info on merge joins.
- A new compact format for MergeTree tables that store all columns in one file. It improves performance of small inserts. The usual format where every column is stored separately is now called “wide.” Compact format is disabled by default. See the min_bytes_for_wide_part and min_rows_for_wide_part settings.
- A built-in Prometheus exporter endpoint for ClickHouse monitoring statistics
- Porting some functionality of the H3 library — A Hexagonal Hierarchical Geospatial Indexing System
- ALTER MODIFY/DROP are currently implemented as mutations for ReplicatedMergeTree* engine family. Now ALTER commands block only at the metadata update stage, and don’t block after that.
- Import/export gzip-compressed files directly for file based storage engines and table functions: File, URL, HDFS and S3.
- Server improvements across different server functions. For example, performance of the Kafka Engine has been improved, parallel `INSERT` is now possible (see the max_insert_threads setting), etc.
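Here is a minimal sketch of the dictionary DDL feature highlighted in the list above; the dictionary name, source table, layout, and lifetime are illustrative only:

```sql
CREATE DICTIONARY default.country_names
(
    code UInt64,
    name String
)
PRIMARY KEY code
SOURCE(CLICKHOUSE(HOST 'localhost' PORT 9000 USER 'default' DB 'default' TABLE 'countries'))
LAYOUT(FLAT())
LIFETIME(MIN 300 MAX 600);

-- The dictionary is then queried with the usual dictGet functions:
SELECT dictGet('default.country_names', 'name', toUInt64(42));
```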
Upgrade Notes
- The replication fetch protocol has been changed in this release. If only one replica is upgraded in a shard, replication may get stuck with an `unknown protocol version` error until all replicas are upgraded. 19.16.17.80 and 19.16.19.85 contain a special compatibility fix that allows a smooth upgrade without replication downtime.
- If enable_optimize_predicate_expression is turned on, it may result in incorrect data when only part of the cluster is upgraded. Please turn it off before the upgrade, and turn it back on afterwards.
- There were some optimizations that are incompatible between versions, so we recommend disabling enable_scalar_subquery_optimization before the upgrade. This setting turns off new scalar subquery optimizations and was backported specifically in order to facilitate smooth upgrades.
- There were some optimizations around distributed query execution that may result in incorrect data if part of the cluster is upgraded. We recommend disabling distributed_aggregation_memory_efficient while upgrading, and turn it on afterwards.
Backward Incompatible Changes
The following changes are backward incompatible and require user attention.
- ALTER on ReplicatedMergeTree is not compatible with previous versions. 20.3 creates a different metadata structure in ZooKeeper for ALTERs. Earlier versions do not understand the format and cannot proceed with their replication queue.
- The format of replication log entries for mutation commands has changed. You have to wait for old mutations to process before installing the new version.
- 20.3 requires an alias for every subquery participating in a join by default. Set `joined_subquery_requires_alias=0` in order to keep the previous behavior.
- ANY JOIN logic has been changed. To upgrade without changes in behavior, you need to add `SETTINGS any_join_distinct_right_table_keys = 1` to Engine Join tables metadata or recreate these tables after the upgrade.
- The functions indexHint, findClusterValue, findClusterIndex were removed
- Settings merge_tree_uniform_read_distribution, allow_experimental_cross_to_join_conversion, allow_experimental_multiple_joins_emulation are deprecated and ignored
Other Important Notes
- If you encounter [Argument at index 0 for function dictGetDateTime must be a constant](https://github.com/ClickHouse/ClickHouse/issues/7798) or a similar error message after upgrading, set `enable_early_constant_folding=0`.
- The new release adds parallel parsing of input formats. While it should improve the performance of data loading in text formats, sometimes it may result in a slower load. Set `input_format_parallel_parsing=0` if you experience insert performance degradation.
- The check that a query is performed fast enough is enabled by default. Queries can get an exception like this: `DB::Exception: Estimated query execution time (N seconds) is too long. Maximum: 12334.` Set `timeout_before_checking_execution_speed=0` to fix this problem.
- Mutations containing non-deterministic functions, e.g. dictGet or joinGet, are disabled by default. Set `allow_nondeterministic_mutations=1` to enable them.
Also note that this release has a new query pipeline enabled by default (codename: Processors). This was a significant internal refactoring, and we believe all issues have been fixed. However, in the rare case that you face some weird query behavior or performance degradation, try disabling `experimental_use_processors` and check if the problem goes away.
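The following sketch collects the workaround settings mentioned above in one place; apply only the ones relevant to your workload (session-level here, or in a profile to make them permanent):

```sql
SET enable_early_constant_folding = 0;           -- dictGetDateTime "must be a constant" errors
SET input_format_parallel_parsing = 0;           -- insert performance regressions on text formats
SET timeout_before_checking_execution_speed = 0; -- "Estimated query execution time ... is too long"
SET allow_nondeterministic_mutations = 1;        -- mutations using dictGet/joinGet
SET experimental_use_processors = 0;             -- fall back to the old query pipeline
```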
Changes
New formats
- Avro
- AvroConfluent
- JSONCompactEachRow
- JSONCompactEachRowWithNamesAndTypes
New or Improved Functions
- CRC32IEEE, CRC64
- New JSON functions.
- isValidJSON
- JSONExtractArrayRaw — very useful for parsing nested JSON structures
- New array functions.
- arrayCompact
- arrayFill
- arrayReverseFill
- arrayZip
- New geo functions.
- geoDistance
- greatCircleDistance
- H3 functions.
- h3EdgeAngle
- h3EdgeLengthM
- h3GetBaseCell
- h3GetResolution
- h3HexAreaM2
- h3IndexesAreNeighbors
- h3IsValid
- h3ToChildren
- h3ToParent
- h3ToString
- h3kRing
- stringToH3
- Functions for DateTime64.
- now64
- parseDateTime64BestEffort, parseDateTime64BestEffortOrNull, parseDateTime64BestEffortOrZero
- toDateTime64, toDateTime64OrNull, toDateTime64OrZero
- New aggregate functions:
- avgWeighted
- categoricalInformationValue
- groupArraySample
- Other.
- bitCount
- ifNotFinite.
- isConstant
- javaHashUTF16LE
- moduloOrZero
- randomPrintableASCII
- roundBankers
New table functions
- clusterAllReplicas — similar to cluster function but queries all replicas in the cluster
- generateRandom — allows users to generate random dataset with defined structure
- zeros, zeros_mt
New metrics and events
system.metrics:
- BackgroundMovePoolTask
- CacheDictionaryUpdateQueueBatches
- CacheDictionaryUpdateQueueKeys
- MemoryTrackingInBackgroundMoveProcessingPool
- MySQLConnection
- SendScalars
New system tables
metric_log — stores system.metrics and system.events history
quotas, quota_usage — introspection to quotas
row_policies — introspection to row security policies
trace_log, stack_trace — for low-level debugging purposes
zeros, zeros_mt — zero generators, like system.numbers, but faster and returning zeros
New columns in system tables
mutations: parts_to_do_names
parts, parts_columns: part_type
processes: thread_ids replaces old columns thread_numbers and os_thread_ids
query_log: exception_code, thread_ids replaces old columns thread_numbers and os_thread_ids
replicas: zookeeper_exception
settings: min, max, readonly
table_engines: supports_settings, supports_skipping_indices, supports_sort_order, supports_ttl, supports_replication, supports_deduplication
system.merge_tree_settings added/changed
Settings | Old value | New value | Description |
---|---|---|---|
disable_background_merges | 0 | Disable background merges | |
merge_max_block_size | 8192 | How many rows in blocks should be formed for merge operations | |
min_bytes_for_wide_part | 0 | Minimal uncompressed size in bytes to create part in wide format instead of compact | |
min_rows_for_wide_part | 0 | Minimal number of rows to create part in wide format instead of compact | |
use_minimalistic_part_header_in_zookeeper | 0 | 1 | Store part header (checksums and columns) in a compact format and a single part znode instead of separate znodes. This setting was available for a year already, and many users have it enabled. |
system.settings added/changed
Setting | Old value | New value | Description |
---|---|---|---|
allow_experimental_alter_materialized_view_structure | 0 | Allow atomic alter on Materialized views. Work in progress | |
allow_experimental_data_skipping_indices | 0 | 1 | Data skipping indices are now enabled by default |
allow_nondeterministic_mutations | 0 | Allow non-deterministic functions in ALTER UPDATE/ALTER DELETE statements, e.g. with dictGet() functions | |
background_move_pool_size | 8 | Number of threads performing background moves for tables. Only has meaning at server startup | |
default_max_bytes_in_join | 100000000 | 1000000000 | Maximum size of right-side table if limit is required but max_bytes_in_join is not set |
enable_early_constant_folding | 0 | 1 | Enable query optimization where we analyze function and subqueries results and rewrite query if there are constants there |
enable_scalar_subquery_optimization | 0 | 1 | If it is set to true, prevent scalar subqueries from (de)serializing large scalar values and possibly avoid running the same subquery more than once |
experimental_use_processors | 0 | 1 | Processors pipeline is now enabled by default |
force_optimize_skip_unused_shards | 1 | Throw an exception if unused shards cannot be skipped (1 – throw only if the table has the sharding key, 2 – always throw) | |
format_avro_schema_registry_url | For AvroConfluent format: Confluent Schema Registry URL | ||
input_format_parallel_parsing | 1 | Enable parallel parsing for some data formats | |
input_format_values_deduce_templates_of_expressions | 0 | 1 | For Values format: if the field could not be parsed by streaming parser, run SQL parser, deduce template of the SQL expression, try to parse all rows using template and then interpret expression for all rows |
join_algorithm | hash | Specify join algorithm: auto, hash, partial_merge, prefer_partial_merge. auto tries to change HashJoin to MergeJoin on the fly to avoid out of memory | |
joined_subquery_requires_alias | 0 | 1 | Force joined subqueries and table functions to have aliases for correct name qualification |
max_insert_threads | 0 | The maximum number of threads to execute the INSERT SELECT query. Values 0 or 1 means that INSERT SELECT is not run in parallel. Higher values will lead to higher memory usage. Parallel INSERT SELECT has effect only if the SELECT part is run on parallel, see max_threads setting | |
max_joined_block_size_rows | 65536 | Maximum block size for JOIN result (if join algorithm supports it). 0 means unlimited | |
max_parser_depth | 1000 | Maximum parser depth | |
memory_profiler_step | 0 | Every number of bytes the memory profiler will collect the allocating stack trace. The minimal effective step is 4 MiB (less values will work as clamped to 4 MiB). Zero means disabled memory profiler | |
min_bytes_to_use_mmap_io | 0 | The minimum number of bytes for reading the data with the mmap option during SELECT queries execution. 0 – disabled | |
min_chunk_bytes_for_parallel_parsing | 1048576 | The minimum chunk size in bytes, which each thread will parse in parallel | |
mutations_sync | 0 | Wait for synchronous execution of ALTER TABLE UPDATE/DELETE queries (mutations). 0 – execute asynchronously. 1 – wait current server. 2 – wait all replicas if they exist | |
optimize_if_chain_to_miltiif | 0 | Replace if(cond1, then1, if(cond2, …)) chains to multiIf | |
optimize_trivial_count_query | 1 | Process trivial SELECT count() FROM table query from metadata | |
output_format_avro_codec | Compression codec used for output. Possible values: null, deflate, snappy | ||
output_format_avro_sync_interval | 16384 | Sync interval in bytes | |
output_format_csv_crlf_end_of_line | If it is set true, end of line in CSV format will be \r\n instead of \n | ||
output_format_tsv_crlf_end_of_line | If it is set true, end of line in TSV format will be \r\n instead of \n | ||
timeout_before_checking_execution_speed | 0 | 10 | Check that the speed is not too low after the specified time has elapsed |
use_compact_format_in_distributed_parts_names | 0 | Changes format of directories names for distributed table insert parts |
References
RPM packages for Altinity Stable Release can be found in the Altinity Stable Repository. The ClickHouse repository can be used as well but check version numbers carefully.
- https://clickhouse.tech/docs/en/whats-new/changelog/2019/#clickhouse-release-v19-17-4-11-2019-11-22
- https://clickhouse.com/docs/en/whats-new/changelog/2020#clickhouse-release-v20124-2020-01-22
- https://clickhouse.com/docs/en/whats-new/changelog/2020#clickhouse-release-v20321-2020-03-12
Altinity Stable Release for ClickHouse 20.3.19.4
Released 2020-09-23
Description
The release includes several dozen bug fixes from the previous release.
Major New Features
- Server startup time has been reduced by parsing metadata in parallel.
- Improved performance of primary key analysis for LowCardinality columns, fixing a performance regression introduced in 20.3.
- Improved performance of queries with large tuples and tuples used in primary key.
Bug Fixes
- Fixed a bug that prevented attaching Materialized Views to system tables.
- Fixed incorrect behavior of the if function with NULLs.
- Fixed segfaults in rare cases.
- Fixed a bug that prevented predicate pushdown for queries using a WITH clause.
- Fixed SIGSEGV in the Kafka engine when the broker is unavailable.
- Fixed a bug leading to a "block structure mismatch" error for queries with UNION and JOIN.
- Fixed TTL processing logic to process all partitions in one run.
- Fixed a bug in parsing row policies from configuration files that could sometimes result in missing policies.
- Fixed a bug where ALTER TABLE UPDATE could produce incorrect results when updating Nullable columns.
- Fixed a bug with codecs not working properly for MergeTree compact parts.
References
Updated RPM packages for Altinity Stable Release can be found in the Altinity Stable Repository. The ClickHouse repository can be used for DEB packages but check version numbers carefully.
V19.16 Release
Point releases:
- V19.16.10.44 - 2020-01-20 - Main release notes - Read these first!
- V19.16.19.85 - 2020-04-20 - Latest release
- V19.16.12.49 - 2020-02-07
Altinity Stable Release for ClickHouse 19.16.10.44
Released 2020-01-20
Description
This release has major updates: 2744 commits in 528 pull-requests from 113 contributors.
Major New Features
- Tiered storage: multiple storage volumes for MergeTree tables. It is possible to store fresh data on SSD and automatically move old data to HDD. We already discussed this functionality in our blog.
- LIVE VIEW tables that we also described in our blog. The support in this release is not complete, but new improvements are coming quickly. Feel free to try it out.
- WITH FILL modifier of ORDER BY for data gap filling. It allows filling in missing data to produce uniform reports. For example, you can fill missing dates for time series so that every day is shown even if some days have no data. This feature is not documented yet, so here is an example of how missing dates can be filled: SELECT arrayJoin([today()-10, today()]) AS d ORDER BY d ASC WITH FILL
- Table constraints that are checked at insert.
- Sensitive data masking for query_log, server logs, and the process list, with regexp-based rules.
- Table function input() for reading incoming data in an INSERT SELECT query. A very useful feature when you need to preprocess data just before inserting. We are going to showcase it in our planned articles around log processing use cases; a short sketch combining it with table constraints follows this list.
- Cascaded materialized views. This is an important feature for many use cases, including Kafka integration. In particular, it allows you to load raw data from Kafka using an MV and then aggregate it using another MV. That was not possible in previous ClickHouse releases.
- Kafka: Altinity took over support for the Kafka engine a few months ago. Kafka functionality and stability have been improved in this release, in particular:
  - ClickHouse can act as a Kafka producer, and not just read from Kafka, but also send data back with an insert statement.
  - Atomic parsing of each message: kafka_row_delimiter is now obsolete (ignored).
  - More reliable commit logic.
  - Virtual columns _partition and _timestamp for Kafka engine tables.
  - Parsing of most of the formats is working properly now.
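The table constraints and the input() table function mentioned above can be combined in a single ingestion path. The following is a minimal sketch only; the table, column, and file names are hypothetical and not taken from the release itself:

```sql
-- Constraints are validated on every INSERT; a violating row makes the INSERT fail.
CREATE TABLE events
(
    ts   DateTime,
    user String,
    dur  UInt32,
    CONSTRAINT positive_duration CHECK dur > 0
) ENGINE = MergeTree ORDER BY ts;

-- input() describes the structure of the incoming stream, and the SELECT transforms
-- rows on the fly before they reach the table. The data itself is sent together with
-- the query, for example:
--   cat events.csv | clickhouse-client --query="INSERT INTO events SELECT
--     parseDateTimeBestEffort(ts_raw), lower(user), dur
--     FROM input('ts_raw String, user String, dur UInt32') FORMAT CSV"
```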
Known issues
- In case of topic rebalancing on the Kafka side, duplicates are possible. We are going to address it in 2020/Q1. https://github.com/ClickHouse/ClickHouse/issues/7259.
- Select from the mysql() table function is broken in this release. Two workarounds are possible:
  - Use the MySQL table engine.
  - Use create table ... as mysql(...) and then select from this table (this is actually a shortcut to the MySQL table engine).
- Some constant-like functions can misbehave when queried through Distributed tables: https://github.com/ClickHouse/ClickHouse/issues/7666.
- 'system reload dictionary' does not reload dictionaries with 'update_field': https://github.com/ClickHouse/ClickHouse/issues/7440.
- Complicated where conditions involving fields of UInt8 type with values > 1 can return unexpected results (workaround: instead of a and b and c use a <> 0 and b <> 0 and c <> 0): https://github.com/ClickHouse/ClickHouse/issues/7772.
- If you want to preserve the old ANY JOIN behavior while upgrading from a version before 19.13, you may need to install 19.13.7 first, change the any_join_distinct_right_table_keys setting there, and only then upgrade to 19.16. We recommend reviewing your queries and rewriting them without this join type; in future releases it will be available again, but under a different name (SEMI JOIN).
Upgrade Notes
Backward Incompatible Changes
- count() supports only a single argument.
- Legacy asterisk_left_columns_only setting has been removed (it was disabled by default).
- Numeric values for Enums can now be used directly in the IN section of the query.
- Changed serialization format of bitmap* aggregate function states to improve performance. Serialized states of bitmap* from previous versions cannot be read. If you happen to use bitmap aggregate functions, please contact us before upgrading.
- The type column in system.query_log was changed from UInt8 to Enum8.
- ANY RIGHT/FULL/INNER JOIN is disabled by default. Set the any_join_distinct_right_table_keys setting to enable them.
Changes
New Formats
- ORC format.
- Template/TemplateIgnoreSpaces format. It allows parsing and generating data in custom text formats. For example, you can generate HTML directly from ClickHouse, turning ClickHouse into a web server.
- CustomSeparated/CustomSeparatedIgnoreSpaces format. Supports custom escaping and delimiter rules.
- JSONEachRowWithProgress
- Parse unquoted NULL literal as NULL (enabled by setting format_csv_unquoted_null_literal_as_null).
- Initialize null fields with default values if the data type of this field is not nullable (enabled by setting input_format_null_as_default).
New or improved functions
- Aggregate function combinators which fill null or default value when there is nothing to aggregate: -OrDefault, -OrNull
- Introduce uniqCombined64() to get sane results for cardinality > UINT_MAX
- QuantileExactExclusive and Inclusive aggregate functions
- hasToken/hasTokenCaseInsensitive (look for the token in a string the same way as the token_bf secondary index)
- multiFuzzyMatchAllIndices, multiMatchAllIndices (return the Array of all matched indices in multiMatch family functions)
- repeat function for strings
- sigmoid and tanh functions for ML applications
- Roaring Bitmaps:
  - Changed CRoaring serialization functions (you will not be able to read Bitmaps created by earlier versions)
  - bitmapSubsetInRange
  - bitmapMin, bitmapMax
  - bitmapSubsetLimit(bitmap, range_start, limit)
  - groupBitmapAnd, groupBitmapOr, groupBitmapXor
- geohashesInBox(longitude_min, latitude_min, longitude_max, latitude_max, precision), which creates an array of precision-long strings of geohash boxes covering the provided area.
- Support for wildcards in paths of table functions file and hdfs. If the path contains wildcards, the table will be readonly: SELECT * FROM hdfs('hdfs://hdfs1:9000/some_dir/another_dir/*/file', 'Parquet', 'col1 String')
- New function neighbour(value, offset[, default_value]). Allows reaching the previous/next row within the column.
- Optimized queries with an ORDER BY expression clause where the expressions have a coinciding prefix with the sorting key in MergeTree tables. This optimization is controlled by the optimize_read_in_order setting.
- New functions arraySplit and arrayReverseSplit which can split an array by "cut off" conditions. They are useful in time sequence handling (see the example after this list).
- Table function values (the name is case-insensitive). It allows creating a table with some data inline: SELECT * FROM VALUES('a UInt64, s String', (1, 'one'), (2, 'two'), (3, 'three'))
- fullHostName (alias FQDN)
- numbers_mt(): multithreaded version of numbers().
- currentUser() (and alias user()), returning the login of the authorized user.
- S3 engine and table function. Partial support in this release (no authentication); the complete version is expected in 19.18.x and later.
- WITH TIES modifier for LIMIT
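As an example of the new array functions, arraySplit cuts an array into sub-arrays, splitting before every element for which the lambda returns a non-zero value. This is a small sketch; the input array is arbitrary:

```sql
SELECT arraySplit(x -> x = 0, [1, 2, 0, 3, 0, 4]) AS groups;
-- groups = [[1,2],[0,3],[0,4]]
```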
New dictionaries features
- Redis as an external dictionary source.
- <sparse_hashed> dictionary layout that is functionally equivalent to the <hashed> layout but is more memory efficient. It uses roughly half the memory at the cost of slower value retrieval.
- allow_dictionaries user setting that works similarly to allow_databases.
- HTTP source new attributes: credentials and http-headers.
Operations / Monitoring
- system.metric_log table which stores values of system.events and system.metrics with a specified time interval.
- system.text_log in order to store ClickHouse logs in ClickHouse itself.
- Support for detached parts removal: ALTER TABLE <table_name> DROP DETACHED PART '<part_id>'
- MergeTree now has an additional option ttl_only_drop_parts (disabled by default) to avoid partial pruning of parts, so that parts are dropped completely when all of their rows are expired.
- Added miscellaneous function getMacro(name) that returns a String with the value of the corresponding <macros> entry from the configuration file on the server where the function is executed.
New metrics & events
- system.metrics: DistributedFilesToInsert, GlobalThread, GlobalThreadActive, LocalThread, LocalThreadActive
- system.events: Merge
New system tables
- system.disks
- system.storage_policies
New columns in system tables
- system.clusters: errors_count, estimated_recovery_time
- system.collations: language
- system.parts: disk_name
- system.parts_columns: disk_name
- system.processes: os_thread_ids
- system.query_log: os_thread_ids
- system.tables: storage_policy
system.merge_tree_settings changed/added
Name | Default | Description |
---|---|---|
concurrent_part_removal_threshold | 100 | Activate concurrent part removal (see ‘max_part_removal_threads’) only if the number of inactive data parts is at least this. |
max_part_loading_threads | auto(6) | The number of threads to load data parts at startup. |
max_part_removal_threads | auto(6) | The number of threads for concurrent removal of inactive data parts. One is usually enough, but in ‘Google Compute Environment SSD Persistent Disks’ file removal (unlink) operation is extraordinarily slow and you probably have to increase this number (recommended is up to 16). |
storage_policy | default | Name of storage disk policy |
ttl_only_drop_parts | 0 | Only drop altogether the expired parts and not partially prune them. |
system.settings changed/added
Name | Default | Description |
---|---|---|
allow_drop_detached | 0 | Allow ALTER TABLE … DROP DETACHED PARTITION … queries |
allow_experimental_live_view | 0 | Enable LIVE VIEW. Not mature enough. |
allow_introspection_functions | 0 | Allow functions for introspection of ELF and DWARF for query profiling. These functions are slow and may impose security considerations. |
any_join_distinct_right_table_keys | 0 (was 1) | Enable old ANY JOIN logic with many-to-one left-to-right table keys mapping for all ANY JOINs. It leads to confusing not equal results for ’t1 ANY LEFT JOIN t2’ and ’t2 ANY RIGHT JOIN t1’. ANY RIGHT JOIN needs one-to-many keys mapping to be consistent with LEFT one. |
asterisk_left_columns_only | removed | |
connection_pool_max_wait_ms | 0 | The wait time when connection pool is full. |
default_max_bytes_in_join | 100000000 | Maximum size of right-side table if limit’s required but max_bytes_in_join is not set. |
distributed_directory_monitor_max_sleep_time_ms | 30000 | Maximum sleep time for StorageDistributed DirectoryMonitors, it limits exponential growth too. |
distributed_replica_error_cap | 1000 | Max number of errors per replica, prevents piling up incredible amount of errors if replica was offline for some time and allows it to be reconsidered in a shorter amount of time. |
distributed_replica_error_half_life | 60 | Time period reduces replica error counter by 2 times. |
format_custom_escaping_rule | Escaped | Field escaping rule (for CustomSeparated format) |
format_custom_field_delimiter | | Delimiter between fields (for CustomSeparated format) |
format_custom_result_after_delimiter | | Suffix after result set (for CustomSeparated format) |
format_custom_result_before_delimiter | | Prefix before result set (for CustomSeparated format) |
format_custom_row_after_delimiter | | Delimiter after field of the last column (for CustomSeparated format) |
format_template_resultset | | Path to file which contains format string for result set (for Template format) |
format_template_row | | Path to file which contains format string for rows (for Template format) |
format_template_rows_between_delimiter | | Delimiter between rows (for Template format) |
input_format_csv_unquoted_null_literal_as_null | 0 | Consider unquoted NULL literal as \N |
input_format_null_as_default | 0 | For text input formats initialize null fields with default values if data type of this field is not nullable |
input_format_tsv_empty_as_default | 0 | Treat empty fields in TSV input as default values. |
input_format_values_accurate_types_of_literals | 1 | For Values format: when parsing and interpreting expressions using template, check actual type of literal to avoid possible overflow and precision issues. |
input_format_values_deduce_templates_of_expressions | 0 | For Values format: if field could not be parsed by streaming parser, run SQL parser, deduce template of the SQL expression, try to parse all rows using template and then interpret expression for all rows. |
joined_subquery_requires_alias | 0 | Force joined subqueries to have aliases for correct name qualification. |
kafka_max_wait_ms | 5000 | The wait time for reading from Kafka before retry. |
live_view_heartbeat_interval | 15 | The heartbeat interval in seconds to indicate live query is alive. |
max_http_get_redirects | 0 | Max number of http GET redirects hops allowed. Make sure additional security measures are in place to prevent a malicious server to redirect your requests to unexpected services. |
max_live_view_insert_blocks_before_refresh | 64 | Limit maximum number of inserted blocks after which mergeable blocks are dropped and query is re-executed. |
min_free_disk_space_for_temporary_data | 0 | The minimum disk space to keep while writing temporary data used in external sorting and aggregation. |
optimize_read_in_order | 1 | Enable ORDER BY optimization for reading data in corresponding order in MergeTree tables. |
partial_merge_join | 0 | Use partial merge join instead of hash join for LEFT and INNER JOINs. |
partial_merge_join_optimizations | 0 | Enable optimizations in partial merge join |
partial_merge_join_rows_in_left_blocks | 10000 | Group left-hand joining data in bigger blocks. Setting it to a bigger value increases JOIN performance and memory usage. |
partial_merge_join_rows_in_right_blocks | 10000 | Split right-hand joining data in blocks of specified size. It’s a portion of data indexed by min-max values and possibly unloaded on disk. |
query_profiler_cpu_time_period_ns | 1000000000 (was 0) | Highly experimental. Period for CPU clock timer of query profiler (in nanoseconds). Set 0 value to turn off CPU clock query profiler. Recommended value is at least 10000000 (100 times a second) for single queries or 1000000000 (once a second) for cluster-wide profiling. |
query_profiler_real_time_period_ns | 1000000000 (was 0) | Highly experimental. Period for real clock timer of query profiler (in nanoseconds). Set 0 value to turn off real clock query profiler. Recommended value is at least 10000000 (100 times a second) for single queries or 1000000000 (once a second) for cluster-wide profiling. |
replace_running_query_max_wait_ms | 5000 | The wait time for running query with the same query_id to finish when setting ‘replace_running_query’ is active. |
s3_min_upload_part_size | 536870912 | The minimum size of part to upload during multipart upload to S3. |
temporary_live_view_timeout | 5 | Timeout after which temporary live view is deleted. |
Altinity Stable Release for ClickHouse 19.16.19.85
Released 2020-04-20
Description
This is a combined release of 19.16.19.65, upgraded to 19.16.19.85 after review.
Major New Features
- The enable_scalar_subquery_optimization setting is disabled by default. It is required in order to perform rolling upgrades to 20.x versions.
Bug Fixes
- Fixed a bug with non-Date/DateTime columns not being allowed in TTL expressions
- Fixed a bug in the Kafka engine that sometimes prevented committing a message back to the broker if the broker was temporarily unavailable
- Fixed a bug in which the boolean functions ‘or’ and ‘and’ might return incorrect results when called on more than 10 arguments if some of them are NULL
- Fixed server side certificates in docker image
- Fixed a possible race condition between insert quorum and drop partition
- Volume selection in merge process could move data back to the first volume in storage policy in some cases
- Mutations that did not work if insert_quorum has been used for the table
- Deduplication logic that ignored dependent materialized views if a duplicate for the main table has been detected. The fix is turned off by default and can be enabled with the new deduplicate_blocks_in_dependent_materialized_views setting
- Kafka engine could result in data losses when ZooKeeper is temporarily not available
- Kafka engine did not allow users to drop a table with an incorrect Kafka topic
- Kafka engine did not allow users to use subqueries in attached materialized views
- ALTER TABLE MODIFY CODEC did not work when a default expression and a codec are specified
Altinity Stable Release for ClickHouse 19.16.12.49
Released 2020-02-07
Description
Improved Kafka engine reliability, performance and usability.
Bug Fixes
- When there is a rebalance of partitions between consumers, occasional data duplicates and losses were possible.
- When data was polled from several partitions with one poll and committed partially, occasional data losses were possible.
- Block size threshold (the 'kafka_max_block_size' setting) now triggers a block flush correctly, which reduces the chance of data loss under high load.
- When new columns are added to the Kafka pipeline (Kafka engine -> Materialized View -> Table), it was previously not possible to alter the destination tables first with a default value and then alter the MV in order to minimize downtime. The problem was not related directly to Kafka but to the general implementation of materialized views. It is fixed now.
- A few other minor problems have been addressed as well.
Since ClickHouse now respects the 'kafka_max_block_size' setting, which defaults to 65535, we recommend increasing it for high-volume streaming; see the note under Known Issues below.
Known Issues
The 'kafka_max_block_size' setting defaults to 65535. We recommend increasing it for high-volume streaming: setting it to 524288 or 1048576 may increase consumer throughput by up to 20%. A minimal sketch of setting it on a Kafka table follows.
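This is a sketch only; the broker, topic, and column names are hypothetical, and the full list of Kafka engine settings should be checked against the documentation for your version:

```sql
CREATE TABLE kafka_events
(
    ts  DateTime,
    msg String
) ENGINE = Kafka
SETTINGS
    kafka_broker_list    = 'kafka1:9092',
    kafka_topic_list     = 'events',
    kafka_group_name     = 'clickhouse-events',
    kafka_format         = 'JSONEachRow',
    kafka_max_block_size = 524288;  -- larger flush threshold for high-volume topics
```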
V19.13 Release
Altinity Stable Release for ClickHouse 19.13.7.57
Released 2019-11-28
Description
This is a minor upgrade over the previous release.
Major New Features
- CREATE TABLE AS TABLE FUNCTION statement that allows creating tables from external data automatically, e.g. from a mysql table (see the sketch after this list).
- COLUMNS('regexp') macro function as a generalization of SELECT * queries.
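A sketch of the two features above; the MySQL connection parameters and column names are hypothetical:

```sql
-- CREATE TABLE AS TABLE FUNCTION infers the table structure from the table function.
CREATE TABLE orders_mysql AS mysql('mysql-host:3306', 'shop', 'orders', 'reader', 'secret');

-- COLUMNS('regexp') selects every column whose name matches the regular expression.
SELECT COLUMNS('^price') FROM orders_mysql;
```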
Bug Fixes
- Fixed a security vulnerability in the url() function.
- Fixed a security vulnerability related to ZooKeeper.
- Fixed vulnerabilities in compression codecs.
- Fixed corner case overflows in DoubleDelta codec.
Known Issues
- ANY INNER|RIGHT|FULL JOIN is deprecated! While we like how it worked before, there are plans to change the behavior completely. We are still discussing with the development team how we can preserve the old behavior. For now, in order to preserve the old behavior and safely upgrade to newer versions in the future, one needs to enable the any_join_distinct_right_table_keys setting!
- The setting input_format_defaults_for_omitted_fields is on by default. It enables calculation of complex default expressions for omitted fields in JSONEachRow and CSV* formats. Inserts to Distributed tables need this setting to be the same across the cluster.
  - You need to set input_format_defaults_for_omitted_fields across the cluster before the upgrade if a rolling upgrade is performed.
References
- Previous versions can be found at: https://packagecloud.io/Altinity/clickhouse-altinity-stable.
- multi-volume storage support.
V19.11 Release
Altinity Stable Release for ClickHouse 19.11.8
Released 2019-09-03
Description
Updates including improved Kafka support.
Major New Features
- Multi-table JOINs with standard SQL syntax (some limitations are left: only equi-JOIN, limited condition pushdown support)
- Secondary indices - allow skipping data blocks without reading data: min/max filter, n-grams or token-based bloom filter
- TTL expressions for columns and tables to remove old data automatically (see the sketch after this list)
- Unique ASOF JOIN - join to the most recent value known
- Per-column custom compression codecs (Delta, T64, DoubleDelta, Gorilla): fine-tune compression for particular columns
- Roaring Bitmaps - use bitmaps for different on-set operations (calculate cardinality, unions, intersects, etc)
- Machine-learning functions: evalMLMethod, simpleLinearRegression, stochasticLinearRegression, stochasticLogisticRegression
- Predefined per-row filter expressions for tables, which enable simple row-level security rules
- Adaptive index granularity (setting index_granularity_bytes) - useful for tables having rows with relatively big size (>1280 bytes)
- Multiple substring search: multiSearch(Any|FirstPosition|FirstIndex|AllPositions)(CaseInsensitive)?(UTF8)?
- Multiple regex search (hyperscan-powered): multiMatch(Any|AnyIndex)
- Fuzzy regex match (hyperscan-powered): multiFuzzyMatch(Any|AnyIndex)
- N-gram distance for fuzzy string comparison and search (similar to q-gram metrics in R language): ngramDistance(CaseInsensitive)?(UTF8)?, ngramSearch(CaseInsensitive)?(UTF8)?
- HDFS read/write access
- New JSON processing functions - high performance & compliant with JSON standard
- IPv4 and IPv6 data types
- New formats
- Protobuf — now fully supported with input and output plus nested data structures
- Parquet
- RowBinaryWithNamesAndTypes
- JSONEachRow and TSKV - now support default expressions for missing fields (check input_format_defaults_for_omitted_fields)
- TSVWithNames/CSVWithNames - column order can now be determined from the file header (check the input_format_with_names_use_header parameter).
- SQL statements with bind parameters
- Improved MySQL integration:
- new database engine to access all the tables in remote MySQL server
- support for MySQL wire protocol, allowing to connect to ClickHouse using MySQL clients.
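Several of the MergeTree-related features above (per-column codecs, a token-based bloom filter skipping index, and a table TTL) can be seen together in one table definition. This is a sketch with hypothetical names; note that in 19.11 skipping indices also require the allow_experimental_data_skipping_indices setting mentioned in the settings table below:

```sql
SET allow_experimental_data_skipping_indices = 1;  -- skipping indices are experimental in 19.11

CREATE TABLE logs
(
    d       Date,
    user_id UInt64 CODEC(T64, LZ4),   -- per-column compression codec
    msg     String,
    INDEX msg_tokens msg TYPE tokenbf_v1(8192, 3, 0) GRANULARITY 4  -- token-based bloom filter
) ENGINE = MergeTree
ORDER BY (d, user_id)
TTL d + INTERVAL 3 MONTH;             -- expired rows are removed automatically
```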
Known issues
- All released 19.x versions have had some problems with the Kafka engine implementation due to a full re-write of Kafka support. In 19.11.8 Kafka is working much better than in previous 19.x releases. However, there are still some corner cases that can lead to data duplication in certain scenarios, for example, in the event of a ClickHouse server restart. Those issues will be addressed soon.
- Adaptive granularity is enabled by default, and index_granularity_bytes is set to 10 MB. This feature uses a different data format, and interoperability between the old and new formats has some issues. So if you're upgrading your cluster from an older version, consider disabling it before the upgrade by putting the following fragment in your config.xml: <merge_tree><index_granularity_bytes>0</index_granularity_bytes></merge_tree>. After the upgrade, you can turn adaptive granularity on at any convenient time. In general, it is a useful feature, especially when you have rows larger than about 1 KB in your tables. If you are doing a new installation, please leave the default setting value as is.
- enable_optimize_predicate_expression is now enabled by default. It is possible you may have some issues when a condition passed to a subselect leads to suboptimal / undesired effects. If this happens, please report the issue and disable the feature for that select (or globally).
- Secondary indices are maturing but are still considered experimental. There is at least one severe bug: when a mutation is executed with a condition on a column with a secondary index, it can affect more rows than expected. Please be careful with DELETE, or upgrade to 19.13 (see issue #6224).
- Some users have reported problems with ODBC data sources after upgrade. In most cases these were misconfigurations. Nevertheless, please do canary/staging updates and check how your ODBC connections work before moving to production.
Upgrade Notes
Backward compatibility issues
- Due to an update of the LZ4 library, the new ClickHouse version writes parts that are not binary equivalent to those written with older versions. That makes it problematic to update only one replica. Leaving the cluster in such a state for a long period of time will work, but may lead to excessive parts copying between nodes due to checksum mismatches.
- There is a new setting max_partitions_per_insert_block with default value 100. If the inserted block contains a larger number of partitions, an exception is thrown. Set it to 0 if you want to remove the limit (not recommended).
- If you are using unexpected low cardinality combinations like LowCardinality(UInt8), the new version will prevent you from doing so. If you really know what you are doing, check allow_suspicious_low_cardinality_types and set it to 1.
- This release adds the max_parts_in_total setting for the MergeTree family of tables (default: 100 000). We hope your number of parts is much lower than this limit. If necessary you can raise the value.
- The system.dictionaries table has been changed. If you used it for monitoring purposes, you may need to change your scripts.
- Dictionaries are loaded lazily by default. It means they have status NOT_LOADED until the first access.
Changes
New functions
- New hash functions: xxHash32, xxHash64, gccMurmurHash, hiveHash, javaHash, CRC32
- JSON processing functions: JSONExtract, JSONExtract(Bool|Float|Int|KeysAndValues|Raw|String|UInt), JSONHas, JSONKey, JSONLength, JSONType (a short sampler follows this list)
- Geospatial processing: geoToH3, geohashEncode, geohashDecode
- IP address handling: IPv4CIDRtoIPv4Range, IPv6CIDRtoIPv6Range, toIPv4, toIPv6
- New statistical aggregate functions: skewPop, skewSamp, kurtPop, kurtSamp
- String functions: isValidUTF8, regexpQuoteMeta, trimBoth, trimLeft, trimRight, format, toValidUTF8
- Encoding: tryBase64Decode, base64Decode, base64Encode
- Array processing: arrayEnumerateDenseRanked, arrayEnumerateUniqRanked, flatten, arrayFlatten, arrayWithConstant
- Date/Time processing: toStartOfInterval, addQuarters, subtractQuarters, toIntervalQuarter, toStartOfTenMinutes
- Numerical: roundDown, toDecimalOrZero, toDecimalOrNull
- Dictionary: dictGet, dictGetOrDefault
- Roaring Bitmaps: bitmapAndCardinality, bitmapAndnot, bitmapAndnotCardinality, bitmapBuild, bitmapCardinality, bitmapContains, bitmapHasAll, bitmapHasAny, bitmapOr, bitmapOrCardinality, bitmapToArray, bitmapXor, bitmapXorCardinality, groupBitmap
- OS level introspection: filesystemAvailable, filesystemCapacity, filesystemFree, basename
- New aggregate functions: boundingRatio, entropy, groupBitmap, sumMap(Filtered|FilteredWithOverflow|WithOverflow), topKWeighted, groupArrayMovingAvg, groupArrayMovingSum, timeSeriesGroupRateSum, timeSeriesGroupSum
- SQL standard compatibility aliases added: left, right, trim, ltrim, rtrim, timestampadd, timestampsub, power, replace, ln, locate, mid
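A short sampler of a few of the new functions listed above; the literal values are hypothetical:

```sql
SELECT xxHash64('clickhouse');                          -- fast non-cryptographic hash
SELECT JSONExtractString('{"user":"alice"}', 'user');   -- returns 'alice'
SELECT toStartOfInterval(now(), INTERVAL 15 MINUTE);    -- round down to a 15-minute boundary
```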
Other notable changes
- Setting constraints which limit the possible range of setting value per user profile.
- KILL MUTATION added
- New aggregate function combinators: -Resample
- Added new data type SimpleAggregateFunction - light aggregation for simple functions like any, anyLast, sum, min, max
- Ability to use different sorting key (ORDER BY) and index (PRIMARY KEY). The sorting key can be longer than the index. You can alter ORDER BY at the moment of adding / removing the column
- HTTP interface: brotli compression support, X-ClickHouse-Query-Id and X-ClickHouse-Summary headers in the response, ability to cancel a query on disconnect (check cancel_http_readonly_queries_on_client_close)
- Back up all partitions at once with ALTER TABLE … FREEZE
- Comments for a column in the table description
- Join engine: new options join_use_nulls, max_rows_in_join, max_bytes_in_join, join_any_take_last_row and join_overflow_mode, plus the joinGet function that allows you to use a Join type table like a dictionary (a minimal sketch appears after this list)
- /docker-entrypoint-initdb.d for database initialization in docker
- Graphite rollup rules reworked.
- Query settings in asynchronous INSERTs into Distributed tables are respected now
- Hints when a user makes a typo in a function name or type in the command-line client
- system.detached_parts table containing information about detached parts of MergeTree tables
- Table function remoteSecure
- Ability to write ZooKeeper part data in a more compact form (use_minimalistic_part_header_in_zookeeper)
- Support RENAME operation for Materialized View
- Non-blocking loading of external dictionaries
- Kafka now supports SASL SCRAM authentication; new virtual columns _topic, _offset and _key are available; and there are many other improvements.
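A minimal sketch of using a Join engine table like a dictionary via joinGet, as mentioned in the list above; the table name and values are hypothetical:

```sql
CREATE TABLE user_names (id UInt64, name String) ENGINE = Join(ANY, LEFT, id);
INSERT INTO user_names VALUES (1, 'alice'), (2, 'bob');

SELECT joinGet('user_names', 'name', toUInt64(2));  -- returns 'bob'
```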
Settings changed/added
Name | Default | Description |
---|---|---|
allow_experimental_cross_to_join_conversion | 1 | Convert CROSS JOIN to INNER JOIN if possible |
allow_experimental_data_skipping_indices | 0 | If it is set to true, data skipping indices can be used in CREATE TABLE/ALTER TABLE queries. |
allow_experimental_low_cardinality_type | 1 (was 0) | Obsolete setting, does nothing. Will be removed after 2019-08-13 |
allow_experimental_multiple_joins_emulation | 1 | Emulate multiple joins using subselects |
allow_hyperscan | 1 | Allow functions that use Hyperscan library. Disable to avoid potentially long compilation times and excessive resource usage. |
allow_simdjson | 1 | Allow using simdjson library in ‘JSON*’ functions if AVX2 instructions are available. If disabled rapidjson will be used. |
allow_suspicious_low_cardinality_types | 0 | In CREATE TABLE statement allows specifying LowCardinality modifier for types of small fixed size (8 or less). Enabling this may increase merge times and memory consumption. |
cancel_http_readonly_queries_on_client_close | 0 | Cancel HTTP readonly queries when a client closes the connection without waiting for response. |
check_query_single_value_result | 1 | Return check query result as single 1/0 value |
enable_conditional_computation | (was 0) | DELETED |
enable_optimize_predicate_expression | 1 (was 0) | If it is set to true, optimize predicates to subqueries. |
enable_unaligned_array_join | 0 | Allow ARRAY JOIN with multiple arrays that have different sizes. When this setting is enabled, arrays will be resized to the longest one. |
external_table_functions_use_nulls | 1 | If it is set to true, external table functions will implicitly use Nullable type if needed. Otherwise NULLs will be substituted with default values. Currently supported only for ‘mysql’ table function. |
idle_connection_timeout | 3600 | Close idle TCP connections after specified number of seconds. |
input_format_defaults_for_omitted_fields | 0 | For input data calculate default expressions for omitted fields (it works for JSONEachRow format). |
input_format_with_names_use_header | 0 | For TSVWithNames and CSVWithNames input formats this controls whether format parser is to assume that column data appear in the input exactly as they are specified in the header. |
join_any_take_last_row | 0 | When disabled (default) ANY JOIN will take the first found row for a key. When enabled, it will take the last row seen if there are multiple rows for the same key. Allows you to overwrite old values in table with Engine=Join. |
join_default_strictness | ALL | Set default strictness in JOIN query. Possible values: empty string, ‘ANY’, ‘ALL’. If empty, query without strictness will throw exception. |
low_cardinality_allow_in_native_format | 1 | Use LowCardinality type in Native format. Otherwise, convert LowCardinality columns to ordinary for select query, and convert ordinary columns to required LowCardinality for insert query. |
max_alter_threads | auto | The maximum number of threads to execute the ALTER requests. By default, it is determined automatically. |
max_execution_speed | 0 | Maximum number of execution rows per second. |
max_execution_speed_bytes | 0 | Maximum number of execution bytes per second. |
max_partitions_per_insert_block | 100 | Limit maximum number of partitions in single INSERTed block. Zero means unlimited. Throw exception if the block contains too many partitions. This setting is a safety threshold, because using large number of partitions is a common misconception. |
max_streams_multiplier_for_merge_tables | 5 | Request more streams when reading from Merge table. Streams will be spread across tables that Merge table will use. This allows more even distribution of work across threads and is especially helpful when merged tables differ in size. |
max_threads | auto | The maximum number of threads to execute the request. By default, it is determined automatically. |
merge_tree_max_bytes_to_use_cache | 2013265920 | The maximum number of bytes per request, to use the cache of uncompressed data. If the request is large, the cache is not used. (For large queries not to flush out the cache.) |
merge_tree_min_bytes_for_concurrent_read | 251658240 | If at least as many bytes are read from one file, the reading can be parallelized. |
merge_tree_min_bytes_for_seek | 0 | You can skip reading more than that number of bytes at the price of one seek per file. |
min_count_to_compile_expression | 3 | The number of identical expressions before they are JIT-compiled |
min_execution_speed_bytes | 0 | Minimum number of execution bytes per second. |
network_compression_method | LZ4 | Allows you to select the method of data compression when writing. |
optimize_skip_unused_shards | 0 | Assumes that data is distributed by sharding_key. Optimization to skip unused shards if SELECT query filters by sharding_key. |
os_thread_priority | 0 | If non zero - set corresponding ‘nice’ value for query processing threads. Can be used to adjust query priority for OS scheduler. |
output_format_parquet_row_group_size | 1000000 | Row group size in rows. |
stream_poll_timeout_ms | 500 | Timeout for polling data from streaming storages. |
tcp_keep_alive_timeout | 0 | The time (in seconds) the connection needs to remain idle before TCP starts sending keepalive probes |
MergeTree settings
Name | Default | Description |
---|---|---|
enable_mixed_granularity_parts | 0 | Enable parts with adaptive and non adaptive granularity |
index_granularity_bytes | 10485760 | Approximate amount of bytes in single granule (0 - disabled). |
write_final_mark | 1 | Write final mark after end of column (0 - disabled, do nothing if index_granularity_bytes=0) |
max_parts_in_total | 100000 | If more than this number active parts in all partitions in total, throw ‘Too many parts …’ exception. |
merge_with_ttl_timeout | 86400 | Minimal time in seconds, when merge with TTL can be repeated. |
min_merge_bytes_to_use_direct_io | 10737418240 (was 0) | Minimal amount of bytes to enable O_DIRECT in merge (0 - disabled). |
replicated_max_parallel_fetches_for_host | 15 | Limit parallel fetches from endpoint (actually pool size). |
use_minimalistic_part_header_in_zookeeper | 0 | Store part header (checksums and columns) in a compact format and a single part znode instead of separate znodes (/columns and /checksums). This can dramatically reduce snapshot size in ZooKeeper. Before enabling check that all replicas support new format. |
V18.14 Release
Altinity Stable Release for ClickHouse 18.14.19
Released 2018-12-31
Description
Altinity Stable Release 18.14.19 is a minor bug-fixing release to the previous 18.14.15. ClickHouse memory consumption in general is decreased with this release.
Bug fixes
- Fixed a bug with range_hashed dictionaries returning wrong results.
- Fixed an error that caused messages “netlink: ‘…’: attribute type 1 has an invalid length” to be printed in Linux kernel log in some recent versions of Linux kernel.
- Fixed segfault in function ’empty’ for FixedString arguments.
- Fixed excessive memory allocation when using a large value of max_query_size.
- Fixed cases when the ODBC bridge process did not terminate with the main server process.
- Fixed synchronous insertion into the Distributed table with a columns list that differs from the column list of the remote table.
- Fixed a rare race condition that could lead to a crash when dropping a MergeTree table.
- Fixed a query deadlock in a case when query thread creation fails with the ‘Resource temporarily unavailable’ error.
- Fixed parsing of the ENGINE clause when the CREATE AS table syntax was used and the ENGINE clause was specified before the AS table (the error resulted in ignoring the specified engine).
- Fixed a segfault if the 'max_temporary_non_const_columns' limit was exceeded.
- Fixed a bug with databases not being correctly specified when executing DDL ON CLUSTER queries.
References
- 18.14.19 release can be found at Yandex repo for Debian packages and Altinity repo for rpms.