Altinity Stable® Builds and Releases for ClickHouse® V21 and older

Here are the release notes from V21 and older. You can choose any of the following links to go directly to a particular release:

V21.8 Build

Point releases:

Major new features since the previous stable release 21.3

Released 2021-10-11

A new release introduces a lot of changes and new functions. The full list is available in the Appendix, so refer to this section for more detail. The following new features are worth mentioning on the front page:

  • SQL features:
    • DISTINCT ON a subset of columns
    • Partial support of SQL/JSON standard
    • Arrays in dictionaries are now supported
    • Arrays and nested data types are now supported for Parquet and Arrow formats
    • DateTime64 extended range, Now dates between 1925 to 2283 years are supported. a)
  • Security features:
    • Disk level encryption
    • Kerberos authentication for HTTP protocol a)
    • Active Directory groups mapping for LDAP user directory a)
  • Replication and Cluster improvements:
    • ClickHouse Keeper (experimental) – in-process ZooKeeper replacement
    • SYSTEM RESTORE REPLICA – a handy tool that makes life easier in the unfortunate occasion if ZooKeeper metadata is lost a)
    • Support for MySQL/PostgreSQL clusters when using mysql/postgresql table functions
    • Zero-copy replication for S3 tables (see Altinity blog post for details)
    • Parts movement between shards (experimental)
  • MergeTree features:
  • Integrations:
    • MaterializedPostgreSQL database engine for replication from PostgreSQL
    • HDFS disk support (experimental)
    • Allow to catch Kafka errors into a separate stream (see the KB article on this)
  • Other:
    • YAML configuration format as an alternative to XML

As usual with ClickHouse, there are many performance and operational improvements in different server components.


a) Contributed by Altinity developers.

Backward Incompatible Changes

The following changes are backward incompatible and require user attention during an upgrade:

  • Users of LowCardinality(Nullable(...)) can not safely downgrade to versions older than 21.4. Data in tables with columns of those types inserted / merged by 21.8 will be lost after the downgrade.
  • Values of UUID type cannot be compared with integers. For example, instead of writing uuid != 0 type uuid != '00000000-0000-0000-0000-000000000000'
  • The toStartOfIntervalFunction will align hour intervals to midnight (in previous versions they were aligned to the start of unix epoch). For example, toStartOfInterval(x, INTERVAL 11 HOUR) will split every day into three intervals: 00:00:00..10:59:59, 11:00:00..21:59:59 and 22:00:00..23:59:59.
  • It’s not possible to rollback to the older ClickHouse version after executing ALTER ... ATTACH query as the old servers would fail to process the new command entry ATTACH_PART in the replicated log.
  • The behaviour of remote_url_allow_hosts has changed. In previous versions the empty section did nothing, in 21.8 it will block access to all external hosts. Remove this section from the configuration files after an upgrade if you experience issues with url() or s3() functions.
  • If you will downgrade to version before 21.1 clickhouse will not be able to start automatically – you will need to remove the system.*_log tables manually to downgrade
  • There is an issue with uniqueState(UUID) in AggregatingMergeTree tables, and can be corrected by replacing uniqState(uuid) in MATERIALIZED VIEWs with uniqState(sipHash64(uuid)) and change data type for already saved data from AggregateFunction(uniq, UUID) to AggregateFunction(uniq, UInt64). For more information see the following:

Upgrade Notes

There were several changes between versions that may affect the rolling upgrade of big clusters. Upgrading only part of the cluster is not recommended.

  • Distributed queries with explicitly defined large sets are now executed differently. Compatibility setting legacy_column_name_of_tuple_literal may be enabled during the rolling upgrade of the cluster. Otherwise distributed queries with explicitly defined sets at IN clause may fail during upgrade.
  • ATTACH PART[ITION] queries may not work during cluster upgrade

Other Important Changes

ClickHouse embedded monitoring has become a bit more aggressive. It now collects several system stats, and stores them in the table system.asynchronious_metric_log. This can be visible as an increase of background writes, storage usage, etc. To return to the old rate of metrics refresh / flush, adjust those settings in config.xml:

<asynchronous_metrics_update_period_s>
    60
</asynchronous_metrics_update_period_s>
<asynchronous_metric_log>
    <flush_interval_milliseconds>
        60000
    </flush_interval_milliseconds>
</asynchronous_metric_log>

Alternatively, metric_log and asynchronous_metric_log tables can be completely disabled:

<yandex>
    <asynchronous_metric_log remove="1"/>
    <metric_log remove="1"/>
</yandex>

Some new ClickHouse features are now enabled by default. It may lead to a change in behaviour, so review those carefully and disable features that may affect your system:

  • async_socket_for_remote
  • compile_aggregate_expressions
  • compile_expressions
  • cross_to_inner_join_rewrite
  • insert_null_as_default
  • optimize_skip_unused_shards_rewrite_in
  • query_plan_enable_optimizations
  • query_plan_filter_push_down

In the previous releases we recommended disabling optimize_on_insert. This recommendation stays for 21.8 as well as inserts into Summing and AggregatingMergeTree can slow down.

Known issues in 21.8.8

The development team continues to improve the quality of the 21.8 release. The following issues still exist in the 21.8.8 version and may affect ClickHouse operation. Please inspect them carefully to decide if those are applicable to your applications:

  • system.events for event = 'Merge' are overstated. ClickHouse incorrectly increments this counter.
  • Timeout exceeded: elapsed 18446744073.709553 seconds error that might happen in extremely rare cases, presumably due to some bug in kernel.

You may also look into a GitHub issues using a special v21.8-affected label.​​

ClickHouse Altinity Stable Releases are based on the community versions. For more information on installing ClickHouse from either the Altinity Stable builds or the community builds, see the ClickHouse Altinity Stable Builds Install Guide.

Please contact us at info@altinity.com if you experience any issues with the upgrade.

Appendix

New functions
  • DateTime functions:
    • dateName
    • timeZone, timeZoneOf, timeZoneOffset, timezoneOf, toTimezone
  • JSON processing functions:
    • JSON_EXISTS, JSON_QUERY, JSON_VALUE
    • simpleJSONExtractBool, simpleJSONExtractFloat, simpleJSONExtractInt, simpleJSONExtractRaw, simpleJSONExtractString, simpleJSONExtractUInt, simpleJSONHas
    • toJSONString
  • String functions:
    • bin/unbin
    • leftPad, lpad, leftPadUTF8
    • rightPad, rpad, rightPadUTF8
    • splitByRegexp
  • Array functions:
    • arrayProduct
    • bitPositionsToArray
    • validateNestedArraySizes
  • Dictionary:
    • dictGetChildren
    • dictGetDescendants
    • dictGetOrNull
  • Aggregate functions
    • deltaSumTimestamp
    • intervalLengthSum
    • lagInFrame/leadInFrame
    • sequenceNextNode
    • sumCount
    • uniqTheta
  • Geo:
    • polygonAreaCartesian, polygonAreaSpherical
    • polygonConvexHullCartesian
    • polygonPerimeterCartesian, polygonPerimeterSpherical
    • polygonsDistanceCartesian, polygonsDistanceSpherical
    • polygonsEqualsCartesian
    • polygonsIntersectionCartesian, polygonsIntersectionSpherical
    • polygonsSymDifferenceCartesian, polygonsSymDifferenceSpherical
    • polygonsUnionCartesian, polygonsUnionSpherical
    • polygonsWithinCartesian, polygonsWithinSpherical
    • readWktMultiPolygon, readWktPoint, readWktPolygon, readWktRing
    • wkt
  • Math/Statistics:
    • quantileBFloat16, quantilesBFloat16
  • Type related:
    • reinterpretAsUInt128
    • toUInt128, toUInt128OrNull, toUInt128OrZero
  • Other:
    • indexHint
    • isIPAddressInRange
    • partitionId – that corresponds to _partition_id virtual column added to MergeTree tables
New table functions
  • dictionary
  • s3Cluster – see our blog article that highlights this one
New table engines
  • ExternalDistributed – allows to query MySQL or PostgreSQL clusters
  • MaterializedPostgreSQL
New metrics and events
system.asynchronous_metrics
  • AsynchronousMetricsCalculationTimeSpent

The table has also got a lot of new metrics for a host monitoring rather than ClickHouse.

system.metrics
  • BrokenDistributedFilesToInsert
  • MMappedFileBytes
  • NetworkReceive
  • NetworkSend
system.events
  • MMappedFileCacheHits
  • MMappedFileCacheMisses
  • MergeTreeDataProjectionWriterBlocks
  • MergeTreeDataProjectionWriterBlocksAlreadySorted
  • MergeTreeDataProjectionWriterCompressedBytes
  • MergeTreeDataProjectionWriterRows
  • MergeTreeDataProjectionWriterUncompressedBytes
  • NetworkReceiveBytes
  • NetworkSendBytes
  • StorageBufferLayerLockReadersWaitMilliseconds
  • StorageBufferLayerLockWritersWaitMilliseconds
  • StorageBufferPassedBytesFlushThreshold
  • StorageBufferPassedRowsFlushThreshold
  • StorageBufferPassedTimeFlushThreshold
system.errors
  • CANNOT_CREATE_FILE
  • CANNOT_PARSE_YAML
  • CANNOT_SET_ROUNDING_MODE
  • CANNOT_SYSCONF
  • CONCURRENT_ACCESS_NOT_SUPPORTED
  • DISTRIBUTED_BROKEN_BATCH_FILES
  • DISTRIBUTED_BROKEN_BATCH_INFO
  • ILLEGAL_PROJECTION
  • INCORRECT_PART_TYPE
  • INVALID_FORMAT_INSERT_QUERY_WITH_DATA
  • KERBEROS_ERROR
  • NO_SUCH_PROJECTION_IN_TABLE
  • POSTGRESQL_CONNECTION_FAILURE
  • PROJECTION_NOT_USED
  • TOO_LARGE_DISTRIBUTED_DEPTH
  • UNKNOWN_SNAPSHOT
New system tables
  • data_skipping_indices:
    • database, table, name, type, expr, granularity
  • part_moves_between_shards:
    • database, table, task_name, task_uuid, create_time, part_name, part_uuid, to_shard, update_time, state, num_tries, last_exception
  • projection_parts – same as parts but with extra ‘parent_*’ columns
  • projection_parts_columns – same as parts_columns but with extra ‘parent_*’ columns
New columns in system tables
  • clusters
    • shutdown_count
  • dictionaries:
    • key – removed
    • key.names, key.types, found_rate
  • distribution_queue:
    • broken_data_files, broken_data_compressed_bytes
  • errors:
    • last_error_time, last_error_message, last_error_trace, remote
  • metric_log:
    • multiple columns
  • parts_columns:
    • uuid, min_time, max_time
  • processes:
    • Settings (Map replaces Settings.Names/Values arrays), ProfileEvents (Map replaces ProfileEvents.Names/Values arrays), current_database
  • query_log:
    • projections, initial_query_start_time, initial_query_start_time_microseconds, Settings (Map replaces Settings.Names/Values arrays), ProfileEvents (Map replaces ProfileEvents.Names/Values arrays)
  • query_thread_log:
    • initial_query_start_time_microseconds, Settings (Map replaces Settings.Names/Values arrays), ProfileEvents (Map replaces ProfileEvents.Names/Values arrays)
  • stack_trace:
    • thread_name
  • table_engines:
    • supports_projections
  • tables:
    • comment
  • users
    • grantees_any, grantees_list, grantees_except
System.merge_tree_settings added/changed
Name Old value New value Description
max_parts_to_merge_at_once 100 Max amount of parts which can be merged at once (0 - disabled). Doesn’t affect OPTIMIZE FINAL query.
max_replicated_fetches_network_bandwidth 0 The maximum speed of data exchange over the network in bytes per second for replicated fetches. Zero means unlimited.
max_replicated_sends_network_bandwidth 0 The maximum speed of data exchange over the network in bytes per second for replicated sends. Zero means unlimited.
min_bytes_to_rebalance_partition_over_jbod 0 Minimal amount of bytes to enable part rebalance over JBOD array (0 - disabled).
non_replicated_deduplication_window 0 How many last blocks of hashes should be kept on disk (0 - disabled).
part_moves_between_shards_delay_seconds 30 Time to wait before/after moving parts between shards.
part_moves_between_shards_enable 0 Experimental/Incomplete feature to move parts between shards. Does not take into account sharding expressions.
replicated_fetches_http_connection_timeout 0 HTTP connection timeout for part fetch requests. Inherited from default profile http_connection_timeout if not set explicitly.
replicated_fetches_http_receive_timeout 0 HTTP receive timeout for fetch part requests. Inherited from default profile http_receive_timeout if not set explicitly.
replicated_fetches_http_send_timeout 0 HTTP send timeout for part fetch requests. Inherited from default profile http_send_timeout if not set explicitly.
remote_fs_execute_merges_on_single_replica_time_threshold 10800 When greater than zero only a single replica starts the merge immediately when merged part on shared storage and ‘allow_remote_fs_zero_copy_replication’ is enabled.
system.settings added/changed
Type Name Old value New value Description
settings allow_experimental_bigint_types 0 1 Obsolete setting, does nothing.
settings allow_experimental_codecs 0 If it is set to true, allow to specify experimental compression codecs (but we don't have those yet and this option does nothing).
settings allow_experimental_database_materialized_postgresql 0 Allow to create database with Engine=MaterializedPostgreSQL(…).
settings allow_experimental_funnel_functions 0 Enable experimental functions for funnel analysis.
settings allow_experimental_map_type 0 1 Obsolete setting, does nothing.
settings allow_experimental_projection_optimization 0 Enable projection optimization when processing SELECT queries
settings async_socket_for_remote 0 1 Asynchronously read from socket executing remote query
settings background_schedule_pool_size 16 128 Number of threads performing background tasks for replicated tables, dns cache updates. Only has meaning at server startup.
settings compile_aggregate_expressions 1 Compile aggregate functions to native code.
settings compile_expressions 0 1 Compile some scalar functions and operators to native code.
settings cross_to_inner_join_rewrite 1 Use inner join instead of comma/cross join if possible
settings database_replicated_always_detach_permanently 0 Execute DETACH TABLE as DETACH TABLE PERMANENTLY if database engine is Replicated
settings distributed_ddl_entry_format_version 1 Version of DDL entry to write into ZooKeeper
settings distributed_ddl_output_mode throw Format of distributed DDL query result
settings distributed_directory_monitor_split_batch_on_failure 0 Should StorageDistributed DirectoryMonitors try to split batch into smaller in case of failures.
settings distributed_push_down_limit 0 If 1, LIMIT will be applied on each shard separatelly. Usually you don't need to use it, since this will be done automatically if it is possible, i.e. for simple query SELECT FROM LIMIT.
settings experimental_query_deduplication_send_all_part_uuids 0 If false only part UUIDs for currently moving parts are sent. If true all read part UUIDs are sent (useful only for testing).
settings external_storage_max_read_bytes 0 Limit maximum number of bytes when table with external engine should flush history data. Now supported only for MySQL table engine, database engine, dictionary and MaterializeMySQL. If equal to 0, this setting is disabled
settings external_storage_max_read_rows 0 Limit maximum number of rows when table with external engine should flush history data. Now supported only for MySQL table engine, database engine, dictionary and MaterializeMySQL. If equal to 0, this setting is disabled
settings force_optimize_projection 0 If projection optimization is enabled, SELECT queries need to use projection
settings glob_expansion_max_elements 1000 Maximum number of allowed addresses (For external storages, table functions, etc).
settings group_by_two_level_threshold_bytes 100000000 50000000 From what size of the aggregation state in bytes, a two-level aggregation begins to be used. 0 - the threshold is not set. Two-level aggregation is used when at least one of the thresholds is triggered.
settings handle_kafka_error_mode default Obsolete setting, does nothing.
settings http_max_field_name_size 1048576 Maximum length of field name in HTTP header
settings http_max_field_value_size 1048576 Maximum length of field value in HTTP header
settings http_max_fields 1000000 Maximum number of fields in HTTP header
settings http_max_uri_size 1048576 Maximum URI length of HTTP request
settings insert_null_as_default 1 Insert DEFAULT values instead of NULL in INSERT SELECT (UNION ALL)
settings legacy_column_name_of_tuple_literal 0 List all names of element of large tuple literals in their column names instead of hash. This settings exists only for compatibility reasons. It makes sense to set to 'true', while doing rolling update of cluster from version lower than 21.7 to higher.
settings max_distributed_depth 5 Maximum distributed query depth
settings max_replicated_fetches_network_bandwidth_for_server 0 The maximum speed of data exchange over the network in bytes per second for replicated fetches. Zero means unlimited. Only has meaning at server startup.
settings max_replicated_sends_network_bandwidth_for_server 0 The maximum speed of data exchange over the network in bytes per second for replicated sends. Zero means unlimited. Only has meaning at server startup.
settings min_count_to_compile_aggregate_expression 3 The number of identical aggregate expressions before they are JIT-compiled
settings normalize_function_names 1 0 Normalize function names to their canonical names
settings odbc_bridge_connection_pool_size 16 Connection pool size for each connection settings string in ODBC bridge.
settings optimize_functions_to_subcolumns 0 Transform functions to subcolumns, if possible, to reduce amount of read data. E.g. 'length(arr)' -> 'arr.size0', 'col IS NULL' -> 'col.null'
settings optimize_fuse_sum_count_avg 0 Fuse aggregate functions sum(), avg(), count() with identical arguments into one sumCount() call, if the query has at least two different functions
settings optimize_move_to_prewhere_if_final 0 If query has FINAL, the optimization move_to_prewhere is not always correct and it is enabled only if both settings optimize_move_to_prewhere and optimize_move_to_prewhere_if_final are turned on
settings optimize_skip_unused_shards_limit 1000 Limit for number of sharding key values, turns off optimize_skip_unused_shards if the limit is reached
settings optimize_skip_unused_shards_rewrite_in 1 Rewrite IN in query for remote shards to exclude values that does not belong to the shard (requires optimize_skip_unused_shards)
settings output_format_arrow_low_cardinality_as_dictionary 0 Enable output LowCardinality type as Dictionary Arrow type
settings postgresql_connection_pool_size 16 Connection pool size for PostgreSQL table engine and database engine.
settings postgresql_connection_pool_wait_timeout 5000 Connection pool push/pop timeout on empty pool for PostgreSQL table engine and database engine. By default it will block on empty pool.
settings prefer_column_name_to_alias 0 Prefer using column names instead of aliases if possible.
settings prefer_global_in_and_join 0 If enabled, all IN/JOIN operators will be rewritten as GLOBAL IN/JOIN. It's useful when the to-be-joined tables are only available on the initiator and we need to always scatter their data on-the-fly during distributed processing with the GLOBAL keyword. It's also useful to reduce the need to access the external sources joining external tables.
settings query_plan_enable_optimizations 1 Apply optimizations to query plan
settings query_plan_filter_push_down 1 Allow to push down filter by predicate query plan step
settings s3_max_single_read_retries 4 The maximum number of retries during single S3 read.
settings sleep_in_send_data 0
settings sleep_in_send_data_ms 0 Time to sleep in sending data in TCPHandler
settings sleep_in_send_tables_status 0
settings sleep_in_send_tables_status_ms 0 Time to sleep in sending tables status response in TCPHandler
settings use_antlr_parser 0

Also, please refer to the release notes from the development team available at the following URLs:

21.8.15

Released 2022-04-15

Release Notes

Based on upstream/v21.8.15.7-lts.

ClickHouse release v21.8.15.7-altinitystable as compared to community ClickHouse v21.8.15.7:

Changes compared to the Community Build

Bug Fixes
  • Bug Fix: Fixed issue with merging configs that have and root XML tags. (backport of ClickHouse@213ecae) (via #114).
  • Bug Fix: Backport of ClickHouse#31823 to 21.8: Fix invalid cast of nullable type when nullable primary key is used (via #110)
Features
  • Build/Testing/Packaging Improvement: Various macOS compilation and packaging fixes/backporting (via #108)
  • Build/Testing/Packaging Improvement: Minor test improvements (via afb4a0a, 4c35386)

Changes compared to Altinity Stable 21.8.13.1

Bug Fixes

Test Results

Packages

21.8.13

Released 2022-01-05

Release Notes

ClickHouse release v21.8.13.1-altinitystable as compared to v21.8.12.29-altinitystable:

Bug Fixes
  • Fixed Apache Avro Union type index out of boundary issue in Apache Avro binary format. #33022 (Harry Lee).

  • Quota limit was not reached, but the limit was exceeded. This PR fixes #31174. #31656 (sunny).

  • NO CL ENTRY: ‘fix json error after downgrade’. #33166 (bullet1337).

  • Integer overflow to resize the arrays causes heap corrupt. #33024 (varadarajkumar).

  • fix crash when used fuzzBits with multiply same FixedString, Close #32737. #32755 (SuperDJY).

  • Fix possible exception at RabbitMQ storage startup by delaying channel creation. #32584 (Kseniia Sumarokova).

  • Fixed crash with SIGFPE in aggregate function avgWeighted with Decimal argument. Fixes #32053. #32303 (tavplubix).

  • Some replication queue entries might hang for temporary_directories_lifetime (1 day by default) with Directory tmp_merge_<part_name> or Part ... (state Deleting) already exists, but it will be deleted soon or similar error. It’s fixed. Fixes #29616. #32201 (tavplubix).

  • XML dictionaries identifiers, used in table create query, can be qualified to default_database during upgrade to newer version. Closes #31963. #32187 (Maksim Kita).

  • Number of active replicas might be determined incorrectly when inserting with quorum if setting replicated_can_become_leader is disabled on some replicas. It’s fixed. #32157 (tavplubix).

  • Fixed Directory ... already exists and is not empty error when detaching part. #32063 (tavplubix).

  • Some GET_PART entry might hang in replication queue if part is lost on all replicas and there are no other parts in the same partition. It’s fixed in cases when partition key contains only columns of integer types or Date[Time]. Fixes #31485. #31887 (tavplubix).

  • Change configuration path from keeper_server.session_timeout_ms to keeper_server.coordination_settings.session_timeout_ms when constructing a KeeperTCPHandler - Same with operation_timeout. #31859 (JackyWoo).

  • Fix a bug about function transform with decimal args. #31839 (李帅).

  • Fix crash when function dictGet with type is used for dictionary attribute when type is Nullable. Fixes #30980. #31800 (Maksim Kita).

  • Fix crash with empty result on odbc query. Closes #31465. #31766 (Kseniia Sumarokova).

  • Fix possible crash (or incorrect result) in case of LowCardinality arguments of window function. Fixes #31114. #31888 (Nikolai Kochetov).

Changes compared to the Community Build

  • Fix invalid cast of Nullable type when nullable primary key is used. (Nullable primary key is a discouraged feature - please do not use). This fixes #31075 #31823 Amos Bird).

Test Reports

Test Report for Altinity Stable Build v21.8.13.1.

21.8.12

Released 2021-10-21

Release Notes

ClickHouse release v21.8.12.29-altinitystable as compared to v21.8.11.1-altinitystable:

Performance Improvements
Bug Fixes
  • Quota limit was not reached, but the limit was exceeded. This PR fixes #31174. #31337 (sunny).
Build/Testing/Packaging Improvements
  • Now all images for CI will be placed in the separate dockerhub repo. #28656 (alesapin).
Bug Fixes (user-visible misbehaviour in official stable or prestable release)
  • Fixed functions empty and notEmpty with the arguments of UUID type. Fixes #31819. #31883 (Anton Popov).
  • Fixed possible assertion ../src/IO/ReadBuffer.h:58: bool DB::ReadBuffer::next(): Assertion '!hasPendingData()' failed. in TSKV format. #31804 (Kruglov Pavel).
  • Fixed usage of Buffer table engine with type Map. Fixes #30546. #31742 (Anton Popov).
  • Fixed race in JSONEachRowWithProgress output format when data and lines with progress are mixed in output. #31736 (Kruglov Pavel).
  • Fixed there are no such cluster here error on execution of ON CLUSTER query if specified cluster name is name of Replicated database. #31723 (tavplubix).
  • Settings input_format_allow_errors_num and input_format_allow_errors_ratio did not work for parsing of domain types, such as IPv4, it’s fixed. Fixes #31686. #31697 (tavplubix).
  • RENAME TABLE query worked incorrectly on attempt to rename an DDL dictionary in Ordinary database, it’s fixed. #31638 (tavplubix).
  • Fix invalid generated JSON when only column names contain invalid UTF-8 sequences. #31534 (Kevin Michel).
  • Resolve nullptr in STS credentials provider for S3. #31409 (Vladimir Chebotarev).
  • Remove not like function into RPNElement. #31169 (sundyli).
  • Fixed bug in Keeper which can lead to inability to start when some coordination logs was lost and we have more fresh snapshot than our latest log. #31150 (alesapin).
  • Fixed abort in debug server and DB::Exception: std::out_of_range: basic_string error in release server in case of bad hdfs url by adding additional check of hdfs url structure. #31042 (Kruglov Pavel).

21.8.11

Released 2021-11-19

ClickHouse release v21.8.11.1-altinitystable FIXME as compared to v21.8.10.1-altinitystable

New Features
  • CompiledExpressionCache limit elements size using compiled_expression_cache_elements_size setting. #30667 (Maksim Kita).
Improvements
Bug Fixes
Bug Fixes (user-visible misbehaviour in official stable or prestable release)
  • Fixed StorageMerge with aliases and where (it did not work before at all). Closes #28802. #31044 (Kseniia Sumarokova).
  • Fixed JSONValue/Query with quoted identifiers. This allows to have spaces in json path. Closes #30971. #31003 (Kseniia Sumarokova).
  • Using formatRow function with not row formats led to segfault. Don’t allow to use this function with such formats (because it doesn’t make sense). #31001 (Kruglov Pavel).
  • Skip max_partition_size_to_drop check in case of ATTACH PARTITION ... FROM and MOVE PARTITION ... #30995 (Amr Alaa).
  • Fixed set index not used in AND/OR expressions when there are more than two operands. This fixes #30416 . #30887 (Amos Bird).
  • Fixed ambiguity when extracting auxiliary ZooKeeper name from ZooKeeper path in ReplicatedMergeTree. Previously server might fail to start with Unknown auxiliary ZooKeeper name if ZooKeeper path contains a colon. Fixes #29052. Also it was allowed to specify ZooKeeper path that does not start with slash, but now it’s deprecated and creation of new tables with such path is not allowed. Slashes and colons in auxiliary ZooKeeper names are not allowed too. #30822 (tavplubix).
  • Fixed a race condition between REPLACE/MOVE PARTITION and background merge in non-replicated MergeTree that might cause a part of moved/replaced data to remain in partition. Fixes #29327. #30717 (tavplubix).
  • Fixed PREWHERE with WHERE in case of always true PREWHERE. #30668 (Azat Khuzhin).
  • Functions for case-insensitive search in UTF8 strings like positionCaseInsensitiveUTF8 and countSubstringsCaseInsensitiveUTF8 might find substrings that actually does not match is fixed. #30663 (tavplubix).
  • Limit push down optimization could cause a error Cannot find column. Fixes #30438. #30562 (Nikolai Kochetov).
  • Fixed exception handling in parallel_view_processing. This resolves issues / prevents crashes in some rare corner cases when that feature is enabled and exception (like Memory limit exceeded ...) happened in the middle of materialized view processing. #30472 (filimonov).
  • Fixed segfault which might happen if session expired during execution of REPLACE PARTITION. #30432 (tavplubix).
  • Fixed ComplexKeyHashedDictionary, ComplexKeySparseHashedDictionary parsing preallocate option from layout config. #30246 (Maksim Kita).
  • Fixed [I]LIKE function. Closes #28661. #30244 (Nikolay Degterinsky).
  • Support nullable arguments in function initializeAggregation. #30177 (Anton Popov).
  • Fixed data-race between LogSink::writeMarks() and LogSource in StorageLog. #29946 (Azat Khuzhin).
  • Fixed hanging DDL queries on Replicated database while adding a new replica. #29328 (Kevin Michel).
  • Fixed bad optimizations of ORDER BY if it contains WITH FILL. This closes #28908. This closes #26049. #28910 (alexey-milovidov).
  • Fixed queries to external databases (i.e. MySQL) with multiple columns in IN ( i.e. (k,v) IN ((1, 2)) ) (but note that this has some backward incompatibility for the clickhouse-copier since it uses alias for tuple element). #28888 (Azat Khuzhin).
  • Fixed “Column is not under aggregate function and not in GROUP BY” with PREWHERE (Fixes: #28461). #28502 (Azat Khuzhin).
  • Fixed NOT-IN index optimization when not all key columns are used. This fixes #28120. #28315 (Amos Bird).
Bug Fixes (user-visible misbehaviour in official stable or prestable release
  • Fixed ORDER BY ... WITH FILL with set TO and FROM and no rows in result set. #30888 (Anton Popov).

21.8.10

Released 2021-10-23

Release notes for Altinity Stable Build 21.8.10

ClickHouse release 21.8.10 as compared to v21.8.8
Improvements
Bug Fixes
  • Fix shutdown of AccessControlManager. Now there can’t be reloading of the configuration after AccessControlManager has been destroyed. This PR fixes the flaky test test_user_directories/test.py::test_relative_path. #29951 (Vitaly Baranov).
  • Allow using a materialized column as the sharding key in a distributed table even if insert_allow_materialized_columns=0:. #28637 (Vitaly Baranov).
  • FlatDictionary, HashedDictionary fix bytes_allocated calculation for nullable attributes. #30238 (Maksim Kita).
  • Dropped Memory database might reappear after server restart, it’s fixed (#29795). Also added force_remove_data_recursively_on_drop setting as a workaround for Directory not empty error when dropping Ordinary database (because it’s not possible to remove data leftovers manually in cloud environment). #30054 (tavplubix).
  • Fix crash of sample by tuple(), closes #30004. #30016 (Flynn).
  • Fix possible data-race between FileChecker and StorageLog/StorageStripeLog. #29959 (Azat Khuzhin).
  • Fix system tables recreation check (fails to detect changes in enum values). #29857 (Azat Khuzhin).
  • Avoid Timeout exceeded: elapsed 18446744073.709553 seconds error that might happen in extremely rare cases, presumably due to some bug in kernel. Fixes #29154. #29811 (tavplubix).
  • Fix bad cast in ATTACH TABLE ... FROM 'path' query when non-string literal is used instead of path. It may lead to reading of uninitialized memory. #29790 (alexey-milovidov).
  • Fix concurrent access to LowCardinality during GROUP BY (leads to SIGSEGV). #29782 (Azat Khuzhin).
  • Fixed incorrect behaviour of setting materialized_postgresql_tables_list at server restart. Found in #28529. #29686 (Kseniia Sumarokova).
  • Condition in filter predicate could be lost after push-down optimisation. #29625 (Nikolai Kochetov).
  • Fix rare segfault in ALTER MODIFY query when using incorrect table identifier in DEFAULT expression like x.y.z... Fixes #29184. #29573 (alesapin).
  • Fix bug in check pathStartsWith because there was bug with the usage of std::mismatch: The behavior is undefined if the second range is shorter than the first range.. #29531 (Kseniia Sumarokova).
  • In ODBC bridge add retries for error Invalid cursor state. It is a retriable error. Closes #29473. #29518 (Kseniia Sumarokova).
  • Fix possible Block structure mismatch for subqueries with pushed-down HAVING predicate. Fixes #29010. #29475 (Nikolai Kochetov).
  • Avoid deadlocks when reading and writing on JOIN Engine tables at the same time. #30187 (Raúl Marín).
  • Fix INSERT SELECT incorrectly fills MATERIALIZED column based of Nullable column. #30189 (Azat Khuzhin).
  • Fix null deference for GROUP BY WITH TOTALS HAVING (when the column from HAVING wasn’t selected). #29553 (Azat Khuzhin).

V21.3 Build

Point releases:

Major new features since the previous stable release 21.1.x

Released 2021-06-29

A new release introduces a lot of changes and new features. These are organized by feature in the Appendix, so refer to this section for more detail. The following new features are worth mentioning on the front page:

  • SQL features:
    • Window functions! See our blog article for an introduction.
    • Support multiple levels nesting in the Nested datatype.
  • Security features:
    • LDAP roles mapping a).
    • Apply row-level security as a separate PREWHERE step a).
    • Server side keys support for S3 table function a)>.
  • Cluster improvements:
    • Hedged requests! That allows to reduce tail latencies on large clusters by running the same query at different replicas. This is controlled by use_hedged_requests and max_parallel_replicas settings.
    • Allow inserts into a specific shard via a distributed table. This is controlled by the insert_shard_id setting.
    • Allow inserts into cluster() table function and specify a sharding key.
    • Replicated database engine (experimental). Allows to replicate DDL statements across the cluster. It is not production ready in this release.
  • MergeTree features:
    • Table level concurrency control. Can be controlled by the max_concurrent_queries merge tree setting.
  • Integrations:
    • PostgreSQL table engine, table function and dictionary source.

As usual with ClickHouse, there are many performance and operational improvements in different server components.


a) Contributed by Altinity developers.

Backward Incompatible Changes

The following changes are backward incompatible and require user attention during an upgrade:

  • It is no longer allowed to create MergeTree tables in the old syntax with table TTL; this syntax is now ignored. Attaching old tables is still possible.
  • Floating point columns and values are disallowed in:
    • Partitioning key – these can be turned back on with the allow_floating_point_partition_key merge tree setting.
    • Bitwise functions.
  • Excessive parenthesis in type definitions are no longer supported, example: Array((UInt8)).

Upgrade Notes

There were several changes between versions that may affect the rolling upgrade of big clusters. Upgrading only part of the cluster is not recommended.

  • Now replicas that are processing the ALTER TABLE ATTACH PART[ITION] command search their detached/ folders before fetching the data from other replicas. As an implementation detail, a new command ATTACH_PART is introduced in the replicated log. Parts are searched and compared by their checksums. ATTACH PART[ITION] queries may not work during cluster upgrade.

Other Important Changes

Some new ClickHouse features are now enabled by default. It may lead to a change in behaviour, so review those carefully and disable features that may affect your system:

  • distributed_aggregation_memory_efficient
  • enable_global_with_statement
  • optimize_normalize_count_variants
  • optimize_respect_aliases
  • optimize_rewrite_sum_if_to_count_if

Known issues in 21.3.13.9

The development team continues to improve the quality of the 21.3 release. The following issues still exist in the 21.3.13.9 version and may affect ClickHouse operation. Please inspect them carefully to decide if those are applicable to your applications:

ClickHouse Altinity Stable releases are based on the community versions. Community builds can be downloaded from repo.clickhouse.tech.

ClickHouse Altinity builds may be found at:

Please contact us at info@altinity.com if you experience any issues with the upgrade.

Appendix

New functions
  • DateTime functions:
    • Compatibility with MySQL: DATE, DAY, DAYOFMONTH, DAYOFWEEK, DAYOFYEAR, HOUR, MINUTE, MONTH, QUARTER, SECOND, YEAR
    • parseDateTimeBestEffortUSOrNull
    • parseDateTimeBestEffortUSOrZero
    • timezoneOffset
  • String functions:
    • decodeXMLComponent, extractTextFromHTML
  • Map functions:
    • mapContains, mapKeys, mapValues
  • Aggregate functions
    • New combinator -SimpleState
    • deltaSum
    • runningConcurrency
  • Functions to work with new Map data type:
    • map
  • Math/Statistics:
    • sign
  • Type related:
    • reinterpret, reinterpretAsInt128, reinterpretAsInt256, reinterpretAsUInt256
  • Internal ClickHouse:
    • connectionID, connection_id
    • normalizeQueryKeepNames, normalizedQueryHashKeepNames
New table functions
  • postgresql
  • file
New table engines
  • PostgreSQL
New metrics and events

system.metrics.

  • PartsCommitted
  • PartsCompact
  • PartsDeleteOnDestroy
  • PartsDeleting
  • PartsInMemory
  • PartsOutdated
  • PartsPreCommitted
  • PartsTemporary
  • PartsWide

system.events:

  • DistributedDelayedInserts
  • DistributedDelayedInsertsMilliseconds
  • DistributedRejectedInserts
  • HedgedRequestsChangeReplica
New system tables
  • distributed_ddl_queue
New columns in system tables
  • metric_log.
    • ProfileEvent_DistributedDelayedInserts, ProfileEvent_DistributedRejectedInserts, ProfileEvent_DistributedDelayedInsertsMilliseconds, ProfileEvent_HedgedRequestsChangeReplica, CurrentMetric_PartsTemporary, CurrentMetric_PartsPreCommitted, CurrentMetric_PartsCommitted, CurrentMetric_PartsOutdated, CurrentMetric_PartsDeleting, CurrentMetric_PartsDeleteOnDestroy, CurrentMetric_PartsWide, CurrentMetric_PartsCompact, CurrentMetric_PartsInMemory
  • Parts:
    • group_by_ttl_info.expression, group_by_ttl_info.min, group_by_ttl_info.max, rows_where_ttl_info.expression, rows_where_ttl_info.min, rows_where_ttl_info.max
  • processes.http_referer
  • query_log:
    • http_referer, log_comment, used_aggregate_functions, used_aggregate_function_combinators, used_database_engines, used_data_type_families, used_dictionaries, used_formats, used_functions, used_storages, used_table_functions
  • query_thread_log.http_referer
  • quota_limits:
    • max_query_selects, max_query_inserts
  • quota_usage:
    • query_selects, max_query_selects, query_inserts, max_query_inserts
  • quotas_usage:
    • query_selects, max_query_selects, query_inserts, max_query_inserts
System.merge_tree_settings added/changed
Name Old value New value Description
allow_floating_point_partition_key 0 Allow floating point as partition key.
inactive_parts_to_delay_insert 0 If table contains at least that many inactive parts in single partition, artificially slow down insert into table.
inactive_parts_to_throw_insert 0 If more than this number of inactive parts are in a single partition, throw the ‘Too many inactive parts …’ exception.
max_concurrent_queries 0 Max number of concurrently executed queries related to the MergeTree table (0 - disabled). Queries will still be limited by other max_concurrent_queries settings.
min_marks_to_honor_max_concurrent_queries 0 Minimal number of marks to honor the MergeTree-level’s max_concurrent_queries (0 - disabled). Queries will still be limited by other max_concurrent_queries settings.
system.settings added/changed
Name Old value New value Description
allow_changing_replica_until_first_data_packet 0 Allow HedgedConnections to change the replica until receiving the first data packet.
allow_experimental_database_replicated 0 Allows creating databases with the Replicated engine.
allow_experimental_query_deduplication 0 Allows sending parts’ UUIDs for a query in order to deduplicate data parts, if any.
async_socket_for_remote 1 0 Asynchronously read from socket executing remote query.
background_fetches_pool_size 3 8 Number of threads performing background fetches for replicated tables. Only has meaning at server startup.
checksum_on_read 1 Validates checksums on reading. It is enabled by default and should be always enabled in production. Please do not expect any benefits in disabling this setting. It may only be used for experiments and benchmarks. These settings are only applicable for tables of the MergeTree family. Checksums are always validated for other table engines and when receiving data over the network.
database_replicated_ddl_output 1 Returns table with query execution status as a result of a DDL query.
database_replicated_initial_query_timeout_sec 300 How long an initial DDL query should wait for a Replicated database to process previous DDL queue entries.
distributed_aggregation_memory_efficient 0 1 The memory-saving mode of distributed aggregation is enabled.
enable_global_with_statement 0 1 Propagate WITH statements to UNION queries and all subqueries.
engine_file_empty_if_not_exists 0 Allows selecting data from a file engine table without a file.
engine_file_truncate_on_insert 0 Enables or disables truncate before insert in file engine tables.
flatten_nested 1 If true, columns of type Nested will be flatten to separate array columns instead of one array of tuples.
hedged_connection_timeout_ms 100 Connection timeout for establishing connection with replica for Hedged requests.
insert_in_memory_parts_timeout 600000 REMOVED.
insert_shard_id 0 If non zero, when inserting into a distributed table, the data will be inserted into the shard insert_shard_id synchronously. Possible values range from 1 to shards_number of the corresponding distributed table.
log_comment Log comment into system.query_log table and server log. It can be set to an arbitrary string no longer than max_query_size.
normalize_function_names 0 Normalize function names to their canonical names.
optimize_normalize_count_variants 1 Rewrite aggregate functions that semantically equals count() as count().
optimize_respect_aliases 1 If it is set to true, it will respect aliases in WHERE/GROUP BY/ORDER BY, that will help with partition pruning/secondary indexes/optimize_aggregation_in_order/optimize_read_in_order/optimize_trivial_count.
optimize_rewrite_sum_if_to_count_if 1 Rewrite sumIf() and sum(if()) function countIf() function when logically equivalent.
periodic_live_view_refresh 60 Interval after which a periodically refreshed live view is forced to refresh.
query_plan_max_optimizations_to_apply 10000 Limit the total number of optimizations applied to the query plan. If zero, ignored. If the limit is reached, throw an exception.
receive_data_timeout_ms 2000 Connection timeout for receiving the first packet of data or packet with positive progress from replica.
s3_max_connections 1024 The maximum number of connections per server.
sleep_in_send_data 0 Time to sleep in sending data in TCPHandler.
sleep_in_send_tables_status 0 Time to sleep in sending tables the status response in TCPHandler.
unknown_packet_in_send_data 0 Sends unknown packet instead of N-th data packet.
use_hedged_requests 0 Use hedged requests for distributed queries.

Also, please refer to the release notes from the development team available at the following URLs:

21.3.20.2

Released 2022-02-10

Changes Compared to Altinity Stable Build 21.3.17.3

Bug Fixes
  • Integer overflow to resize the arrays causes heap corrupt. #33024 (Rajkumar Varada).
  • fix crash when used fuzzBits with multiply same FixedString, Close #32737. #32755 (SuperDJY).
  • Number of active replicas might be determined incorrectly when inserting with quorum if setting replicated_can_become_leader is disabled on some replicas. It’s fixed. #32157 (tavplubix).
  • Fix possible assertion ../src/IO/ReadBuffer.h:58: bool DB::ReadBuffer::next(): Assertion '!hasPendingData()' failed. in TSKV format. #31804 (Kruglov Pavel).
  • Fix crash when function dictGet with type is used for dictionary attribute when type is Nullable. Fixes #30980. #31800 (Maksim Kita).
  • Fix usage of Buffer table engine with type Map. Fixes #30546. #31742 (Anton Popov).
  • Fix race in JSONEachRowWithProgress output format when data and lines with progress are mixed in output. #31736 (Kruglov Pavel).
  • Settings input_format_allow_errors_num and input_format_allow_errors_ratio did not work for parsing of domain types, such as IPv4, it’s fixed. Fixes #31686. #31697 (tavplubix).
  • Remove not like function into RPNElement. #31169 (sundyli).
  • Using formatRow function with not row formats led to segfault. Don’t allow to use this function with such formats (because it doesn’t make sense). #31001 (Kruglov Pavel).
  • Functions for case-insensitive search in UTF8 strings like positionCaseInsensitiveUTF8 and countSubstringsCaseInsensitiveUTF8 might find substrings that actually does not match, it’s fixed. #30663 (tavplubix).
  • Fixed segfault which might happen if session expired during execution of REPLACE PARTITION. #30432 (tavplubix).
  • Fix [I]LIKE function. Closes #28661. #30244 (Nikolay Degterinsky).
  • FlatDictionary, HashedDictionary fix bytes_allocated calculation for nullable attributes. #30238 (Maksim Kita).
  • Support nullable arguments in function initializeAggregation. #30177 (Anton Popov).
  • Fix crash of sample by tuple(), closes #30004. #30016 (flynn).
  • Fix concurrent access to LowCardinality during GROUP BY (leads to SIGSEGV). #29782 (Azat Khuzhin).
  • Condition in filter predicate could be lost after push-down optimisation. #29625 (Nikolai Kochetov).
  • Fix null deference for GROUP BY WITH TOTALS HAVING (when the column from HAVING wasn’t selected). #29553 (Azat Khuzhin).
  • Fix connection timeouts (send_timeout/receive_timeout). #29282 (Azat Khuzhin).
  • Fix possible Table columns structure in ZooKeeper is different from local table structure exception while recreating or creating new replicas of ReplicatedMergeTree, when one of table columns have default expressions with case-insensitive functions. #29266 (Anton Popov).
  • Fix segfault while inserting into column with type LowCardinality(Nullable) in Avro input format. #29132 (Kruglov Pavel).
  • Fix the number of threads used in GLOBAL IN subquery (it was executed in single threads since #19414 bugfix). #28997 (Nikolai Kochetov).
  • Fix invalid constant type conversion when nullable or lowcardinality primary key is used. #28636 (Amos Bird).
  • Fix ORDER BY ... WITH FILL with set TO and FROM and no rows in result set. #30888 (Anton Popov).
  • Fixed Apache Avro Union type index out of boundary issue in Apache Avro binary format. #33022 (Harry Lee).
  • Fix null pointer dereference in low cardinality data when deserializing LowCardinality data in the Native format. #33021 (Harry Lee).
  • Quota limit was not reached, but the limit was exceeded. This PR fixes #31174. #31656 (sunny).
  • Quota limit was not reached, but the limit was exceeded. This PR fixes #31174. #31337 (sunny).
  • Fix shutdown of AccessControlManager. Now there can’t be reloading of the configuration after AccessControlManager has been destroyed. This PR fixes the flaky test test_user_directories/test.py::test_relative_path. #29951 (Vitaly Baranov).
Performance Improvements
Other Improvements
  • Updated zoneinfo files to 2021c. #29925 (alexey-milovidov).
  • Use real tmp file instead of predefined “rows_sources” for vertical merges. This avoids generating garbage directories in tmp disks. #28299 (Amos Bird).
  • Use separate clickhouse-bridge group and user for bridge processes. Set oom_score_adj so the bridges will be first subjects for OOM killer. Set set maximum RSS to 1 GiB. Closes #23861. #25280 (Kseniia Sumarokova).

Changes compared to Community Build 21.3.20.1-lts

Test Results

21.3.17.3

Released 2021-10-14

Changes Since v21.3.17.2-lts

Changes since the Long Term Support release of ClickHouse v21.3.17.2-lts to Altinity Stable 21.3.17.3:

V21.1 Build

Point release:

Major new features since the previous stable release 20.8.x

Released 2021-04-13

A new release introduces a lot of changes and new features. These are organized by feature in the Appendix, so refer to this section for more detail. The following new features are worth mentioning on the front page:

  • Security features:
    • LDAP external users directory, see an article in out blog a)
    • AES Encryption functions, see an article in our blog for more detail a)
    • Migration from OpenSSL to BoringSSL library
    • Support SNI in https connections to remote resources
    • Support SNI in ClickHouse server TLS endpoint a)
    • Security context propagation in distributed queries
  • MergeTree features:
    • New TTL extension: TTL RECOMPRESS.
    • ALTER UPDATE/DELETE IN PARTITION for replicated MergeTree tables a)
    • DETACH TABLE/VIEW PERMANENTLY a)
    • [OPTIMIZE DEDUPLICATE BY](https://clickhouse.tech/docs/en/sql-reference/statements/optimize/#by-expression) – deduplicate MergeTree tables by a subset of columns a)
    • SimpleAggregateFunction in SummingMergeTree
    • Option to disable merges for a cold storage in tiered storage configuration a)
  • Integrations:
    • gRPC protocol
    • zstd and xz compression for file-based engines
    • EmbeddedRocksDB engine
  • SQL compatibility:
    • UNION DISTINCT (previously only UNION ALL was supported). The default can be altered by union_default_mode setting.
    • Improved CTE compatibility
    • REPLACE TABLE and CREATE OR REPLACE TABLE DDL statements for Atomic database engine.
  • Other:

As usual with ClickHouse, there are many performance and operational improvements in different server components.


a) Contributed by Altinity developers.

Backward Incompatible Changes

The following changes are backward incompatible and require user attention during an upgrade:

  • Atomic database engine is enabled by default. It does not affect existing databases but new databases will be created with Engine = Atomic. The engine can not be modified for the database once created. Database Atomic has been used for system tables since 20.5, and it is a good feature in the long term. We recommend disabling it for now, however, especially if you use some backup tools, including ClickHouse Backup 0.6.4 or earlier. The data layout on the storage has been changed. In order to disable it by default, add a following configuration section for a default profile:

    <yandex>
        <profiles>
            <default>
                <default_database_engine>Ordinary</default_database_engine>
            </default>
        </profiles>
    </yandex>
    
  • toUUID(N) is no longer supported. If there is a DEFAULT column with this expression ClickHouse won’t start.

  • Following functions where removed: sumburConsistentHash, timeSeriesGroupSum, timeSeriesGroupRateSum.

  • avg and avgWeighted functions now always return Float64. In previous versions they returned Decimal for Decimal arguments.

  • Accept user settings related to file formats (e.g. format_csv_delimiter) in the SETTINGS clause when creating a table that uses File engine, and use these settings in all INSERTs and SELECTs. Session level settings are ignored in this case.

Upgrade Notes

There were several changes between versions that may affect the rolling upgrade of big clusters. Upgrading only part of the cluster is not recommended.

  • Replication protocol has been changed in 20.10 in order to improve reliability of TTL merges. Replication between versions prior to 20.10 and 20.10+ is incompatible if ReplicatedMergeTree tables with TTL are used. See https://github.com/ClickHouse/ClickHouse/pull/14490 for more information.
    • For a safe upgrade all replicas should be upgraded at once.
    • Alternatively, SYSTEM STOP TTL MERGES should be used during the upgrade.

Other Important Changes

Some new ClickHouse features are now enabled by default. It may lead to a change in behaviour, so review those carefully and disable features that may affect your system:

  • Insert quorum behaviour has been changed. insert_quorum_parallel is enabled by default. It breaks sequential consistency and may have other side effects. We recommend disabling this feature if you are using quorum inserts on previous versions.
  • optimize_on_insert is enabled by default. This is a new feature that applies a logic of Replacing, Summing, Collapsing and AggregatingMergeTree on the inserted block. Unfortunately, it still has some issues so we recommend disabling this after upgrading.
  • use_compact_format_in_distributed_parts_names is enabled by default.
  • input_format_null_as_default is enabled by default.
  • Background fetches are now limited by background_fetches_pool_size setting. The default value is 3 that may be low in some cases. In previous versions the common background pool has been used for merges and fetches with the default size 16.
  • Compact MergeTree parts are enabled by default for parts below 10MB of size uncompressed. See min_bytes_for_wide_part setting.

Known issues in 21.1.7.1

Development team continues to improve the quality of the 21.1 release. Following issues still exist in the 21.1.7.1 version and may affect ClickHouse operation. Please inspect them carefully in order to decide if those are applicable to your applications:

ClickHouse Altinity Stable release is based on community version. It can be downloaded from repo.clickhouse.tech, and RPM packages are available from the Altinity Stable Repository.

Please contact us at info@altinity.com if you experience any issues with the upgrade.


Appendix

New data types
  • DateTime32 (alias to DateTime)
  • Map (Experimental)
New formats
  • JSONCompactStrings
  • JSONCompactStringsEachRow
  • JSONCompactStringsEachRowWithNamesAndTypes
  • JSONStrings
  • JSONStringsEachRow
  • JSONStringsEachRowWithProgress
  • LineAsString
  • ProtobufSingle
  • RawBLOB
New functions
  • DateTime functions:
    • fromModifiedJulianDay, toModifiedJulianDay
    • fromModifiedJulianDayOrNull, toModifiedJulianDayOrNull
    • parseDateTime32BestEffort, parseDateTime32BestEffortOrNull, parseDateTime32BestEffortOrZero, toDateTime32
  • String functions:
    • countMatches, countMatchesCaseInsensitive
    • countSubstrings, countSubstringsCaseInsensitive, countSubstringsCaseInsensitiveUTF8
    • isIPv4String, isIPv6String
  • Functions for semi-duplicate search and strings proximity:
    • bitHammingDistance, tupleHammingDistance
    • ngramMinHash*, ngramSimHash*
  • Encoding/Formatting:
    • encodeXMLComponent
    • formatReadableQuantity
    • formatReadableTimeDelta
  • Array functions:
    • arrayAvg, arrayMax, arrayMin
    • mapPopulateSeries
  • Machine-learning and statistics:
    • mannWhitneyUTest
    • rankCorr
    • hypot
    • studentTTest, welchTTest
  • Encryption:
    • aes_decrypt_mysql, aes_encrypt_mysql
    • dencrypt, encrypt
  • URL functions
    • cutToFirstSignificantSubdomainCustom, cutToFirstSignificantSubdomainCustomWithWWW, cutToFirstSignificantSubdomainWithWWW
    • firstSignificantSubdomainCustom
  • Aggregate functions
    • New combinator -SimpleState
  • Functions to work with new Map data type:
    • map
  • Math/Statistics:
    • acosh, asinh, atan2, atanh, cosh, sinh
    • log1p
  • Type related:
    • accurateCast
    • accurateCastOrNull
    • byteSize
    • reinterpretAsUUID
    • toUUIDOrNull
    • toUUIDOrZero
  • Other:
    • farmFingerprint64
  • Internal ClickHouse:
    • errorCodeToName
    • logTrace
    • tid
    • tcpPort
New table functions
  • null
  • view
New table engines
  • EmbeddedRocksDB
New metrics and events

system.metrics:

  • BackgroundFetchesPoolTask
  • BackgroundMessageBrokerSchedulePoolTask
  • MaxDDLEntryID
  • TablesToDropQueueSize
  • REMOVED MemoryTrackingForMerges
  • REMOVED MemoryTrackingInBackgroundBufferFlushSchedulePool
  • REMOVED MemoryTrackingInBackgroundDistributedSchedulePool
  • REMOVED MemoryTrackingInBackgroundMoveProcessingPool
  • REMOVED MemoryTrackingInBackgroundProcessingPool
  • REMOVED MemoryTrackingInBackgroundSchedulePool

system.asynchronous_metrics:

  • HTTPThreads
  • InterserverThreads
  • MySQLThreads
  • TCPThreads
  • TotalBytesOfMergeTreeTables
  • TotalPartsOfMergeTreeTables
  • TotalRowsOfMergeTreeTables

system.events:

  • RWLockReadersWaitMilliseconds
  • REMOVED CreatedWriteBufferOrdinary
New system tables
  • errors
  • replicated_fetches
  • replicated_merge_tree_settings
New columns in system tables
  • grants.access_type
  • merges.merge_type, merge_algorithm
  • parts.uuid, default_compression_codec, recompression_ttl_info.expression, recompression_ttl_info.min, recompression_ttl_info.max
  • privileges.privilege, parent_group
  • processes.forwarded_for
  • query_log.event_time_microseconds, query_start_time_microseconds, normalized_query_hash, query_kind, databases, tables, columns, forwarded_for
  • query_thread_log.event_time_microseconds, query_start_time_microseconds, normalized_query_hash, current_database, forwarded_for
  • quotas.keys
  • replication_queue.merge_type
  • storage_policies.prefer_not_to_merge
  • table_engines.supports_parallel_insert
  • trace_log.event_time_microseconds
  • user_directories.path, readonly, params
system.merge_tree_settings added/changed
Name Old value New value Description
assign_part_uuids 0 Generate UUIDs for parts. Before enabling check that all replicas support the new format.
execute_merges_on_single_replica_time_threshold 0 When greater than zero only a single replica starts the merge immediately, others wait up to that amount of time to download the result instead of doing merges locally. If the chosen replica doesn’t finish the merge during that amount of time, fallback to standard behavior happens.
fsync_after_insert 0 Do fsync for every inserted part. Significantly decreases performance of inserts, not recommended to use with wide parts.
fsync_part_directory 0 Do fsync for part directory after all part operations (writes, renames, etc.).
in_memory_parts_insert_sync 0 If true insert of part with in-memory format will wait for fsync of WAL
max_compress_block_size 0 Compress the pending uncompressed data in a buffer if its size is larger or equal than the specified threshold. Block of data will be compressed even if the current granule is not finished. If this setting is not set, the corresponding global setting is used.
max_number_of_merges_with_ttl_in_pool 2 When there is more than a specified number of merges with TTL entries in the pool, do not assign a new merge with TTL. This is to leave free threads for regular merges and avoid “Too many parts”
max_partitions_to_read -1 Limit the max number of partitions that can be accessed in one query. <= 0 means unlimited. This setting is the default that can be overridden by the query-level setting with the same name.
max_replicated_merges_with_ttl_in_queue 1 How many tasks of merging parts with TTL are allowed simultaneously in the ReplicatedMergeTree queue.
merge_with_recompression_ttl_timeout 14400 Minimal time in seconds, when merge with recompression TTL can be repeated.
merge_with_ttl_timeout 86400 14400 Minimal time in seconds, when merge with delete TTL can be repeated.
min_bytes_for_wide_part 0 10485760 Minimal uncompressed size in bytes to create part in wide format instead of compact
min_compress_block_size 0 When granule is written, compress the data in a buffer if the size of pending uncompressed data is larger or equal than the specified threshold. If this setting is not set, the corresponding global setting is used.
min_compressed_bytes_to_fsync_after_fetch 0 Minimal number of compressed bytes to do fsync for part after fetch (0 - disabled)
min_compressed_bytes_to_fsync_after_merge 0 Minimal number of compressed bytes to do fsync for part after merge (0 - disabled)
min_rows_to_fsync_after_merge 0 Minimal number of rows to do fsync for part after merge (0 - disabled)
remove_empty_parts 1 Remove empty parts after they were pruned by TTL, mutation, or collapsing merge algorithm
try_fetch_recompressed_part_timeout 7200 Recompression works slowly in most cases. We don’t start a merge with recompression until this timeout and try to fetch the recompressed part from the replica which assigned this merge with recompression.
write_ahead_log_bytes_to_fsync 104857600 Amount of bytes, accumulated in WAL to do fsync.
write_ahead_log_interval_ms_to_fsync 100 Interval in milliseconds after which fsync for WAL is being done.
system.settings added/changed
Name Old value New value Description
aggregate_functions_null_for_empty 0 Rewrite all aggregate functions in a query, adding -OrNull suffix to them
allow_experimental_cross_to_join_conversion 1
allow_experimental_data_skipping_indices 1
allow_experimental_low_cardinality_type 1
allow_experimental_map_type 0 Allow data type Map
allow_experimental_multiple_joins_emulation 1
allow_experimental_window_functions 0 Allow experimental window functions
asterisk_include_alias_columns 0 Include ALIAS columns for wildcard query
asterisk_include_materialized_columns 0 Include MATERIALIZED columns for wildcard query
async_socket_for_remote 1 Asynchronously read from socket executing remote query
background_fetches_pool_size 3 Number of threads performing background fetches for replicated tables. Only has meaning at server startup.
background_message_broker_schedule_pool_size 16 Number of threads performing background tasks for message streaming. Only has meaning at server startup.
compile 0
database_atomic_wait_for_drop_and_detach_synchronously 0 When executing DROP or DETACH TABLE in Atomic database, wait for table data to be finally dropped or detached.
date_time_output_format simple Method to write DateTime to text output. Possible values: ‘simple’, ‘iso’, ‘unix_timestamp’.
default_database_engine Ordinary Atomic Default database engine.
do_not_merge_across_partitions_select_final 0 Merge parts only in one partition in select final
enable_global_with_statement 0 Propagate WITH statements to UNION queries and all subqueries
experimental_use_processors 1
force_data_skipping_indices Comma separated list of strings or literals with the name of the data skipping indices that should be used during query execution, otherwise an exception will be thrown.
force_optimize_skip_unused_shards_no_nested 0
format_regexp_escaping_rule Escaped Raw Field escaping rule (for Regexp format)
input_format_csv_arrays_as_nested_csv 0 When reading Array from CSV, expect that its elements were serialized in nested CSV and then put into string. Example: “[““Hello””, ““world””, ““42"””” TV""]". Braces around an array can be omitted.
input_format_csv_enum_as_number 0 Treat inserted enum values in CSV formats as enum indices
input_format_null_as_default 0 1 For text input formats initialize null fields with default values if data type of this field is not nullable
input_format_tsv_enum_as_number 0 Treat inserted enum values in TSV formats as enum indices
insert_distributed_one_random_shard 0 If setting is enabled, inserting into distributed table will choose a random shard to write when there is no sharding key
insert_quorum_parallel 1 For quorum INSERT queries - enable to make parallel inserts without linearizability
limit 0 Limit on read rows from the most ’end’ result for select query, default 0 means no limit length
load_balancing_first_offset 0 Which replica to preferably send a query when FIRST_OR_RANDOM load balancing strategy is used.
log_queries_min_query_duration_ms 0 Minimal time for the query to run, to get to the query_log/query_thread_log.
mark_cache_min_lifetime 0
max_bytes_to_read_leaf 0 Limit on read bytes (after decompression) on the leaf nodes for distributed queries. Limit is applied for local reads only excluding the final merge stage on the root node.
max_concurrent_queries_for_all_users 0 The maximum number of concurrent requests for all users.
max_partitions_to_read -1 Limit the max number of partitions that can be accessed in one query. <= 0 means unlimited.
max_rows_to_read_leaf 0 Limit on read rows on the leaf nodes for distributed queries. Limit is applied for local reads only excluding the final merge stage on the root node.
merge_tree_uniform_read_distribution 1
min_count_to_compile 0
multiple_joins_rewriter_version 2 0 Obsolete setting, does nothing. Will be removed after 2021-03-31
mysql_datatypes_support_level Which MySQL types should be converted to corresponding ClickHouse types (rather than being represented as String). Can be empty or any combination of ‘decimal’ or ‘datetime64’. When empty MySQL’s DECIMAL and DATETIME/TIMESTAMP with non-zero precision are seen as String on ClickHouse’s side.
offset 0 Offset on read rows from the most ’end’ result for select query
opentelemetry_start_trace_probability 0 Probability to start an OpenTelemetry trace for an incoming query.
optimize_move_functions_out_of_any 1 0 Move functions out of aggregate functions ‘any’, ‘anyLast’.
optimize_on_insert 1 Do the same transformation for inserted block of data as if merge was done on this block.
optimize_skip_merged_partitions 0 Skip partitions with one part with level > 0 in optimize final
output_format_json_array_of_rows 0 Output a JSON array of all rows in JSONEachRow(Compact) format.
output_format_json_named_tuples_as_objects 0 Serialize named tuple columns as JSON objects.
output_format_parallel_formatting 1 Enable parallel formatting for some data formats.
output_format_pretty_row_numbers 0 Add row numbers before each row for pretty output format
output_format_tsv_null_representation Custom NULL representation in TSV format
partial_merge_join 0
read_backoff_min_concurrency 1 Settings to try keeping the minimal number of threads in case of slow reads.
read_overflow_mode_leaf throw What to do when the leaf limit is exceeded.
remerge_sort_lowered_memory_bytes_ratio 2 If memory usage after remerge does not reduced by this ratio, remerge will be disabled.
s3_max_redirects 10 Max number of S3 redirects hops allowed.
s3_max_single_part_upload_size 67108864 The maximum size of object to upload using singlepart upload to S3.
special_sort not_specified
system_events_show_zero_values 0 Include all metrics, even with zero values
union_default_mode Set default Union Mode in SelectWithUnion query. Possible values: empty string, ‘ALL’, ‘DISTINCT’. If empty, query without Union Mode will throw exception.
use_antlr_parser 0 Parse incoming queries using ANTLR-generated experimental parser
use_compact_format_in_distributed_parts_names 0 1 Changes format of directories names for distributed table insert parts.

Also, please refer to the release notes from the development team available at the following URLs:

21.1.11.3

Released 2022-06-01

This is a security update to 21.1.10.3 release. It backports fixes for several CVEs into 21.1 version:

As we discussed in a recent Altinity Blog article, JFrog discovered these CVEs and reported them to the Yandex team in 2021. The CVEs have been already included in 21.3.19 and 21.8.11 and Altinity.Stable and ClickHouse builds and all ClickHouse builds after 21.10.

We have now backported the fixes to 21.1 as part of our long-term maintenance policy for Altinity Stable builds. We recommend all users running ClickHouse 21.1 upgrade to this new release.

21.1.10.3

Released 2021-06-16

Changes between 21.1.10.3 and 21.1.9.41

The complete diff between Altinity and community versions can also be viewed at github:

https://github.com/ClickHouse/ClickHouse/compare/v21.1.9.41-stable&hellip;Altinity:v21.1.10.3-altinity+stable

Changes between ClickHouse Altinity Stable Release 21.1.9.41 and ClickHouse Altinity Stable Release 21.1.7.1

  • Fixed a bug with distributed queries that could sometimes fail with ‘Unknown packet n from server’ error message: https://github.com/ClickHouse/ClickHouse/issues/21588

  • Fixed several issues with mutations handling (e.g. https://github.com/ClickHouse/ClickHouse/issues/22013)

  • Fix a rare bug when quorum insert with iinsert_quorum_parallel=1 is not really a “quorum” because of deduplication

  • Fixed a possible buffer overflow in token bloom filter index: https://github.com/ClickHouse/ClickHouse/issues/19233

  • Fixed a bug when nullable types sometimes could not be retrieved: https://github.com/ClickHouse/ClickHouse/issues/21116

  • Fixed a bug with replicated tables metadata not properly cleaned from the non-primary ZooKeeper instance: https://github.com/ClickHouse/ClickHouse/issues/21054

  • Added parseDateTimeBestEffortOrZero to allow for empty values, while parseDateTimeBestEffort will not work with empty values. For example:

    SELECT parseDateTimeBestEffort('')
    
    Query id: 1c60717e-116b-445f-b675-7b9cac35c262
    
    
    0 rows in set. Elapsed: 0.003 sec.
    
    Received exception from server (version 21.11.1):
    Code: 41. DB::Exception: Received from localhost:9000. DB::Exception: Cannot read DateTime: neither Date nor Time was parsed successfully: While processing parseDateTimeBestEffort(''). (CANNOT_PARSE_DATETIME)
    
    SELECT parseDateTimeBestEffortOrZero('')
    
    Query id: 484baec9-6d00-4c66-a792-85a270d6f2f2
    
    ┌─parseDateTimeBestEffortOrZero('')─┐
                   1970-01-01 03:00:00 
    └───────────────────────────────────┘
    

ClickHouse Altinity Stable packages for this release can be found at:

21.1.9.41

Released 2021-04-13

Changes between 21.1.9.41 and 21.1.7.1

V20.8 Release

Point releases:

Altinity Stable Release for ClickHouse 20.8.7.15

Released 2020-12-02

Description

This release provides several changes and new features from the previous release.

Major New Features

  • Security features:
    • RBAC and SQL management for users, roles, grants etc.
    • LDAP authentication
    • Kerberos authentication in Kafka Engine
  • Dictionary features:
    • direct and ssd_cache layouts
    • Redis, MongoDB, Cassandra sources
    • Automatic query rewrite for joins (see an example here)
    • Experimental polygon dictionaries
  • MergeTree features:
    • S3 disk for MergeTree tables! It is still an experimental feature, see our article for more detail.
    • New TTL types: TTL DELETE WHERE and TTL GROUP BY.
    • In-memory MergeTree parts
  • Integrations:
    • PostgreSQL wire protocol
    • RabbitMQ and MongoDB storage engines
    • Tencent Cloud Object Storage (COS)
    • Highly experimental MaterializeMySQL engine that implements MySQL replica in ClickHouse
  • SQL compatibility:
    • New Int128, (U)Int256, Decimal256 extended precision data types
    • Aliases for standard SQL types
    • EXPLAIN statement!
    • Merge join improvements
  • Custom HTTP handlers
  • clickhouse-copier underwent extensive updates and improvements

Upgrade Notes

Backward Incompatible Changes

The following changes are backward incompatible and require user attention during an upgrade:

  • Aggregate functions states with Nullable arguments may produce different / incompatible types.
  • Gorilla, Delta and DoubleDelta codecs can not be used anymore on data types of variable size (like strings).
  • System tables (e.g. system.query_log, system.trace_log, system.metric_log) are using compact data part format for parts smaller than 10 MiB in size (this is almost always the case). Compact data part format is supported since version 20.3.
  • WARNING: If you have to downgrade to version prior 20.3, you should manually delete table data for system logs in /var/lib/clickhouse/data/system/.
  • The setting input_format_with_names_use_header is enabled by default. It will affect parsing of input formats -WithNames and -WithNamesAndTypes.
  • Deprecate special printing of zero Date/DateTime values as '0000-00-00’ and '0000-00-00 00:00:00’. Now it is printed as ‘1970-01-01’ and '1970-01-01 00:00:00’ respectively.

There were several changes between versions that may affect the rolling upgrade of big clusters. Upgrading only part of the cluster is not recommended. Note the following:

  • 20.3 two-level aggregation is not compatible with 20.4+.
    Data will not be fully aggregated for queries that are processed using the two-level aggregation algorithm. This algorithm should be disabled before upgrading if the risk is high in your environment. See group_by_two_level_threshold and group_by_two_level_threshold_bytes settings.
  • zstd library has been updated in 20.5. While it is not a problem for ClickHouse in general, it may result in inconsistent parts when several replicas merge parts independently, and will force ClickHouse to download merged parts to make sure they are byte-identical (which will lead to extra traffic between nodes). The first node to complete the merge will register the part in ZooKeeper, and the other nodes will download the part if their checksums are different. There will be no data loss; conflicts will disappear once all replicas are upgraded.
  • The following settings lead to incompatibility in distributed queries when only a subset of shards are upgraded and others are not:
    • optimize_move_functions_out_of_any
    • optimize_arithmetic_operations_in_aggregate_functions
    • optimize_injective_functions_inside_uniq
    • optimize_if_transform_strings_to_enum
  • When pre-20.5 and 20.5+ versions run as replicas “Part ... intersects previous part" errors are possible due to change in leadership selection protocol. If you need to run pre-20.5 and 20.5+ versions in the same cluster make sure the old version can not become a leader. This can be configured via replicated_can_become_leader merge tree setting globally or on a table level.

Changes

  • All replicas are now ’leaders’. This allows multiple replicas to assign merges, mutations, partition drop, move and replace concurrently. Now system.replicas.is_leader is 1 for all tables on all nodes. If you rely on this value for some operations, your processes must be revised. The LeaderElection and LeaderReplica metrics were removed.
  • New setting max_server_memory_usage limits total memory usage of the server. The setting max_memory_usage_for_all_queries is now obsolete and does nothing. You might see an exception ‘Memory limit (total) exceeded‘.Increasing the limit requires a restart.
  • The log_queries setting is now enabled by default. You might want to disable this setting for some profiles if you don’t want their queries logged into the system.query_log table.
  • Several new optimizations are enabled by default. While they typically improve performance sometimes regressions are possible in corner cases:
    • optimize_aggregators_of_group_by_keys
    • optimize_arithmetic_operations_in_aggregate_functions
    • optimize_duplicate_order_by_and_distinct
    • optimize_group_by_function_keys
    • optimize_injective_functions_inside_uniq
    • optimize_move_functions_out_of_any \ optimize_monotonous_functions_in_order_by
    • optimize_redundant_functions_in_order_by
    • optimize_trivial_insert_select
    • partial_merge_join_optimizations
New data types
  • All standard SQL data types, like BIGINT, VARCHAR, etc. are mapped to ClickHouse data types. See the system.data_type_families table.
  • Experimental data types Point, Ring, Polygon, MultiPolygon
  • Int128, Int256, UInt256, Decimal256 extended precision data types
New formats
  • Arrow
  • ArrowStream
  • JSONAsString – this one allows to store the full JSON row unparsed and process later using ClickHouse JSONExtract functions
  • Markdown
  • MsgPack
  • PosgreSQLWire
  • Regexp - allows to parse any non-standard textish formats (for example logs) by applying regular expression to every line.
New functions
  • DateTime functions:
    • fromUnixTimestamp / FROM_UNIXTIME
    • dateTrunc / date_trunc
    • fromUnixTimestamp64Micro
    • fromUnixTimestamp64Milli
    • fromUnixTimestamp64Nano
    • toUnixTimestamp64Micro
    • toUnixTimestamp64Milli
    • toUnixTimestamp64Nano
  • String functions:
    • extractGroups, extractAllGroupsHorizontal, extractAllGroupsVertical (alias to extractAllGroups)
    • Ilike, notILike (also new SQL operator ILIKE has been added)
  • Array functions:
    • arrayReduceInRanges
    • hasSubstr
  • Machine-learning and statistics:
    • arrayAUC -- area Under the ROC Curve
    • bayesAB -- bayesian A/B Testing Calculator
    • medianExactHigh, medianExactLow
    • quantileExactHigh, quantileExactLow, quantilesExactHigh, quantilesExactLow
  • JSON
    • JSONExtractKeysAndValuesRaw
  • URL functions
    • port
    • netloc
  • Aggregate functions
    • -Distinct – new aggregate function combinator
    • initializeAggregation– allow initialization of AggregateFunction state
  • Functions to work with key-value pairs (aka maps):
    • mapAdd, mapSubtract, maxMap, minMap (extending the API of sumMap).
    • SimpleAggregateFunction now supportsminMap, maxMap, sumMap.
    • The new data type Map is currently in development that should make it even more intuitive.
  • MySQL integration
    • DATABASE (alias for currentDatabase)
    • globalVariable (stub)
  • New types related:
    • toDecimal256, toDecimal256OrNull, toDecimal256OrZero
    • toInt128, toInt128OrNull, toInt128OrZero
    • toInt256, toInt256OrNull, toInt256OrZero
    • toUInt256, toUInt256OrNull, toUInt256OrZero
  • engine=Join
    • joinGetOrNull
  • Random:
    • fuzzBits (used for CI)
    • rand32 (alias for rand)
    • randomFixedString, randomString, randomStringUTF8
  • Serialize columns to some text format
    • formatRow, formatRowNoNewline
  • Settings & custom settings:
    • getSetting
  • Check types
    • isDecimalOverflow
    • defaultValueOfTypeName
    • isZeroOrNull
  • Helper functions to analyze query_log
    • normalizeQuery, normalizedQueryHash
  • Other:
    • countDigits
    • mod (alias for modulo)
  • Special (used in CI / by ClickHouse developers):
    • hasThreadFuzzer
    • buildId
  • Related to internals of execution of of IN / GLOBAL IN operator (also see transform_null_in setting)
    • notNullIn, notNullInIgnoreSet, nullIn, nullInIgnoreSet, inIgnoreSet, notInIgnoreSet
    • globalNotNullIn, globalNotNullInIgnoreSet, globalNullIn, globalNullInIgnoreSet, globalInIgnoreSet, globalNotInIgnoreSet
New table functions
  • cosn – integration with Tencent Cloud Object Storage (COS)
New table engines
  • MongoDB
  • RabbitMQ
  • COSN
New metrics and events

system.metrics:

  • BackgroundBufferFlushSchedulePoolTask
  • BackgroundDistributedSchedulePoolTask
  • MemoryTrackingInBackgroundBufferFlushSchedulePool
  • MemoryTrackingInBackgroundDistributedSchedulePool
  • PostgreSQLConnection

system.events:

  • OSCPUVirtualTimeMicroseconds
  • OSCPUWaitMicroseconds
  • OSReadBytes, OSReadChars
  • OSWriteBytes, OSWriteChars
  • QueryTimeMicroseconds
New system tables (mostly for RBAC introspection)
  • current_roles
  • distribution_queue
  • enabled_roles
  • grants
  • licenses
  • privileges
  • quota_limits
  • quotas_usage
  • row_policies
  • role_grants
  • roles
  • settings_profile_elements
  • settings_profiles
  • time_zones
  • user_directories
  • users
New columns in system tables
  • columns.position, parts_columns.position
  • databases.uuid, dictionaries.uuid, tables.uuid
  • disks.type
  • merge_tree_settings.type, settings.type
  • parts: delete_ttl_info_min, delete_ttl_info_max, move_ttl_info_min, move_ttl_info_max, move_ttl_info.expression
  • query_log.current_database
  • storage_policies.volume_type
  • tables: total_rows, total_bytes, lifetime_rows, lifetime_bytes
  • trace_log: trace_type, size, timestamp_ns
system.merge_tree_settings added/changed
Setting Old value New value Description
allow_nullable_key 0 Allow Nullable types as primary keys.
always_fetch_merged_part 0 If true, replicas never merge parts and always download merged parts from other replicas.
disable_background_merges 0 REMOVED
enable_mixed_granularity_parts 0 1 Enable parts with adaptive and non-adaptive granularity at the same time
in_memory_parts_enable_wal 1 Whether to write blocks in Native format to write-ahead-log before creation in-memory part
lock_acquire_timeout_for_background_operations 120 For background operations like merges, mutations etc. How many seconds before failing to acquire table locks.
max_part_loading_threads auto max_part_loading_threads
max_part_removal_threads auto The number of threads for concurrent removal of inactive data parts. One is usually enough, but in ‘Google Compute Environment SSD Persistent Disks’ file removal (unlink) operation is extraordinarily slow and you probably have to increase this number (recommended is up to 16).
max_replicated_logs_to_keep 10000 1000 How many records may be in log, if there is an inactive replica.
min_replicated_logs_to_keep 100 10 Keep about this number of last records in ZooKeeper log, even if they are obsolete. It doesn’t affect work of tables: used only to diagnose ZooKeeper log before cleaning.
min_bytes_for_compact_part 0 Minimal uncompressed size in bytes to create part in compact format instead of saving it in RAM. If non-zero enables in-memory parts.
min_rows_for_compact_part 0 Minimal number of rows to create part in compact format instead of saving it in RAM. If non-zero enables in-memory parts.
min_index_granularity_bytes 1024 Minimum amount of bytes in single granule.
min_relative_delay_to_measure 120 Calculate relative replica delay only if absolute delay is not less than this value.
write_ahead_log_max_bytes 1073741824 Rotate WAL, if it exceeds that amount of bytes
system.settings added/changed
Setting Old value New value Description
allow_experimental_bigint_types 0 Allow Int128, Int256, UInt256 and Decimal256 types
allow_experimental_database_materialize_mysql 1 Allow database creation with Engine=MaterializeMySQL(…) (Highly experimental yet)
allow_experimental_geo_types 0 Allow geo data types such as Point, Ring, Polygon, MultiPolygon
allow_non_metadata_alters 1 Allow to execute alters which affects table’s metadata and data on disk.
allow_push_predicate_when_subquery_contains_with 1 Allows push predicate when subquery contains WITH clause
allow_suspicious_codecs 0 If it is set to true, allow specifying meaningless compression codecs.
alter_partition_verbose_result 0 Output information about affected parts. Currently works only for FREEZE and ATTACH commands.
background_buffer_flush_schedule_pool_size 16 Number of threads performing background flush for tables with Buffer engine. Only has meaning at server startup.
background_distributed_schedule_pool_size 16 Number of threads performing background tasks for distributed sends. Only has meaning at server startup.
cast_keep_nullable 0 CAST operator keep Nullable for result data type
data_type_default_nullable 0 Data types without NULL or NOT NULL will make Nullable
default_database_engine Ordinary Default database engine
distributed_replica_max_ignored_errors 0 Number of errors that will be ignored while choosing replicas
force_optimize_skip_unused_shards_nesting 0 Same as force_optimize_skip_unused_shards, but accept nesting level until which it will work
format_regexp Regular expression (for Regexp format)
format_regexp_escaping_rule Escaped Field escaping rule (for Regexp format)
format_regexp_skip_unmatched 0 Skip lines unmatched by regular expression (for Regexp format)
function_implementation Choose function implementation for specific target or variant (experimental). If empty, enable all of them.
input_format_avro_allow_missing_fields 0 For Avro/AvroConfluent format: when field is not found in schema use default value instead of error
input_format_with_names_use_header 0 1 For TSVWithNames and CSVWithNames input formats this controls whether the format parser is to assume that column data appear in the input exactly as they are specified in the header.
insert_in_memory_parts_timeout 600000
join_on_disk_max_files_to_merge 64 For MergeJoin on disk, set how many files are allowed to sort simultaneously. The larger the value the more memory is used and less disk I/O needed. Minimum is 2.
lock_acquire_timeout 120 How long locking request should wait before failing
log_queries_min_type QUERY_START Minimal type in query_log to log, possible values (from low to high): QUERY_START, QUERY_FINISH, EXCEPTION_BEFORE_START, EXCEPTION_WHILE_PROCESSING.
materialize_ttl_after_modify 1 Apply TTL for old data, after ALTER MODIFY TTL query
max_block_size 65536 65505 Maximum block size for reading
max_final_threads 16 The maximum number of threads to read from the table with FINAL.
max_insert_block_size 1048576 1048545 The maximum block size for insertion, if we control the creation of blocks for insertion
max_joined_block_size_rows 65536 65505 Maximum block size for JOIN result (if join algorithm supports it). 0 means unlimited.
max_untracked_memory 4194304 Small allocations and deallocations are grouped in thread local variables and tracked or profiled only when the amount (in absolute value) becomes larger than specified value. If the value is higher than ‘memory_profiler_step’ it will be effectively lowered to ‘memory_profiler_step’.
memory_profiler_sample_probability 0 Collect random allocations and deallocations and write them into system.trace_log with 'MemorySample' trace_type. The probability is for every alloc/free regardless to the size of the allocation. Note that sampling happens only when the amount of untracked memory exceeds ‘max_untracked_memory’. You may want to set ‘max_untracked_memory’ to 0 for extra fine grained sampling.
metrics_perf_events_enabled 0 If enabled, some of the perf events will be measured throughout queries’ execution.
metrics_perf_events_list Comma separated list of perf metrics that will be measured throughout queries’ execution. Empty means all events.
min_chunk_bytes_for_parallel_parsing 1048576 10485760 The minimum chunk size in bytes, which each thread will parse in parallel.
min_insert_block_size_bytes 268435456 268427520 Squash blocks passed to INSERT query to specified size in bytes, if blocks are not big enough.
min_insert_block_size_bytes_for_materialized_views 0 Like min_insert_block_size_bytes, but applied only during pushing to MATERIALIZED VIEW (default: min_insert_block_size_bytes)
min_insert_block_size_rows 1048576 1048545 Squash blocks passed to INSERT query to specified size in rows, if blocks are not big enough.
min_insert_block_size_rows_for_materialized_views 0 Like min_insert_block_size_rows, but applied only during pushing to MATERIALIZED VIEW (default: min_insert_block_size_rows)
multiple_joins_rewriter_version 2 1 or 2. Second rewriter version knows about table columns and keeps not clashing names as is.
optimize_aggregation_in_order 0 Enable GROUP BY optimization for aggregating data in corresponding order in MergeTree tables.
optimize_aggregators_of_group_by_keys 1 Eliminates min/max/any/anyLast aggregators of GROUP BY keys in SELECT section
optimize_arithmetic_operations_in_aggregate_functions 1 Move arithmetic operations out of aggregation functions
optimize_distributed_group_by_sharding_key 0 Optimize GROUP BY sharding_key queries (by avoiding costly aggregation on the initiator server)
optimize_duplicate_order_by_and_distinct 1 Remove duplicate ORDER BY and DISTINCT if it’s possible
optimize_group_by_function_keys 1 Eliminates functions of other keys in GROUP BY section
optimize_if_chain_to_multiif 0 Replace if(cond1, then1, if(cond2, …)) chains to multiIf. Currently it’s not beneficial for numeric types.
optimize_if_transform_strings_to_enum 0 Replaces string-type arguments in If and Transform to enum. Disabled by default cause it could make inconsistent change in distributed query that would lead to its fail
optimize_injective_functions_inside_uniq 1 Delete injective functions of one argument inside uniq*() functions
optimize_monotonous_functions_in_order_by 1 Replace monotonous function with its argument in ORDER BY
optimize_move_functions_out_of_any 1 Move functions out of aggregate functions ‘any’, ‘anyLast’
optimize_redundant_functions_in_order_by 1 Remove functions from ORDER BY if its argument is also in ORDER BY
optimize_skip_unused_shards_nesting 0 Same as optimize_skip_unused_shards, but accept nesting level until which it will work
optimize_trivial_insert_select 1 Optimize trivial ‘INSERT INTO table SELECT … FROM TABLES’ query
output_format_enable_streaming 0 Enable streaming in output formats that supports it
output_format_pretty_grid_charset UTF-8 Charset for printing grid borders. Available charsets: ASCII, UTF-8 (default)
output_format_pretty_max_value_width 10000 Maximum width of value to display in Pretty formats. If greater - it will be cut.
parallel_distributed_insert_select 0 Process distributed INSERT SELECT query in the same cluster on local tables on every shard, if 1 SELECT is executed on each shard, if 2 SELECT and INSERT is executed on each shard
partial_merge_join_left_table_buffer_bytes 32000000 If not 0, group left table blocks in bigger ones for the left-side table in partial merge join. It uses up to 2x of specified memory per joining thread. The current version works only with ‘partial_merge_join_optimizations = 1’.
partial_merge_join_optimizations 0 1 Enable optimizations in partial merge join
partial_merge_join_rows_in_right_blocks 10000 65536 Split right-hand joining data in blocks of specified size. It’s a portion of data indexed by min-max values and possibly unloaded on disk.
rabbitmq_max_wait_ms 5000 The wait time for reading from RabbitMQ before retry.
read_in_order_two_level_merge_threshold 100 Minimal number of parts to read to run preliminary merge step during multithread reading in order of primary key.
send_logs_level none fatal Send server text logs with specified minimum level to client. Valid values: ’trace’, ‘debug’, ‘information’, ‘warning’, ’error’, ‘fatal’, ’none’
show_table_uuid_in_table_create_query_if_not_nil 0 For tables in databases with Engine=Atomic show UUID of the table in its CREATE query.
temporary_files_codec LZ4 Set compression codec for temporary files (sort and join on disk). I.e. LZ4, NONE.
transform_null_in 0 If enabled, NULL values will be matched with ‘IN’ operator as if they are considered equal.
validate_polygons 1 Throw exception if polygon is invalid in function pointInPolygon (e.g. self-tangent, self-intersecting). If the setting is false, the function will accept invalid polygons but may silently return wrong result.

References

ClickHouse Altinity Stable release is based on community version. It can be downloaded from repo.clickhouse.tech, and RPM packages are available from the Altinity Stable Repository.

Altinity Stable Release for ClickHouse 20.8.12.2

Released 2021-02-03

Description

Bug fixes.

Bug Fixes
  • Fixed a rare bug that may result in corrupted data when merging from wide to compact parts. We recommend upgrading to this release all 20.8 users who are using compact parts.

Altinity Stable Release for ClickHouse 20.8.11.17

Released 2020-12-25

Description

This release provides bug fixes and general stability improvements.

V20.3 Release

Point releases:

Altinity Stable Release for ClickHouse 20.3.12.112

Released 2020-06-24

Description

This release is a significant step forward since the previous Altinity Stable release 19.16.19.85. It includes 1203 pull requests from 171 contributors.

Major New Features

  • TTL moves. See the Putting Things Where They Belong Using New TTL Moves article. While we consider this feature to be production ready, we have a roadmap to further improve it. TTL moves have been developed by Altinity.
  • Dictionary DDLs that many ClickHouse users dreamed of! See the ClickHouse Dictionaries: Reloaded! article that highlights this feature.
  • New DateTime64 datatype with configurable precision up to nanoseconds. The feature is in beta state and more functions for usability will follow in next releases. It has also been developed by Altinity.
  • Joins have been improved a lot, including SEMI/ANTI JOIN, experimental merge join and other changes. See the join_algorithm setting below for more info on merge joins.
  • A new compact format for MergeTree tables that store all columns in one file. It improves performance of small inserts. The usual format where every column is stored separately is now called “wide.” Compact format is disabled by default. See the min_bytes_for_wide_part and min_rows_for_wide_part settings.
  • A built-in Prometheus exporter endpoint for ClickHouse monitoring statistics
  • Porting some functionality of the H3 library — A Hexagonal Hierarchical Geospatial Indexing System
  • ALTER MODIFY/DROP are currently implemented as mutations for ReplicatedMergeTree* engine family. Now ALTER commands block only at the metadata update stage, and don’t block after that.
  • Import/export gzip-compressed files directly for file based storage engines and table functions: File, URL, HDFS and S3.
  • Server improvements across different server functions. For example, performance of the Kafka Engine has been improved, parallel INSERT is now possible (see max_insert_threads setting), etc.

Upgrade Notes

  • Replication fetch protocol has been changed in this release. If only one replica is upgraded in a shard, replication may get stuck with unknown protocol version error until all replicas are upgraded. 19.16.17.80 and 19.16.19.85 contain a special compatibility fix that allows smooth upgrade without replication downtime.
  • If enable_optimize_predicate_expression is turned on, it may result in incorrect data when only part of the cluster is upgraded. Please turn it off before the upgrade, and turn it back on afterwards.
  • There were some optimizations that are incompatible between versions, so we recommend disabling enable_scalar_subquery_optimization before the upgrade. This setting turns off new scalar subquery optimizations and was backported specifically in order to facilitate smooth upgrades.
  • There were some optimizations around distributed query execution that may result in incorrect data if part of the cluster is upgraded. We recommend disabling distributed_aggregation_memory_efficient while upgrading, and turn it on afterwards.
Backward Incompatible Changes

The following changes are backward incompatible and require user attention.

  • ALTER on ReplicatedMergeTree is not compatible with previous versions.20.3 creates a different metadata structure in ZooKeeper for ALTERs. Earlier versions do not understand the format and cannot proceed with their replication queue.
  • The format of replication log entries for mutation commands has changed. You have to wait for old mutations to process before installing the new version.
  • 20.3 requires an alias for every subquery participating in a join by default. Set joined_subquery_requires_alias=0 in order to keep the previous behavior.
  • ANY JOIN logic has been changed. To upgrade without changes in behavior, you need to add SETTINGS any_join_distinct_right_table_keys = 1 to Engine Join tables metadata or recreate these tables after upgrade.
  • Functions indexHint, findClusterValue, findClusterIndex were removed
  • Settings merge_tree_uniform_read_distribution, allow_experimental_cross_to_join_conversion, allow_experimental_multiple_joins_emulation are deprecated and ignored
Other Important Notes
  • If you encounter [Argument at index 0 for function dictGetDateTime must be a constant](https://github.com/ClickHouse/ClickHouse/issues/7798) or a similar error message after upgrading, set enable_early_constant_folding=0.
  • The new release adds a parallel parsing of input formats. While it should improve the performance of data loading in text formats, sometimes it may result in a slower load. Set input_format_parallel_parsing=0 if you experience insert performance degradation.
  • The check that a query is performed fast enough is enabled by default. Queries can get an exception like this: DB::Exception: Estimated query execution time (N seconds) is too long. Maximum: 12334.. Set timeout_before_checking_execution_speed=0 to fix this problem.
  • Mutations containing non-deterministic functions, e.g. dictGet or joinGet are disabled by default. Set allow_nondeterministic_mutations=1 to enable.

Also note that this release has a new query pipeline enabled by default (codename — Processors). This was a significant internal refactoring, and we believe all issues have been fixed. However, in the rare case that you face some weird query behavior or performance degradation, try disabling experimental_use_processors and check if the problem goes away.

Changes

New formats
  • Avro
  • AvroConfluent
  • JSONCompactEachRow
  • JSONCompactEachRowWithNamesAndTypes
New or Improved Functions
  • CRC32IEEE, CRC64
  • New JSON functions.
    • isValidJSON
    • JSONExtractArrayRaw — very useful for parsing nested JSON structures
  • New array functions.
    • arrayCompact
    • arrayFill
    • arrayReverseFill
    • arrayZip
  • New geo functions.
    • geoDistance
    • greatCircleDistance
  • H3 functions.
    • h3EdgeAngle
    • h3EdgeLengthM
    • h3GetBaseCell
    • h3GetResolution
    • h3HexAreaM2
    • h3IndexesAreNeighbors
    • h3IsValid
    • h3ToChildren
    • h3ToParent
    • h3ToString
    • h3kRing
    • stringToH3
  • Functions for DateTime64.
    • now64
    • parseDateTime64BestEffort, parseDateTime64BestEffortOrNull, parseDateTime64BestEffortOrZero
    • toDateTime64, toDateTime64OrNull, toDateTime64OrZero
  • New aggregate functions:
    • avgWeighted
    • categoricalInformationValue
    • groupArraySample
  • Other.
    • bitCount
    • ifNotFinite.
    • isConstant
    • javaHashUTF16LE
    • moduloOrZero
    • randomPrintableASCII
    • roundBankers

New table functions

  • clusterAllReplicas — similar to cluster function but queries all replicas in the cluster
  • generateRandom — allows users to generate random dataset with defined structure
  • zeros, zeros_mt

New metrics and events

system.metrics:

  • BackgroundMovePoolTask
  • CacheDictionaryUpdateQueueBatches
  • CacheDictionaryUpdateQueueKeys
  • MemoryTrackingInBackgroundMoveProcessingPool
  • MySQLConnection
  • SendScalars

New system tables

  • metric_log — stores system.metrics and system.events history
  • quotas, quota_usage — introspection to quotes
  • row_policies — introspection to row security policies
  • trace_log, stack_trace — for low-level debugging purpose
  • zeros, zeros_mt — zero generators, like system.numbers, but faster and returns zeros

New columns in system tables

  • mutations: parts_to_do_names
  • parts, parts_columns: part_type
  • processes: thread_ids replaces old columns thread_numbers and os_thread_ids
  • query_log: exception_code, thread_ids replaces old columns thread_numbers and os_thread_ids
  • replicas: zookeeper_exception
  • settings: min, max, readonly
  • table_engines: supports_settings, supports_skipping_indices, supports_sort_order, supports_ttl, supports_replication, supports_deduplication

system.merge_tree_settings added/changed

Settings Old value New value Description
disable_background_merges   0 Disable background merges
merge_max_block_size   8192 How many rows in blocks should be formed for merge operations
min_bytes_for_wide_part   0 Minimal uncompressed size in bytes to create part in wide format instead of compact
min_rows_for_wide_part   0 Minimal number of rows to create part in wide format instead of compact
use_minimalistic_part_header_in_zookeeper 0 1 Store part header (checksums and columns) in a compact format and a single part znode instead of separate znodes. This setting was available for a year already, and many users have it enabled.

system.settings added/changed

Setting Old value New value Description
allow_experimental_alter_materialized_view_structure   0 Allow atomic alter on Materialized views. Work in progress
allow_experimental_data_skipping_indices 0 1 Data skipping indices are now enabled by default
allow_nondeterministic_mutations   0 Allow non-deterministic functions in ALTER UPDATE/ALTER DELETE statements, e.g. with dictGet() functions
background_move_pool_size   8 Number of threads performing background moves for tables. Only has meaning at server startup
default_max_bytes_in_join 100000000 1000000000 Maximum size of right-side table if limit is required but max_bytes_in_join is not set
enable_early_constant_folding 0 1 Enable query optimization where we analyze function and subqueries results and rewrite query if there are constants there
enable_scalar_subquery_optimization 0 1 If it is set to true, prevent scalar subqueries from (de)serializing large scalar values and possibly avoid running the same subquery more than once
experimental_use_processors 0 1 Processors pipeline is now enabled by default
force_optimize_skip_unused_shards   1 Throw an exception if unused shards cannot be skipped (1 – throw only if the table has the sharding key, 2 – always throw)
format_avro_schema_registry_url     For AvroConfluent format: Confluent Schema Registry URL
input_format_parallel_parsing   1 Enable parallel parsing for some data formats
input_format_values_deduce_templates_of_expressions 0 1 For Values format: if the field could not be parsed by streaming parser, run SQL parser, deduce template of the SQL expression, try to parse all rows using template and then interpret expression for all rows
join_algorithm   hash Specify join algorithm: auto, hash, partial_merge, prefer_partial_merge. auto tries to change HashJoin to MergeJoin on the fly to avoid out of memory
joined_subquery_requires_alias 0 1 Force joined subqueries and table functions to have aliases for correct name qualification
max_insert_threads   0 The maximum number of threads to execute the INSERT SELECT query. Values 0 or 1 means that INSERT SELECT is not run in parallel. Higher values will lead to higher memory usage. Parallel INSERT SELECT has effect only if the SELECT part is run on parallel, see max_threads setting
max_joined_block_size_rows   65536 Maximum block size for JOIN result (if join algorithm supports it). 0 means unlimited
max_parser_depth   1000 Maximum parser depth
memory_profiler_step   0 Every number of bytes the memory profiler will collect the allocating stack trace. The minimal effective step is 4 MiB (less values will work as clamped to 4 MiB). Zero means disabled memory profiler
min_bytes_to_use_mmap_io   0 The minimum number of bytes for reading the data with the mmap option during SELECT queries execution. 0 – disabled
min_chunk_bytes_for_parallel_parsing   1048576 The minimum chunk size in bytes, which each thread will parse in parallel
mutations_sync   0 Wait for synchronous execution of ALTER TABLE UPDATE/DELETE queries (mutations). 0 – execute asynchronously. 1 – wait current server. 2 – wait all replicas if they exist
optimize_if_chain_to_miltiif   0 Replace if(cond1, then1, if(cond2, …)) chains to multiIf
optimize_trivial_count_query   1 Process trivial SELECT count() FROM table query from metadata
output_format_avro_codec     Compression codec used for output. Possible values: null, deflate, snappy
output_format_avro_sync_interval   16384 Sync interval in bytes
output_format_csv_crlf_end_of_line     If it is set true, end of line in CSV format will be \r\n instead of \n
output_format_tsv_crlf_end_of_line     If it is set true, end of line in TSV format will be \r\n instead of \n
timeout_before_checking_execution_speed 0 10 Check that the speed is not too low after the specified time has elapsed
use_compact_format_in_distributed_parts_names   0 Changes format of directories names for distributed table insert parts

References

RPM packages for Altinity Stable Release can be found in the Altinity Stable Repository. The ClickHouse repository can be used as well but check version numbers carefully.

Altinity Stable Release for ClickHouse 20.3.19.4

Released 2020-09-23

Description

The release includes several dozen bug fixes from the previous release.

Major New Features
  • Server startup time has been reduced by parsing metadata in parallel.
  • Improved performance of primary key analysis for LowCardinality columns. That was a performance regression introduced in 20.3.
  • Improved performance of queries with large tuples and tuples used in primary key.
Bug Fixes
  • Fixed a bug that prevented attaching Materialized Views to system tables.
  • Fixed incorrect behavior of if function with NULLs.
  • Fixed segfaults in rare cases.
  • Fixed a bug that prevented predicate pushdown for queries using WITH clause.
  • Fixed SIGSEGV in Kafka engine when broker is unavailable.
  • Fixed a bug leading to block structure mismatch error for queries with UNION and JOIN.
  • Fixed TTL processing logic to process all partitions in one run.
  • Fixed a bug of parsing of row policies from configuration files that could result in missing policies sometimes.
  • Fixed a bug with ALTER TABLE UPDATE could produce incorrect results when updating Nullable columns.
  • Fixed a bug with codecs not working properly for MergeTree compact parts.

References

Updated RPM packages for Altinity Stable Release can be found in the Altinity Stable Repository. The ClickHouse repository can be used for DEB packages but check version numbers carefully.

V19.16 Release

Point releases:

Altinity Stable Release for ClickHouse 19.16.10.44

Released 2020-01-20

Description

This release has major updates:: 2744 commits in 528 pull-requests from 113 contributors.

Major New Features

  • Tiered storage: multiple storage volumes for MergeTree tables. It’s possible to store fresh data on SSD and automatically move old data to HDD. We already discussed this functionality in our blog.

  • LIVE VIEW tables that we also described in our blog. The support in this release is not complete, but new improvements are coming quickly. Feel free to try it out.

  • WITH FILL modifier of ORDER BY for data gap filling. It allows to fill missing data to provide uniform reports. For example, you can fill missing dates for time series so that every day is shown even if some days have no data. This feature is not documented yet, so here is an example how missing dates can be filled:

    SELECT arrayJoin([today()-10, today()]) AS d ORDER BY d ASC WITH FILL

  • Table constraints that are checked at insert.

  • Sensitive data masking for query_log, server logs, process list with regexp-based rules.

  • Table function input() for reading incoming data in INSERT SELECT query. A very useful feature when you need to preprocess data just before inserting. We are going to showcase it in our planned articles around log processing use cases.

  • Cascaded materialized views. It is an important feature for many use cases including Kafka integration. In particular, it allows you to have load raw data from Kafka using MV and then aggregate it using another MV. That was not possible in previous ClickHouse releases.

  • Kafka: Altinity took over support for the Kafka engine a few months ago. Kafka functionality and stability has been improved in this release, in particular:

    • ClickHouse can act as Kafka producer, and not just to read from Kafka, but also send data back with an insert statement.
    • Atomic parsing of each message: kafka_row_delimiter is now obsolete (ignored)
    • More reliable commit logic
    • Virtual columns _partition and _timestamp for Kafka engine table.
    • Parsing of most of the formats is working properly now
Known issues
  • In case of topic rebalancing on the Kafka side, duplicates are possible. We are going to address it in 2020/Q1. https://github.com/ClickHouse/ClickHouse/issues/7259.
  • Select from mysql() table function is broken in this release. Two workarounds are possible:
    • Use MySQL table engine.
    • create table as mysql(...) and then select from this table — this is actually a shortcut to MySQL table engine.
  • Some constant-like functions can misbehave when queried through Distributed tables: https://github.com/ClickHouse/ClickHouse/issues/7666.
  • ‘system reload dictionary’ does not reload dictionaries with ‘update_field’: https://github.com/ClickHouse/ClickHouse/issues/7440.
  • complicated where conditions involving fields of UInt8 type with values > 1 can return unexpected results. (workaround: instead of a and b and c use a <> 0 and b <> 0 and c <> 0) https://github.com/ClickHouse/ClickHouse/issues/7772.
  • If you want to preserve old ANY JOIN behavior while upgrading from a version before 19.13, you may need to install 19.13.7 first, change any_join_distinct_right_table_keys setting there and after that you can upgrade to 19.16. But we recommend to review your queries and rewrite them without this join type. In future releases it will be available, but with a different name (SEMI JOIN).

Upgrade Notes

Backward Incompatible Changes
  • count() supports only a single argument.
  • Legacy asterisk_left_columns_only setting has been removed (it was disabled by default).
  • Numeric values for Enums can now be used directly in IN section of the query.
  • Changed serialization format of bitmap* aggregate function states to improve performance. Serialized states of bitmap* from previous versions cannot be read. If you happen to use bitmap aggregate functions, please contact us before upgrading.
  • system.query_log column type was changed from UInt8 to Enum8.
  • ANY RIGHT/FULL/INNER JOIN is disabled by default. Set any_join_distinct_right_table_keys setting to enable them.

Changes

New Formats
  • ORC format.
  • Template/TemplateIgnoreSpaces format. It allows to parse / generate data in custom text formats. So you can for example generate HTML directly from ClickHouse thus turning ClickHouse to the web server.
  • CustomSeparated/CustomSeparatedIgnoreSpaces format. Supports custom escaping and delimiter rules.
  • JSONEachRowWithProgress
  • Parse unquoted NULL literal as NULL (enabled by setting format_csv_unquoted_null_literal_as_null).
  • Initialize null fields with default values if the data type of this field is not nullable (enabled by setting input_format_null_as_default).
New or improved functions
  • Aggregate function combinators which fill null or default value when there is nothing to aggregate: -OrDefault, -OrNull

  • Introduce uniqCombined64() to get sane results for cardinality > UINT_MAX

  • QuantileExactExclusive and Inclusive aggregate functions

  • hasToken/hasTokenCaseInsensitive (look for the token in string the same way as token_bf secondary index)

  • multiFuzzyMatchAllIndices, multiMatchAllIndices (return the Array of all matched indices in multiMatch family functions)

  • repeat function for strings

  • sigmoid and tanh functions for ML applications

  • Roaring Bitmaps:

    • Changes CRoaring serialization functions (you will not be able to read Bitmaps created by earlier versions)
    • bitmapSubsetInRange,
    • bitmapMin, bitmapMax,
    • bitmapSubsetLimit(bitmap, range_start, limit),
    • groupBitmapAnd, groupBitmapOr, groupBitmapXor
  • geohashesInBox(longitude_min, latitude_min, longitude_max, latitude_max, precision) which creates an array of precision-long strings of geohash boxes covering the provided area.

  • Support for wildcards in paths of table functions file and hdfs. If the path contains wildcards, the table will be readonly:

    SELECT * FROM hdfs('hdfs://hdfs1:9000/some_dir/another_dir/*/file', 'Parquet', 'col1 String')

  • New function neighbour(value, offset[, default_value]). Allows to reach prev/next row within the column.

  • Optimize queries with ORDER BY expressions clause, where expressions have coinciding prefix with sorting key in MergeTree tables. This optimization is controlled by optimize_read_in_order ‘setting

  • New function arraySplit and arrayReverseSplit which can split an array by “cut off” conditions. They are useful in time sequence handling.

  • Table function values (the name is case-insensitive). It allows to create table with some data inline.

    SELECT * FROM VALUES('a UInt64, s String', (1, 'one'), (2, 'two'), (3, 'three'))

  • fullHostName (alias FQDN)

  • numbers_mt() — multithreaded version of numbers().

  • currentUser() (and alias user()), returning login of authorized user.

  • S3 engine and table function. Partial support in this release (no authentication), complete version is expected in 19.18.x and later

  • WITH TIES modifier for LIMIT

New dictionaries features
  • Redis as an external dictionary source.
  • <sparse_hashed> dictionary layout, that is functionally equivalent to the <hashed> layout, but is more memory efficient. It uses about twice as less memory at the cost of slower value retrieval.
  • allow_dictionaries user setting that works similar to allow_databases.
  • HTTP source new attributes: credentials and http-headers.
Operations / Monitoring
  • system.metric_log table which stores values of system.events and system.metrics with specified time interval.

  • system.text_log in order to store ClickHouse logs to itself

  • Support for detached parts removal:

    ALTER TABLE &lt;table_name&gt; DROP DETACHED PART '&lt;part_id&gt;'

  • MergeTree now has an additional option ttl_only_drop_parts (disabled by default) to avoid partial pruning of parts, so that they dropped completely when all the rows in a part are expired.

  • Added miscellaneous function getMacro(name) that returns String with the value of corresponding <macros> from configuration file on current server where the function is executed.

New metrics & events
  • system.metrics: DistributedFilesToInsert, GlobalThread, GlobalThreadActive, LocalThread, LocalThreadActive
  • system.events: Merge
New system tables
  • system.disks
  • system.storage_policies
New columns in system tables
  • system.clusters: errors_count, estimated_recovery_time
  • system.collations: language
  • system.parts: disk_name
  • system.parts_columns: disk_name
  • system.processes: os_thread_ids
  • system.query_log: os_thread_ids
  • system.tables: storage_policy
system.merge_tree_settings changed/added
Name Default Description
concurrent_part_removal_threshold 100 Activate concurrent part removal (see ‘max_part_removal_threads’) only if the number of inactive data parts is at least this.
max_part_loading_threads auto(6) The number of threads to load data parts at startup.
max_part_removal_threads auto(6) The number of threads for concurrent removal of inactive data parts. One is usually enough, but in ‘Google Compute Environment SSD Persistent Disks’ file removal (unlink) operation is extraordinarily slow and you probably have to increase this number (recommended is up to 16).
storage_policy default Name of storage disk policy
ttl_only_drop_parts 0 Only drop altogether the expired parts and not partially prune them.
system.settings changed/added
Name Default Description
allow_drop_detached 0 Allow ALTER TABLE … DROP DETACHED PARTITION … queries
allow_experimental_live_view 0 Enable LIVE VIEW. Not mature enough.
allow_introspection_functions 0 Allow functions for introspection of ELF and DWARF for query profiling. These functions are slow and may impose security considerations.
any_join_distinct_right_table_keys 0 (was 1) Enable old ANY JOIN logic with many-to-one left-to-right table keys mapping for all ANY JOINs. It leads to confusing not equal results for ’t1 ANY LEFT JOIN t2’ and ’t2 ANY RIGHT JOIN t1’. ANY RIGHT JOIN needs one-to-many keys mapping to be consistent with LEFT one.
asterisk_left_columns_only removed
connection_pool_max_wait_ms 0 The wait time when connection pool is full.
default_max_bytes_in_join 100000000 Maximum size of right-side table if limit’s required but max_bytes_in_join is not set.
distributed_directory_monitor_max_sleep_time_ms 30000 Maximum sleep time for StorageDistributed DirectoryMonitors, it limits exponential growth too.
distributed_replica_error_cap 1000 Max number of errors per replica, prevents piling up incredible amount of errors if replica was offline for some time and allows it to be reconsidered in a shorter amount of time.
distributed_replica_error_half_life 60 Time period reduces replica error counter by 2 times.
format_custom_escaping_rule Escaped Field escaping rule (for CustomSeparated format)
format_custom_field_delimiter Delimiter between fields (for CustomSeparated format)
format_custom_result_after_delimiter Suffix after result set (for CustomSeparated format)
format_custom_result_before_delimiter Prefix before result set (for CustomSeparated format)
format_custom_row_after_delimiter Delimiter after field of the last column (for CustomSeparated format)
format_template_resultset Path to file which contains format string for result set (for Template format)
format_template_row Path to file which contains format string for rows (for Template format)
format_template_rows_between_delimiter Delimiter between rows (for Template format)
input_format_csv_unquoted_null_literal_as_null 0 Consider unquoted NULL literal as N
input_format_null_as_default 0 For text input formats initialize null fields with default values if data type of this field is not nullable
input_format_tsv_empty_as_default 0 Treat empty fields in TSV input as default values.
input_format_values_accurate_types_of_literals 1 For Values format: when parsing and interpreting expressions using template, check actual type of literal to avoid possible overflow and precision issues.
input_format_values_deduce_templates_of_expressions 0 For Values format: if field could not be parsed by streaming parser, run SQL parser, deduce template of the SQL expression, try to parse all rows using template and then interpret expression for all rows.
joined_subquery_requires_alias 0 Force joined subqueries to have aliases for correct name qualification.
kafka_max_wait_ms 5000 The wait time for reading from Kafka before retry.
live_view_heartbeat_interval 15 The heartbeat interval in seconds to indicate live query is alive.
max_http_get_redirects 0 Max number of http GET redirects hops allowed. Make sure additional security measures are in place to prevent a malicious server to redirect your requests to unexpected services.
max_live_view_insert_blocks_before_refresh 64 Limit maximum number of inserted blocks after which mergeable blocks are dropped and query is re-executed.
min_free_disk_space_for_temporary_data 0 The minimum disk space to keep while writing temporary data used in external sorting and aggregation.
optimize_read_in_order 1 Enable ORDER BY optimization for reading data in corresponding order in MergeTree tables.
partial_merge_join 0 Use partial merge join instead of hash join for LEFT and INNER JOINs.
partial_merge_join_optimizations 0 Enable optimizations in partial merge join
partial_merge_join_rows_in_left_blocks 10000 Group left-hand joining data in bigger blocks. Setting it to a bigger value increase JOIN performance and memory usage.
partial_merge_join_rows_in_right_blocks 10000 Split right-hand joining data in blocks of specified size. It’s a portion of data indexed by min-max values and possibly unloaded on disk.
query_profiler_cpu_time_period_ns 1000000000 (was 0) Highly experimental. Period for CPU clock timer of query profiler (in nanoseconds). Set 0 value to turn off CPU clock query profiler. Recommended value is at least 10000000 (100 times a second) for single queries or 1000000000 (once a second) for cluster-wide profiling.
query_profiler_real_time_period_ns 1000000000 (was 0) Highly experimental. Period for real clock timer of query profiler (in nanoseconds). Set 0 value to turn off real clock query profiler. Recommended value is at least 10000000 (100 times a second) for single queries or 1000000000 (once a second) for cluster-wide profiling.
replace_running_query_max_wait_ms 5000 The wait time for running query with the same query_id to finish when setting ‘replace_running_query’ is active.
s3_min_upload_part_size 536870912 The minimum size of part to upload during multipart upload to S3.
temporary_live_view_timeout 5 Timeout after which temporary live view is deleted.

Altinity Stable Release for ClickHouse 19.16.19.85

Released 2020-04-20

Description

This is a combined release of 19.16.19.65, upgraded to 19.16.19.85 after review.

Major New Features

enable_scalar_subquery_optimization setting is disabled by default. It is required in order to perform rolling upgrades to 20.x versions.

Bug Fixes
  • Fixed a bug with non-Date/DateTime columns not being allowed in TTL expressions
  • Fixed a bug in the Kafka engine that sometimes prevented committing a message back to the broker if the broker was temporarily unavailable
  • Fixed a bug in which the boolean functions ‘or’ and ‘and’ might return incorrect results when called on more than 10 arguments if some of them are NULL
  • Fixed server side certificates in docker image
  • Fixed a possible race condition between insert quorum and drop partition
  • Volume selection in merge process could move data back to the first volume in storage policy in some cases
  • Mutations that did not work if insert_quorum has been used for the table
  • Deduplication logic that ignored dependent materialized views if a duplicate for the main table has been detected. The fix is turned off by default and can be enabled with a new deduplicate_blocks_in_dependent_materialized_views setting.
  • Kafka engine could result in data losses when ZooKeeper is temporarily not available
  • Kafka engine did not allow users to drop a table with incorrect Kafka topic
  • Kafka engine did not allow users to use subqueries in attached materialized views
  • ALTER TABLE MODIFY CODEC default expression and a codec are specified

Altinity Stable Release for ClickHouse 19.16.12.49

Released 2020-02-07

Description

Improved Kafka engine reliability, performance and usability.

Bug Fixes
  • When there is a rebalance of partitions between consumers, occasional data duplicates and losses were possible.
  • When data was polled from several partitions with one poll and committed partially, occasional data losses were possible.
  • Block size threshold (‘kafka_max_block_size’ setting) now triggers block flush correctly that reduces the chance of data loss under high load.
  • When new columns are added to the Kafka pipeline (Kafka engine -> Materialized View -> Table) it was previously not possible to alter the destination tables first with default value and then alter MV in order to minimize downtime. The problem was not related directly to Kafka, but general implementation of materialized views. It is fixed now.
  • Few other minor problems have been addressed as well.

Since ClickHouse now respects the ‘kafka_max_block_size’ setting that defaults to 65535, we recommend increasing it to the bigger values for high volume streaming. Setting it to 524288 or 1048576 may increase consumer throughput up to 20%.

Known Issues

The ‘kafka_max_block_size’ defaults to 65535. We recommend increasing it to the bigger values for high volume streaming. Setting it to 524288 or 1048576 may increase consumer throughput up to 20%.

V19.13 Release

Altinity Stable Release for ClickHouse 19.13.7.57

Released 2019-11-28

Description

This is a minor upgrade over the previous release.

Major New Features
  • CREATE TABLE AS TABLE FUNCTION statement that allows to create tables from external data automatically, e.g. from mysql table.
  • COLUMN('regexp') macro function as a generalization of SELECT * queries.
Bug Fixes
  • Fixed security vulnerability in url() function.
  • Fixed security vulnerability related to Zookeeper.
  • Fixed vulnerabilities in compression codecs.
  • Fixed corner case overflows in DoubleDelta codec.
Known Issues
  • ANY INNER|RIGHT|FULL JOIN is deprecated! While we like how it worked before, there are plans to change the behavior completely. We are still discussing with the development team how we can preserve the old behavior. For now, in order to preserve the old behavior and safely upgrade to newer versions in the future one needs to enable any_join_distinct_right_table_keys setting!
  • The setting input_format_defaults_for_omitted_fields is on by default. It enables calculation of complex default expressions for omitted fields in JSONEachRow and CSV* formats. Inserts to Distributed tables need this setting to be the same across the cluster.
    • You need to set input_format_defaults_for_omitted_fields across the cluster before the upgrade if rolling upgrade is performed.

References

V19.11 Release

Altinity Stable Release for ClickHouse 19.11.8

Released 2019-09-03

Description

Updates including improved Kafka support.

Major New Features
  • Multi-table JOINs with standard SQL syntax (some limitations are left: only equi-JOIN, limited condition pushdown support)
  • Secondary indices - allow skipping data blocks without reading data: min/max filter, n-grams or token-based bloom filter
  • TTL expressions for columns and tables to remove old data automatically
  • Unique ASOF JOIN - join to the most recent value known
  • Per-column custom compression codecs (Delta, T64, DoubleDelta, Gorilla): fine-tune compression for particular columns
  • Roaring Bitmaps - use bitmaps for different on-set operations (calculate cardinality, unions, intersects, etc)
  • Machine-learning functions: evalMLMethod, simpleLinearRegression, stochasticLinearRegression, stochasticLogisticRegression
  • Predefined per-row filter expressions for tables, which enable simple row-level security rules
  • Adaptive index granularity (setting index_granularity_bytes) - useful for tables having rows with relatively big size (>1280 bytes)
  • Advanced text processing functions
    • Multiple substring search: multiSearch(Any|FirstPosition|FirstIndex|AllPositions)(CaseInsensitive)?(UTF8)?
    • Multiple regex search (hyperscan-powered): multiMatch(Any|AnyIndex)
    • Fuzzy regex match (hyperscan-powered): multiFuzzyMatch(Any|AnyIndex)
    • N-gram distance for fuzzy string comparison and search (similar to q-gram metrics in R language): ngramDistance(CaseInsensitive)?(UTF8)?, ngramSearch(CaseInsensitive)?(UTF8)?
  • HDFS read/write access
  • New JSON processing functions - high performance & compliant with JSON standard
  • IPv4 and IPv6 data types
  • New formats
    • Protobuf — now fully supported with input and output plus nested data structures
    • Parquet
    • RowBinaryWithNamesAndTypes
    • JSONEachRow and TSKV - now support default expressions for missing fields (Check input_format_defaults_for_omitted_fields)
    • TSVWithNames/CSVWithNames - column order can now be determined from file header (Check input_format_with_names_use_header parameter).
  • SQL statements with bind parameters
  • Improved MySQL integration:
    • new database engine to access all the tables in remote MySQL server
    • support for MySQL wire protocol, allowing to connect to ClickHouse using MySQL clients.
Known issues
  • All released 19.x versions have had some problems with Kafka engine implementation due to a full re-write of Kafka support. In 19.11.8 Kafka is working much better than previous 19.x releases. However, there are still some corner cases that can lead to data duplication in certain scenarios, for example, in the event of ClickHouse server restart. Those issues will be addressed soon.

  • Adaptive granularity is enabled by default, and index_granularity_bytes is set to 10Mb. This feature uses a different data format, and interoperability between old and new format has some issues. So if you’re upgrading your cluster from an older version, consider disabling it before the upgrade by putting the following fragment in your config.xml:

    <merge_tree>
      <index_granularity_bytes>0</index_granularity_bytes>
    </merge_tree></code></pre>
    

    After upgrade, you can choose any convenient time to turn adaptive granularity on. In general, it’s a cool feature and especially useful when you have rows of size > 1Kb in your tables. If you are doing a new installation, please leave the default setting value as is. The adaptive granularity feature is very nice and useful.

  • enable_optimize_predicate_expression is now enabled by default. It`s possible you may have some issues when a condition passed to subselect leads to some suboptimal / undesired effects. If this happens please report the issue and disable the feature for that select (or globally).

  • Secondary indices are maturing but still considered as experimental. There is at least one severe bug: when a mutation is executed with a condition on the column with secondary index - it can affect more rows than expected. Please be careful with DELETE, or upgrade to 19.13 (see issue #6224).

  • Some users have reported problems with ODBC data sources after upgrade. In most cases these were misconfigurations. Nevertheless, please do canary/staging updates and check how your ODBC connections work before moving to production.

Upgrade Notes

Backward compatibility issues
  • Due to update of LZ4 library the new ClickHouse version writes parts which are not binary equivalent to those written with older versions. That makes it problematic to update only one replica. Leaving the cluster in such a state for a long period of time time will work but may lead to excessive parts copying between nodes due to checksum mismatches.
  • There is a new setting max_partitions_per_insert_block with default value 100. If the inserted block contains a larger number of partitions, an exception is thrown. Set it to 0 if you want to remove the limit (not recommended).
  • If you are using unexpected low cardinality combinations like LowCardinality(UInt8), the new version will prevent you from doing so. if you really know what you are doing check allow_suspicious_low_cardinality_types and set it to 1.
  • This release adds max_parts_in_total setting for MergeTree family of tables (default: 100 000). We hope your number of partitions is much lower than this limit. If necessary you can raise the value.
  • The system.dictionaries table has been changed. If you used it for monitoring purposes, you may need to change your scripts.
  • Dictionaries are loaded lazily by default. It means they have status NOT_LOADED until the first access.

Changes

New functions
  • New hash functions: xxHash32, xxHash64, gccMurmurHash, hiveHash, javaHash, CRC32
  • JSON processing functions: JSONExtract,JSONExtract(Bool|Float|Int|KeysAndValues|Raw|String|UInt),JSONHas,JSONKey,JSONLength,JSONType
  • Geospatial processing: geoToH3, geohashEncode, geohashDecode
  • IP address handling: IPv4CIDRtoIPv4Range, IPv6CIDRtoIPv6Range, toIPv4, toIPv6
  • New statistical aggregate functions: skewPop, skewSamp, kurtPop, kurtSamp
  • String functions: isValidUTF8, regexpQuoteMeta, trimBoth, trimLeft, trimRight, format, toValidUTF8
  • Encoding: tryBase64Decode, base64Decode, base64Encode
  • Array processing: arrayEnumerateDenseRanked, arrayEnumerateUniqRanked, flatten, arrayFlatten, arrayWithConstant,
  • Date/Time processing: toStartOfInterval, addQuarters, subtractQuarters, toIntervalQuarter, toStartOfTenMinutes
  • Numerical: roundDown, toDecimalOrZero, toDecimalOrNull
  • Dictionary: dictGet, dictGetOrDefault
  • Roaring Bitmaps: bitmapAndCardinality, bitmapAndnot, bitmapAndnotCardinality, bitmapBuild, bitmapCardinality, bitmapContains, bitmapHasAll, bitmapHasAny, bitmapOr, bitmapOrCardinality, bitmapToArray, bitmapXor, bitmapXorCardinality, groupBitmap
  • OS level introspection: filesystemAvailable, filesystemCapacity, filesystemFree, basename
  • New aggregate functions: boundingRatio, entropy, groupBitmap, sumMap(Filtered|FilteredWithOverflow|WithOverflow), topKWeighted, groupArrayMovingAvg,groupArrayMovingSum, timeSeriesGroupRateSum, timeSeriesGroupSum
  • SQL standard compatibility aliases added: left, right, trim, ltrim, rtrim, timestampadd, timestampsub, power, replace, ln, locate, mid
Other notable changes
  • Setting constraints which limit the possible range of setting value per user profile.
  • KILL MUTATION added
  • New aggregate function combinators: -Resample
  • Added new data type SimpleAggregateFunction - light aggregation for simple functions like any, anyLast, sum, min, max
  • Ability to use different sorting key (ORDER BY) and index (PRIMARY KEY). The sorting key can be longer than the index. You can alter ORDER BY at the moment of adding / removing the column
  • HTTP interface: brotli compression support, X-ClickHouse-Query-Id and X-ClickHouse-Summary headers in response, ability to cancel query on disconnect (check cancel_http_readonly_queries_on_client_close)
  • DFA-based implementation for functions sequenceMatch and sequenceCount in case the pattern doesn’t contain time
  • Back up all partitions at once with ALTER TABLE … FREEZE
  • Comments for a column in the table description
  • Join engine: new options join_use_nulls, max_rows_in_join, max_bytes_in_join, join_any_take_last_row and join_overflow_mode + joinGet function that allows you to use a Join type table like a dictionary.
  • /docker-entrypoint-initdb.d for database initialization in docker
  • Graphite rollup rules reworked.
  • Query settings in asynchronous INSERTs into Distributed tables are respected now
  • Hints when while user make a typo in function name or type in command line client
  • system.detached_parts table containing information about detached parts of MergeTree tables
  • Table function remoteSecure
  • Ability to write zookeeper part data in more compact form (use_minimalistic_part_header_in_zookeeper)
  • Ability to close MySQL connections after their usage in external dictionaries
  • Support RENAME operation for Materialized View
  • Non-blocking loading of external dictionaries

Kafka now supports SASL SCRAM authentication, new virtual columns _topic, _offset, _key are available, and a lot of other improvements.

Settings changed/added
Name Default Description
allow_experimental_cross_to_join_conversion 1 Convert CROSS JOIN to INNER JOIN if possible
allow_experimental_data_skipping_indices 0 If it is set to true, data skipping indices can be used in CREATE TABLE/ALTER TABLE queries.
allow_experimental_low_cardinality_type 1 (was 0) Obsolete setting, does nothing. Will be removed after 2019-08-13
allow_experimental_multiple_joins_emulation 1 Emulate multiple joins using subselects
allow_hyperscan 1 Allow functions that use Hyperscan library. Disable to avoid potentially long compilation times and excessive resource usage.
allow_simdjson 1 Allow using simdjson library in ‘JSON*’ functions if AVX2 instructions are available. If disabled rapidjson will be used.
allow_suspicious_low_cardinality_types 0 In CREATE TABLE statement allows specifying LowCardinality modifier for types of small fixed size (8 or less). Enabling this may increase merge times and memory consumption.
cancel_http_readonly_queries_on_client_close 0 Cancel HTTP readonly queries when a client closes the connection without waiting for response.
check_query_single_value_result 1 Return check query result as single 1/0 value
enable_conditional_computation (was 0) DELETED
enable_optimize_predicate_expression 1 (was 0) If it is set to true, optimize predicates to subqueries.
enable_unaligned_array_join 0 Allow ARRAY JOIN with multiple arrays that have different sizes. When this setting is enabled, arrays will be resized to the longest one.
external_table_functions_use_nulls 1 If it is set to true, external table functions will implicitly use Nullable type if needed. Otherwise NULLs will be substituted with default values. Currently supported only for ‘mysql’ table function.
idle_connection_timeout 3600 Close idle TCP connections after specified number of seconds.
input_format_defaults_for_omitted_fields 0 For input data calculate default expressions for omitted fields (it works for JSONEachRow format).
input_format_with_names_use_header 0 For TSVWithNames and CSVWithNames input formats this controls whether format parser is to assume that column data appear in the input exactly as they are specified in the header.
join_any_take_last_row 0 When disabled (default) ANY JOIN will take the first found row for a key. When enabled, it will take the last row seen if there are multiple rows for the same key. Allows you to overwrite old values in table with Engine=Join.
join_default_strictness ALL Set default strictness in JOIN query. Possible values: empty string, ‘ANY’, ‘ALL’. If empty, query without strictness will throw exception.
low_cardinality_allow_in_native_format 1 Use LowCardinality type in Native format. Otherwise, convert LowCardinality columns to ordinary for select query, and convert ordinary columns to required LowCardinality for insert query.
max_alter_threads auto The maximum number of threads to execute the ALTER requests. By default, it is determined automatically.
max_execution_speed 0 Maximum number of execution rows per second.
max_execution_speed_bytes 0 Maximum number of execution bytes per second.
max_partitions_per_insert_block 100 Limit maximum number of partitions in single INSERTed block. Zero means unlimited. Throw exception if the block contains too many partitions. This setting is a safety threshold, because using large number of partitions is a common misconception.
max_streams_multiplier_for_merge_tables 5 Request more streams when reading from Merge table. Streams will be spread across tables that Merge table will use. This allows more even distribution of work across threads and is especially helpful when merged tables differ in size.
max_threads auto The maximum number of threads to execute the request. By default, it is determined automatically.
merge_tree_max_bytes_to_use_cache 2013265920 The maximum number of rows per request, to use the cache of uncompressed data. If the request is large, the cache is not used. (For large queries not to flush out the cache.)
merge_tree_min_bytes_for_concurrent_read 251658240 If at least as many bytes are read from one file, the reading can be parallelized.
merge_tree_min_bytes_for_seek 0 You can skip reading more than that number of bytes at the price of one seek per file.
min_count_to_compile_expression 3 The number of identical expressions before they are JIT-compiled
min_execution_speed_bytes 0 Minimum number of execution bytes per second.
network_compression_method LZ4 Allows you to select the method of data compression when writing.
optimize_skip_unused_shards 0 Assumes that data is distributed by sharding_key. Optimization to skip unused shards if SELECT query filters by sharding_key.
os_thread_priority 0 If non zero - set corresponding ‘nice’ value for query processing threads. Can be used to adjust query priority for OS scheduler.
output_format_parquet_row_group_size 1000000 Row group size in rows.
stream_poll_timeout_ms 500 Timeout for polling data from streaming storages.
tcp_keep_alive_timeout 0 The time (in seconds) the connection needs to remain idle before TCP starts sending keepalive probes
MergeTree settings
Name Default Description
enable_mixed_granularity_parts 0 Enable parts with adaptive and non adaptive granularity
index_granularity_bytes 10485760 Approximate amount of bytes in single granule (0 - disabled).
write_final_mark 1 Write final mark after end of column (0 - disabled, do nothing if index_granularity_bytes=0)
max_parts_in_total 100000 If more than this number active parts in all partitions in total, throw ‘Too many parts …’ exception.
merge_with_ttl_timeout 86400 Minimal time in seconds, when merge with TTL can be repeated.
min_merge_bytes_to_use_direct_io 10737418240 (was 0) Minimal amount of bytes to enable O_DIRECT in merge (0 - disabled).
replicated_max_parallel_fetches_for_host 15 Limit parallel fetches from endpoint (actually pool size).
use_minimalistic_part_header_in_zookeeper 0 Store part header (checksums and columns) in a compact format and a single part znode instead of separate znodes (/columns and /checksums). This can dramatically reduce snapshot size in ZooKeeper. Before enabling check that all replicas support new format.

V18.14 Release

Altinity Stable Release for ClickHouse 18.14.19

Released 2018-12-31

Description

Altinity Stable Release 18.14.19 is a minor bug-fixing release to the previous 18.14.15. ClickHouse memory consumption in general is decreased with this release.

Bug fixes

  • Fixed a bug with range_hashed dictionaries returning wrong results.
  • Fixed an error that caused messages “netlink: ‘…’: attribute type 1 has an invalid length” to be printed in Linux kernel log in some recent versions of Linux kernel.
  • Fixed segfault in function ’empty’ for FixedString arguments.
  • Fixed excessive memory allocation when using a large value of max_query_size.
  • Fixed cases when the ODBC bridge process did not terminate with the main server process.
  • Fixed synchronous insertion into the Distributed table with a columns list that differs from the column list of the remote table.
  • Fixed a rare race condition that could lead to a crash when dropping a MergeTree table.
  • Fixed a query deadlock in a case when query thread creation fails with the ‘Resource temporarily unavailable’ error.
  • Fixed parsing of the ENGINE clause when the CREATE AS table syntax was used and the ENGINE clause was specified before the AS table (the error resulted in ignoring the specified engine).
  • Fixed a segfault if the ‘max_temporary_non_const_columns’ limit was exceeded
  • Fixed a bug with databases being not correctly specified when executing DDL ON CLUSTER queries

References