Altinity Stable for ClickHouse 23.3.8 Release Notes
It has been a while since we certified ClickHouse 22.8 as an Altinity Stable release and delivered the corresponding Altinity Stable build for ClickHouse. Since then many things have happened to ClickHouse. More than 300 ClickHouse contributors from companies all around the world submitted 3000 pull requests with new features and improvements. Unfortunately, some changes affected the stability of new releases. That is why it took us four long months to gain enough confidence to recommend 23.3 for production use and to make sure upgrades go smoothly. As of 23.3.8 we are confident in certifying 23.3 as an Altinity Stable release.
There are a lot of new features, but also many changes in behavior, so check these release notes carefully before upgrading.
Major new features in 23.3 since the previous stable release 22.8
The new release introduces a lot of changes and new functions. It is hard to pick the most essential ones, so refer to the full list in the Appendix. The following major features are worth mentioning on the front page:
- SQL features:
  - `grace_hash` JOIN algorithm
  - Manipulating named collections via DDL commands: `SHOW/CREATE/ALTER/DROP NAMED COLLECTION` (see the sketch at the end of this section)
  - Parameterized views: `CREATE VIEW test AS SELECT ... WHERE user_id = {user:UInt64}` and then `SELECT * FROM test(user = 123)`
  - Lightweight DELETEs are not marked as experimental anymore. There are still performance limitations, so do not consider them a drop-in replacement for SQL DELETE statements
  - Regexp operator: `SELECT ... WHERE col1 REGEXP 'a.*b'`
  - Query cache
- MergeTree improvements:
  - ReplacingMergeTree with a `deleted` flag
  - `final` setting to implicitly apply the `FINAL` modifier to every table. See the blog article describing these new features of ReplacingMergeTree (a minimal sketch of both follows this list). a)
  - Control of scheduling merges using the `min_age_to_force_merge_seconds` and `min_age_to_force_merge_on_partition_only` settings
  - Compressed marks and primary key on disk (disabled by default in 23.3)
  - Compressed marks in memory
- Replication improvements:
  - `insert_quorum = 'auto'` to use the majority number of replicas
  - Automatic retries of inserts into replicated tables if (Zoo)Keeper is temporarily not available. The number of retries is controlled by the `insert_keeper_max_retries` setting
  - Support for replication of user-defined SQL functions via (Zoo)Keeper
  - General performance and stability improvements when using Keeper
- Security & access control:
  - Certificate-based user authentication on the native protocol
  - Password complexity rules and checks
  - Automatically mask sensitive information like credentials in logs (without query masking rules)
  - Exposed applied row-level policies to system.query_log a)
- Storage management:
  - Allow assigning disks on the table level instead of a storage policy: `SETTINGS disk = '<disk_name>'` (instead of `storage_policy`)
  - Explicit disk creation: `SETTINGS disk = disk(type=s3, ...)`
  - Allow nested custom disks
  - Added S3 as a new type of destination for embedded backups. Supports BACKUP to S3 with an as-is path/data structure.
  - Server-side copies for embedded S3 backups
- Object storage:
  - `**` glob (recursive directory traversal) for the `s3` table function
  - Possibility to control the S3 storage class (STANDARD/INTELLIGENT_TIERING)
  - `s3_plain` disk type for write-once-read-many operations
  - `StorageIceberg`, `Hudi` and `DeltaLake` (with corresponding table functions `iceberg`, `hudi` and `deltaLake`) to access data stored on S3
  - `OSS` / `oss` (Alibaba Cloud Object Storage Service)
  - A lot of improvements to Parquet performance and compatibility a), with even more coming in 23.4 and later releases
- Operational:
  - Ability to reset settings: `SET max_block_size = DEFAULT`
  - Partial result on query cancellation: `partial_result_on_first_cancel=1`
  - Allow recording errors to a specified file while reading text formats (CSV, TSV): `input_format_record_errors_file_path=parse_errors`
  - Allow ignoring errors while pushing to a MATERIALIZED VIEW: `materialized_views_ignore_errors=1`
  - Composable protocol configuration: allows adjusting different listen hosts and wrapping any protocol into PROXYv1
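The ReplacingMergeTree changes above deserve a short illustration. Below is a minimal sketch assuming the syntax described in the upstream changelog; the table, column and value names are ours, and the blog article remains the authoritative reference:

```sql
-- ReplacingMergeTree with a version column and a "deleted" flag column (UInt8)
CREATE TABLE users
(
    user_id    UInt64,
    name       String,
    version    UInt32,
    is_deleted UInt8
)
ENGINE = ReplacingMergeTree(version, is_deleted)
ORDER BY user_id;

INSERT INTO users VALUES (1, 'alice', 1, 0);
-- "delete" the row by inserting a newer version with the flag set
INSERT INTO users VALUES (1, 'alice', 2, 1);

-- the new query-level `final` setting applies FINAL to every table in the query;
-- the row above is filtered out because its newest version has is_deleted = 1
SELECT * FROM users SETTINGS final = 1;
```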
As usual with ClickHouse, there are many performance and operational improvements in different server components.
a) - contributed by Altinity developers.
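Named collections can now be managed entirely via DDL instead of server configuration files. Here is a hedged sketch; the collection name and keys are ours, and the grants required for non-superusers are covered in the Backward Incompatible Changes section below:

```sql
-- create a named collection holding, e.g., S3 connection parameters
CREATE NAMED COLLECTION my_s3 AS
    url = 'https://bucket.s3.amazonaws.com/data/',
    access_key_id = '<key>',
    secret_access_key = '<secret>';

SHOW NAMED COLLECTIONS;

-- change a single key in place
ALTER NAMED COLLECTION my_s3 SET url = 'https://bucket.s3.amazonaws.com/other/';

DROP NAMED COLLECTION my_s3;
```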
New Experimental Features (use with care)
- Added new infrastructure for query analysis and planning. Enable with `allow_experimental_analyzer=1`
- Added experimental query result cache. Enable with `enable_experimental_query_result_cache=1`
- New ANN index (approximate nearest neighbor) based on Annoy for vector searches (supporting L2Distance or cosineDistance lookups). Enable with `allow_experimental_annoy_index=1`
- Added an experimental inverted index as a new secondary index type for efficient text search (a minimal sketch follows this list). Enable with `allow_experimental_inverted_index=1`
- New experimental parallel replicas feature. Enable with `allow_experimental_parallel_reading_from_replicas=1`
- New mode for splitting the work on replicas using the `parallel_replicas_custom_key` and `parallel_replicas_custom_key_filter_type` settings
- Added a new storage engine KeeperMap, which uses ClickHouse Keeper or ZooKeeper as a key-value store. Example: `CREATE TABLE map (key String, value UInt32) ENGINE = KeeperMap('/path/in/zk') PRIMARY KEY (key);` It also supports `DELETE`/`UPDATE` operations
- Support deduplication for asynchronous inserts
- `UNDROP TABLE`. Enable with `allow_experimental_undrop_table_query=1`
- Initial implementation of Kusto Query Language
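As an illustration of the experimental inverted index, here is a minimal sketch. The table and index names are ours, and the `inverted(0)` tokenizer argument is our assumption based on the upstream documentation, so double-check the syntax against the docs for your exact version:

```sql
SET allow_experimental_inverted_index = 1;

CREATE TABLE docs
(
    id   UInt64,
    body String,
    -- inverted(0) tokenizes the text; inverted(N) builds N-gram tokens instead
    INDEX body_idx(body) TYPE inverted(0) GRANULARITY 1
)
ENGINE = MergeTree
ORDER BY id;

-- token search functions such as hasToken() can then be accelerated by the index
SELECT count() FROM docs WHERE hasToken(body, 'clickhouse');
```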
Major changes that require attention
- The `max_threads` default value has been changed. In all previous versions it was half of the available VM cores; in 23.3 it matches the number of cores. E.g. for an 8 vCore VM it is now auto(8), while it was auto(4) before. That also affects other `max_*_threads` settings: `max_alter_threads`, `max_final_threads`, `max_part_loading_threads` and `max_part_removal_threads` (see the sketch after this list)
- In version 23.3, the default value of `max_replicated_merges_in_queue` for ReplicatedMergeTree tables increased from 16 to 1000. We recommend keeping it equal to `background_pool_size` unless you have a very high number of replicas
- With a high number of tables, the number of ZooKeeper requests (LIST operations) can be much higher after the upgrade. You may improve that by tuning `cleanup_delay_period`, `merge_selecting_sleep_ms` and `background_schedule_pool_size`
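A quick way to inspect the new thread-related defaults after the upgrade, and to pin the old value if needed, is sketched below (the value 4 assumes an 8 vCore VM, i.e. the pre-23.3 default):

```sql
-- inspect the effective thread-related settings after the upgrade
SELECT name, value, changed
FROM system.settings
WHERE name LIKE 'max%threads';

-- restore the old behavior for the current session, or put the same value
-- into a user profile to make it permanent
SET max_threads = 4;
```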
Backward Incompatible Changes
The following changes are backward incompatible and require user attention during an upgrade:
- The following combinations of data types & codecs are now forbidden by default (you can allow them by setting `allow_suspicious_codecs = true`; see the sketch at the end of this list):
  - `Gorilla` codec on columns of non-Float32 or non-Float64 type
  - Codecs `Delta` or `DoubleDelta` followed by codecs `Gorilla` or `FPC`
- The command `GRANT ALL` issued by the default user will not work anymore because the ‘default’ user does not have the grant to manage `NAMED COLLECTION` by default. You can allow it with extra user-level settings, e.g. for the ‘default’ user:
<default>
<access_management>1</access_management>
<named_collection_control>1</named_collection_control>
<show_named_collections>1</show_named_collections>
<show_named_collections_secrets>1</show_named_collections_secrets>
</default>
- Catboost models are now evaluated via an external process called clickhouse-library-bridge. `modelEvaluate` was renamed to `catboostEvaluate`
- Parallel quorum inserts are not supported anymore for MergeTree tables created with the old syntax
- The `WITH TIMEOUT` clause for `LIVE VIEW` is not supported anymore
- JOIN with constant expressions is not allowed anymore. E.g. this one will fail: `JOIN ON t1.x = t2.x AND 1 = 1`. However, `JOIN ON 1=1` works; it is converted to FULL OUTER JOIN
- Support for Decimal256 was added to some functions, meaning that their resulting type can now change (if the argument was also a wide Decimal)
- Kafka tables with DEFAULT/EPHEMERAL/MATERIALIZED columns are forbidden now. They used to be allowed before but never worked
- `toDayOfWeek` can now accept 3 arguments, and the meaning of the second argument was changed from the timezone to the marker of the week start day
- Secondary indices with constant or non-deterministic expressions are now forbidden
- The following commands/syntaxes are abandoned (no-op now):
  - `SYSTEM RESTART DISK`
  - `PREALLOCATE` option for `HASHED/SPARSE_HASHED` dictionaries
- Some commands and settings changed their names:
  - function `filesystemFree` was renamed to `filesystemUnreserved`
  - setting `max_query_cache_size` was renamed to `filesystem_cache_max_download_size`
  - table function `MeiliSearch` was renamed to `meilisearch`
- The `JSONExtract` family of functions will now attempt to coerce to the requested type. In case of long integers in the JSON, the result can be different from the older version. For example:
  - 22.8: `JSONExtractInt('{"a":"5"}', 'a') = 0`
  - 23.3: `JSONExtractInt('{"a":"5"}', 'a') = 5`
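For illustration, here is a short sketch of the codec restriction; the table name is ours, and the first statement fails with the default settings:

```sql
-- fails in 23.3 by default: Gorilla codec applied to a non-Float column
CREATE TABLE codec_test (x UInt64 CODEC(Gorilla)) ENGINE = MergeTree ORDER BY tuple();

-- the old behavior can be restored per session (not recommended as a long-term fix)
SET allow_suspicious_codecs = 1;
CREATE TABLE codec_test (x UInt64 CODEC(Gorilla)) ENGINE = MergeTree ORDER BY tuple();
```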
Upgrade Notes
There were several changes between versions that may affect the rolling upgrade of big clusters. Upgrading only part of the cluster is not recommended.
- Upgrading from 20.3 and older to 22.9 and newer should be done through an intermediate version if there are any ReplicatedMergeTree tables; otherwise, the server with the new version will not start. Here is a possible upgrade path if you are upgrading from 20.3: 20.3 -> 22.8 -> 23.3
Known Issues in 23.3.x
The development team continues to improve the quality of the 23.3 release. The following issues still exist in the 23.3.8 version and may affect ClickHouse operation. Please inspect them carefully to decide if those are applicable to your applications.
General stability issues:
- Problems with JIT compilation can lead to incorrect query results and various exceptions. We recommend disabling the `compile_expressions` and `compile_aggregate_expressions` settings completely (see the sketch after this list). See ClickHouse/ClickHouse#51368 ClickHouse/ClickHouse#51090
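A minimal sketch of disabling JIT in a session; the same two settings can also be placed in a user profile to apply them everywhere:

```sql
-- disable JIT compilation of expressions and of aggregate functions
SET compile_expressions = 0, compile_aggregate_expressions = 0;

-- verify the effective values
SELECT name, value FROM system.settings WHERE name LIKE 'compile%';
```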
Some queries that worked in 22.8 may fail in 23.3:
- count() over Nullable LowCardinality column returns incorrect results ClickHouse/ClickHouse#52055
- Nulls wrapped in tuples / arrays with Nulls inside used to work in older versions, but gave non-deterministic results. Now such rows are ignored by aggregate functions and may raise exceptions when passed to functions that expect scalars ClickHouse/ClickHouse#41595 ClickHouse/ClickHouse#51541 ClickHouse/ClickHouse#48623
- ‘Cannot find column’ in read-in-order optimization with `ARRAY JOIN` ClickHouse/ClickHouse#48682
There are a few performance regressions:
- Some queries may start working slower due to changes in the `optimize_move_to_prewhere` strategy. If it impacts you, you can try to set up `PREWHERE` manually (see the sketch after this list), or use the new settings `move_all_conditions_to_prewhere` and `enable_multiple_prewhere_read_steps` ClickHouse/ClickHouse#49735 ClickHouse/ClickHouse#50399 ClickHouse/ClickHouse#51849
- Some scenarios of using projections don’t work as expected. If you use projections, please check whether your queries are still working as expected ClickHouse/ClickHouse#51173 ClickHouse/ClickHouse#49150 – fixed in 23.5, but not backported
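If the automatic PREWHERE placement regressed one of your queries, the condition can be written explicitly. A sketch with illustrative table and column names:

```sql
-- move the most selective condition to PREWHERE explicitly instead of relying
-- on the automatic optimize_move_to_prewhere choice
SELECT count()
FROM hits
PREWHERE CounterID = 62
WHERE EventDate >= '2023-01-01';
```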
You may also look into GitHub issues using the v23.3-affected label.
Other Important Changes
Several settings have changed their defaults:
- `max_suspicious_broken_parts` has changed from 10 to 100
- `async_insert_max_data_size` has changed from 100000 to 1000000
- `optimize_distinct_in_order`, `optimize_duplicate_order_by_and_distinct`, `optimize_monotonous_functions_in_order_by` and `optimize_rewrite_sum_if_to_count_if` were disabled
- `s3_upload_part_size_multiply_parts_count_threshold` has been changed from 1000 to 500
Some new ClickHouse features and optimizations are now enabled by default. It may lead to a change in behavior, so review those carefully and disable features that may affect your system (see the sketch after this list):
- `enable_reads_from_query_cache`
- `enable_software_prefetch_in_aggregation`
- `enable_writes_to_query_cache`
- `input_format_json_defaults_for_missing_elements_in_named_tuple`
- `input_format_json_ignore_unknown_keys_in_named_tuple`
- `input_format_json_named_tuples_as_objects`
- `input_format_json_read_objects_as_strings`
- `input_format_native_allow_types_conversion`
- `optimize_rewrite_aggregate_function_with_if`
- `optimize_rewrite_array_exists_to_has`
- `query_plan_aggregation_in_order`
- `query_plan_optimize_projection`
- `query_plan_read_in_order`
- `query_plan_remove_redundant_distinct`
- `query_plan_remove_redundant_sorting`
- `regexp_dict_allow_hyperscan`
- `schema_inference_make_columns_nullable`
- `single_join_prefer_left_table`
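A short sketch of how to review and adjust these after the upgrade (the setting chosen below is just an example):

```sql
-- turn one of the newly enabled optimizations back off for the current session;
-- the same value can be set in a user profile to make it permanent
SET query_plan_optimize_projection = 0;

-- list every setting the current session overrides relative to the 23.3 defaults
SELECT name, value FROM system.settings WHERE changed;
```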
ClickHouse embedded monitoring has been enabled since 21.8. It collects host-level metrics and stores them every second in the `system.asynchronous_metric_log` table. This can be visible as an increase in background writes, storage usage, etc. To return to the old rate of metrics refresh / flush, adjust the following settings in `config.xml`:
<asynchronous_metrics_update_period_s>60</asynchronous_metrics_update_period_s>
<asynchronous_metric_log>
    <flush_interval_milliseconds>60000</flush_interval_milliseconds>
</asynchronous_metric_log>
In previous releases we recommended disabling `optimize_on_insert`. This recommendation stays for 23.3 as well, since inserts into Summing and AggregatingMergeTree tables can slow down.
Changes in Altinity Stable build Compared to Community Build
ClickHouse Altinity Stable builds are based on the upstream LTS versions. Altinity.Stable 23.3.8.22 is based on upstream 23.3.8.21-lts, but we have additionally backported several fixes:
- Fix key condition on duplicate primary keys ClickHouse/ClickHouse#48838 via Altinity/ClickHouse#275
- Do not build sets for skip indexes if they are not used ClickHouse/ClickHouse#48299 via Altinity/ClickHouse#276
Let’s Install!
Linux packages can be found at:
- https://packages.clickhouse.com for Upstream builds, and at
- https://builds.altinity.cloud for Altinity Stable builds.
Note: the naming scheme for Altinity Stable build packages changed after 21.8.x.
| 21.8.x | 22.3.x and later versions |
|---|---|
| `<package>_<ver>.altinitystable_all.deb` | `<package>_<ver>.altinitystable_amd64.deb` |
| `<package>-<ver>.altinitystable-2.noarch.rpm` | `<package>-<ver>.altinitystable.x86_64.rpm` |
Docker images for the upstream version should be referenced as `clickhouse/clickhouse-server:23.3.8.21`.
Altinity Stable build images are available as `altinity/clickhouse-server:23.3.8.22.altinitystable`.
For more information on installing ClickHouse from either the Altinity Builds or the Upstream Builds, see the ClickHouse Altinity Stable Release Build Install Guide.
Please contact us at info@altinity.com if you experience any issues with the upgrade.
Appendix
New table functions
- deltaLake
- hudi
- iceberg
- oss
- viewExplain
New table engines
- DeltaLake
- Hudi
- Iceberg
- KeeperMap
- OSS
New functions
- Date / Time
- age
- timeDiff
- UTCTimestamp / UTC_timestamp
- date_trunc / DATE_TRUNC
- DATE_FORMAT / formatDateTimeInJodaSyntax / fromUnixTimestampInJodaSyntax
- parseDateTime(-InJodaSyntax, -OrNull, -OrZero) / str_to_date
- addInterval / addTupleOfIntervals / subtractInterval / subtractTupleOfIntervals
- TO_UNIXTIME
- Arrays
- arrayPartialSort
- arrayPartialReverseSort
- arrayShuffle
- arrayPartialShuffle
- mapFromArrays / MAP_FROM_ARRAYS
- Hash
- sipHash64Keyed / sipHash128Keyed / sipHash128Reference / sipHash128ReferenceKeyed
- BLAKE3
- xxh3
- javaHash (has been extended to integers)
- Strings
- regexpExtract (similar to extract)
- concatWithSeparator / concat_ws (+ concatWithSeparatorAssumeInjective)
- hasTokenOrNull / hasTokenCaseInsensitiveOrNull (also supported in tokenbf_v1 index lookups)
- ascii
- Math / Statistical / Random generation
- toDecimalString a)
- pmod (positiveModulo / positive_modulo)
- multiplyDecimal and divideDecimal a)
- factorial
- widthBucket / WIDTH_BUCKET
- rand(-Uniform,-Normal,-LogNormal,-Exponential,-ChiSquared,-StudentT, -FisherF,-Bernoulli,-Binomial,-NegativeBinomial,-Poisson)
- randCanonical (similar to the rand function in Apache Spark)
- Aggregate / window functions
- ntile
- analysisOfVariance / anova
- uniqThetaIntersect / uniqThetaNot / uniqThetaUnion (Apache Datasketches)
- groupArrayLast(X)(col)
- medianInterpolatedWeighted / quantileInterpolatedWeighted / quantilesInterpolatedWeighted
- corrMatrix / covarPopMatrix / covarSampMatrix
- URL
- cutURLParameter (improved)
- cutToFirstSignificantSubdomainCustomRFC
- cutToFirstSignificantSubdomainCustomWithWWWRFC
- cutToFirstSignificantSubdomainRFC
- cutToFirstSignificantSubdomainWithWWWRFC
- topLevelDomainRFC
- domainRFC
- domainWithoutWWWRFC
- firstSignificantSubdomainCustomRFC / firstSignificantSubdomainRFC
- portRFC
- Other
- displayName
- getSubcolumn
- nested
- formatReadableDecimalSize
- generateULID / ULIDStringToDateTime
- mortonDecode / mortonEncode (ZCurve)
- JSONArrayLength / JSON_ARRAY_LENGTH
- toIPv4OrZero / toIPv6OrZero
- tryBase58Decode / tryDecrypt
- filesystemAvailable and similar: can now accept disk name as argument
New formats
- BSONEachRow
- JSONObjectEachRow
- JSONStringEachRow
New system tables
- system.dropped_tables
- system.moves
- system.named_collections
- system.query_cache
- system.server_settings
New metrics and events
system.events
TBD
system.metrics
TBD
Changes in default configuration files
https://gist.github.com/filimonov/7268c5d45778838f201cc9539cf0958d#file-default_configs_change-md
Also, please refer to the interim release notes from the development team available at the following URLs:
- 22.9 Release notes: https://clickhouse.com/docs/en/whats-new/changelog/2022#-clickhouse-release-229-2022-09-22
- 22.10 Release notes: https://clickhouse.com/docs/en/whats-new/changelog/2022#-clickhouse-release-2210-2022-10-25
- 22.11 Release notes: https://clickhouse.com/docs/en/whats-new/changelog/2022#-clickhouse-release-2211-2022-11-17
- 22.12 Release notes: https://clickhouse.com/docs/en/whats-new/changelog/2022#-clickhouse-release-2212-2022-12-15
- 23.1 Release notes: https://clickhouse.com/docs/en/whats-new/changelog#-clickhouse-release-231-2023-01-26
- 23.2 Release notes: https://clickhouse.com/docs/en/whats-new/changelog#-clickhouse-release-232-2023-02-23
- 23.3 Release notes: https://clickhouse.com/docs/en/whats-new/changelog#-clickhouse-release-233-lts-2023-03-30