Altinity Stable Build for ClickHouse 23.3

It has been a while since we have certified the Altinity.Stable ClickHouse 22.8 release. It was delivered together with the Altinity Stable build for ClickHouse. Since then many things have happened to ClickHouse. More than 300 ClickHouse contributors from companies all around the world submitted 3000 pull requests with new features and improvements. Unfortunately, some changes affected the stability of new releases. That is why it took us 4 long months to get enough confidence in order to recommend 23.3 for production use and make sure upgrades go smoothly. As of 23.3.8 we are confident in certifying 23.3 as an Altinity Stable release.

Here are the detailed release notes and upgrade instructions for version 23.3.8. There are a lot of new features, but also many things that changed their behavior. Read these carefully before upgrading. There are additional notes for point releases.

Major new features in 23.3 since the previous stable release 22.8

Released 2023-07-14

A new release introduces a lot of changes and new functions. It is very hard to pick the most essential ones, so refer to the full list in the Appendix. The following major features are worth mentioning on the front page:

  • SQL features:
    • grace_hash JOIN algorithm
    • Manipulating named collections via DDL commands: SHOW/CREATE/ALTER/DROP named collection
    • Parameterized views: CREATE VIEW test AS SELECT ... WHERE user_id = {user:UInt64} and then SELECT * FROM test(user = 123)
    • Lightweight DELETEs are not marked as experimental anymore. There are still performance limitations, so do not consider them as a drop-in replacement for SQL DELETE statements
    • Regexp operator: SELECT ... WHERE col1 REGEXP 'a.*b'
    • Query cache
  • MergeTree improvements:
    • ReplacingMergeTree with deleted flag
    • final setting to implicitly apply the FINAL modifier to every table. See the blog article describing these new features of ReplacingMergeTree. a)
    • Control of scheduling merges using min_age_to_force_merge_seconds and min_age_to_force_merge_on_partition_only settings
    • Compressed marks and primary key on disk (disabled by default in 23.3)
    • Compressed marks in memory
  • Replication improvements:
    • insert_quorum = 'auto' to use the majority number
    • Automatic retries of inserts into replicated tables if (Zoo)Keeper is temporarily not available. Number of retries is controlled by insert_keeper_max_retries setting
    • Support for replication of user-defined SQL functions via (Zoo)Keeper
    • General performance and stability improvements when using Keeper
  • Security & access control
    • Certificate-based user authentication on the native protocol
    • Password complexity rules and checks
    • Automatically mask sensitive information like credentials in logs (without query masking rules)
    • Exposed applied row-level policies to system.query_log a)
  • Storage management
    • Allow to assign disks on the table level instead of storage policy: SETTINGS disk = '<disk_name>' (instead of storage_policy)
    • Explicit disk creation SETTINGS disk = disk(type=s3, ...)
    • Allow nested custom disks
    • Added S3 as a new type of destination for embedded backups. Supports BACKUP to S3 with as-is path/data structure.
    • Server-side copies for embedded S3 backups
  • Object storage
    • ** glob (recursive directory traversal) for s3 table function
    • Possibility to control S3 storage class (STANDARD/INTELLIGENT_TIERING)
    • s3_plain disk type for write-once-read-many operations
    • StorageIceberg, Hudi and DeltaLake (with corresponding table functions iceberg, hudi and deltaLake) to access data stored on S3
    • OSS / oss (Alibaba Cloud Object Storage Service)
    • A lot of improvements to Parquet performance and compatibility a), but even more comes in 23.4 and later releases
  • Operational:
    • Ability to reset settings: SET max_block_size = DEFAULT
    • Partial result on query cancellation: partial_result_on_first_cancel=1
    • Allow recording errors to a specified file while reading text formats (CSV, TSV): input_format_record_errors_file_path=parse_errors
    • Allow to ignore errors while pushing to MATERIALIZED VIEW: materialized_views_ignore_errors=1
    • Composable protocol configuration: allows adjusting different listen hosts and wrapping any protocol into PROXYv1

As usual with ClickHouse, there are many performance and operational improvements in different server components.


a) Contributed by Altinity developers.

New Experimental Features (use with care)

  • Added new infrastructure for query analysis and planning. Enable with allow_experimental_analyzer=1
  • Added experimental query result cache. Enable with enable_experimental_query_result_cache=1
  • New ANN index (approximate nearest neighbor) based on Annoy for vector searches (supporting L2Distance or cosineDistance lookups). Enable with allow_experimental_annoy_index=1
  • Added an experimental inverted index as a new secondary index type for efficient text search. Enable with allow_experimental_inverted_index=1
  • New experimental parallel replicas feature. Enable with allow_experimental_parallel_reading_from_replicas=1
  • New mode for splitting the work on replicas using settings parallel_replicas_custom_key and parallel_replicas_custom_key_filter_type
  • Added a new storage engine KeeperMap, which uses ClickHouse Keeper or ZooKeeper as a key-value store. Example: CREATE TABLE map (key String, value UInt32) ENGINE = KeeperMap('/path/in/zk') PRIMARY KEY (key); It also supports for DELETE / UPDATE operations
  • Support deduplication for asynchronous inserts
  • UNDROP TABLE. Enable with allow_experimental_undrop_table_query=1
  • Initial implementation of Kusto Query Language

Major changes that require attention

  • max_threads default value has been changed. In all previous versions it was half of the available VM cores. In 23.3 it matches the number of cores. E.g. for 8 vCore VM it is now auto(8), while it was auto(4) before. That also affected other ‘max_*_threads’ settings: max_alter_threads, max_final_threads, max_part_loading_threads and max_part_removal_threads
  • In version 23.3, the default value of max_replicated_merges_in_queue for ReplicatedMergeTree tables increased from 16 to 1000. We recommend keeping it equal to the background_pool_size unless you have a very high number of replicas
  • With a high number of tables the number of ZooKeeper requests can be much higher after the upgrade (number of LIST operations). You may improve that by tuning cleanup_delay_period, merge_selecting_sleep_ms, background_schedule_pool_size

Backward Incompatible Changes

The following changes are backward incompatible and require user attention during an upgrade:

  • The following combinations of data types & codecs are forbidden now by default (you can allow them by setting allow_suspicious_codecs = true):
    • Gorilla codec on columns of non-Float32 or non-Float64 type
    • Codecs Delta or DoubleDelta followed by codecs Gorilla or FPC
  • The command GRANT ALL issued by the default user will not work anymore because the ‘default’ user does not have the grant to manage NAMED COLLECTION by default. You can allow them by extra user-level settings, e.g. for ‘default’ user:
<default>
    <access_management>1</access_management>
    <named_collection_control>1</named_collection_control>
    <show_named_collections>1</show_named_collections>
    <show_named_collections_secrets>1</show_named_collections_secrets>
</default>
  • Catboost models are now evaluated via an external process called clickhouse-library-bridge. modelEvaluate was renamed to catboostEvaluate
  • Parallel quorum inserts are not supported anymore for MergeTree tables created with old syntax.
  • The WITH TIMEOUT clause for LIVE VIEW is not supported anymore
  • JOIN with constant expressions is not allowed anymore. E.g. this one will fail: JOIN ON t1.x = t2.x AND 1 = 1. However, JOIN ON 1=1 works, it is converted to FULL OUTER JOIN
  • Support for Decimal256 was added to some functions, meaning that their resulting type can now change (if the argument was also wide Decimal)
  • Kafka tables with DEFAULT/EPHEMERAL/MATERIALIZED columns are forbidden now. They used to be allowed before but never worked
  • toDayOfWeek now can accept 3 arguments, and the meaning of the second argument was changed from timezone to the marker of the weekstart day
  • secondary indices with constant or non-deterministic expressions are now forbidden
  • The following commands/syntaxes are abandoned (no-op now):
    • SYSTEM RESTART DISK
    • PREALLOCATE option for HASHED/SPARSE_HASHED dictionaries
  • Some commands and settings changed their names:
    • function filesystemFree was renamed to filesystemUnreserved
    • setting max_query_cache_size was renamed to filesystem_cache_max_download_size
    • table function MeiliSearch was renamed to meilisearch
  • JSONExtract family of functions will now attempt to coerce to the requested type. In case of long integers in the JSON, the result can be different from the older version. For example:
    • 22.8: JSONExtractInt('{"a":"5"}', 'a') = 0
    • 23.3: JSONExtractInt('{"a":"5"}', 'a') = 5

Upgrade Notes

There were several changes between versions that may affect the rolling upgrade of big clusters. Upgrading only part of the cluster is not recommended.

  • Upgrading from 20.3 and older to 22.9 and newer should be done through an intermediate version if there are any ReplicatedMergeTree tables; otherwise, the server with the new version will not start. Here is a possible upgrade path if you are upgrading from 20.3: 20.3 -> 22.8 -> 23.3

Known Issues in 23.3.x

The development team continues to improve the quality of the 23.3 release. The following issues still exist in the 23.3.8 version and may affect ClickHouse operation. Please inspect them carefully to decide if those are applicable to your applications.

General stability issues:

Some queries that worked in 22.8 may fail in 23.3:

There are a few performance regressions:

You may also look into a GitHub issues using v23.3-affected label.​​

Other Important Changes

Several settings have changed their defaults:

  • max_suspicious_broken_parts has changed from 10 to 100
  • async_insert_max_data_size has changed from 100000 to 1000000
  • optimize_distinct_in_order, optimize_duplicate_order_by_and_distinct, optimize_monotonous_functions_in_order_by, optimize_rewrite_sum_if_to_count_if were disabled
  • s3_upload_part_size_multiply_parts_count_threshold has been changed from 1000 to 500

Some new ClickHouse features and optimizations are now enabled by default. It may lead to a change in behavior, so review those carefully and disable features that may affect your system:

  • enable_reads_from_query_cache
  • enable_software_prefetch_in_aggregation
  • enable_writes_to_query_cache
  • input_format_json_defaults_for_missing_elements_in_named_tuple
  • input_format_json_ignore_unknown_keys_in_named_tuple
  • input_format_json_named_tuples_as_objects
  • input_format_json_read_objects_as_strings
  • input_format_native_allow_types_conversion
  • optimize_rewrite_aggregate_function_with_if
  • optimize_rewrite_array_exists_to_has
  • query_plan_aggregation_in_order
  • query_plan_optimize_projection
  • query_plan_read_in_order
  • query_plan_remove_redundant_distinct
  • query_plan_remove_redundant_sorting
  • regexp_dict_allow_hyperscan
  • schema_inference_make_columns_nullable
  • single_join_prefer_left_table

ClickHouse embedded monitoring is since 21.8. It now collects host level metrics, and stores them every second in the table system.asynchronious_metric_log. This can be visible as an increase of background writes, storage usage, etc. To return to the old rate of metrics refresh / flush, adjust those settings in config.xml:

<asynchronous_metrics_update_period_s>
    60
</asynchronous_metrics_update_period_s>
<asynchronous_metric_log>
    <flush_interval_milliseconds>
        60000
    </flush_interval_milliseconds>
</asynchronous_metric_log>

In the previous releases we recommended disabling optimize_on_insert. This recommendation stays for 22.8 as well as inserts into Summing and AggregatingMergeTree can slow down.

Changes in Altinity Stable build Compared to Community Build

ClickHouse Altinity Stable builds are based on the upstream LTS versions. Altinity.Stable 23.3.8.22 is based on upstream 23.3.8.21-lts, but we have additionally backported several fixes:

Let’s Install!

Linux packages can be found at:

Note: naming schema for Altinity.Stable build packages has been changed since 21.8.x.

21.8.x 22.3.x and later versions
<package>_<ver>.altinitystable_all.deb <package>_<ver>.altinitystable_amd64.deb
<package>-<ver>.altinitystable-2.noarch.rpm <package>-<ver>.altinitystable.x86_64.rpm

Docker images for the upstream version should be referenced as ‘clickhouse/clickhouse-server:23.3.8.21’.

Altinity Stable build images are available as ‘altinity/clickhouse-server:23.3.8.22.altinitystable’.

For more information on installing ClickHouse from either the Altinity Builds or the Upstream Builds, see the ClickHouse Altinity Stable Release Build Install Guide.

Please contact us at info@altinity.com if you experience any issues with the upgrade.

Appendix

New table functions

  • deltaLake
  • hudi
  • Iceberg
  • oss
  • viewExplain

New table engines

  • DeltaLake
  • Hudi
  • Iceberg
  • KeeperMap
  • OSS

New functions

  • Date / Time
    • age
    • timeDiff
    • UTCTimestamp / UTC_timestamp
    • date_trunc / DATE_TRUNC
    • DATE_FORMAT / formatDateTimeInJodaSyntax / fromUnixTimestampInJodaSyntax
    • parseDateTime(-InJodaSyntax, -OrNull, -OrZero) / str_to_date
    • addInterval / addTupleOfIntervals / subtractInterval / subtractTupleOfIntervals
    • TO_UNIXTIME
  • Arrays
    • arrayPartialSort
    • arrayPartialReverseSort
    • arrayShuffle
    • arrayPartialShuffle
    • mapFromArrays / MAP_FROM_ARRAYS
  • Hash
    • sipHash128Keyed / sipHash128Reference / sipHash128ReferenceKeyed sipHash64Keyed
    • BLAKE3
    • xxh3
    • javaHash (has been extended to integers)
  • Strings
    • regexpExtract (similar to extract)
    • concatWithSeparator / concat_ws (+ concatWithSeparatorAssumeInjective)
    • hasTokenOrNull / hasTokenCaseInsensitiveOrNull (also supported in tokenbf_v1 index lookups)
    • ascii
  • Math / Statistical / Random generation
    • toDecimalString a)
    • pmod (positiveModulo / positive_modulo)
    • multiplyDecimal and divideDecimal a)
    • factorial
    • widthBucket / WIDTH_BUCKET
    • rand(-Uniform,-Normal,-LogNormal,-Exponential,-ChiSquared,-StudentT, -FisherF,-Bernoulli,-Binomial,-NegativeBinomial,-Poisson)
    • randCanonical (similar to the rand function in Apache Spark)
  • Aggregate / window functions
    • ntile
    • analysisOfVariance / anova
    • uniqThetaIntersect / uniqThetaNot / uniqThetaUnion (Apache Datasketches)
    • groupArrayLast(X)(col)
    • medianInterpolatedWeighted / quantileInterpolatedWeighted / - quantilesInterpolatedWeighted
    • corrMatrix / covarPopMatrix / covarSampMatrix
  • URL
    • cutURLParameter (improved)
    • cutToFirstSignificantSubdomainCustomRFC
    • cutToFirstSignificantSubdomainCustomWithWWWRFC
    • cutToFirstSignificantSubdomainRFC
    • cutToFirstSignificantSubdomainWithWWWRFC
    • topLevelDomainRFC
    • domainRFC
    • domainWithoutWWWRFC
    • firstSignificantSubdomainCustomRFC / firstSignificantSubdomainRFC
    • portRFC
  • Other
    • displayName
    • getSubcolumn
    • nested
    • formatReadableDecimalSize
    • generateULID / ULIDStringToDateTime
    • mortonDecode / mortonEncode (ZCurve)
    • JSONArrayLength / JSON_ARRAY_LENGTH
    • toIPv4OrZero / toIPv6OrZero
    • tryBase58Decode / tryDecrypt
    • filesystemAvailable and similar: can now accept disk name as argument

New formats

  • BSONEachRow
  • JSONObjectEachRow
  • JSONStringEachRow

New system tables

  • system.dropped_tables
  • system.moves
  • system.named_collections
  • system.query_cache
  • system.server_settings

New metrics and events

system.events

TBD

system.metrics

TBD

Changes in default configuration files

https://gist.github.com/filimonov/7268c5d45778838f201cc9539cf0958d#file-default_configs_change-md


Also, please refer to the interim release notes from the development team available at the following URLs:

23.3.19

Released 2024-03-05

See the notes for the 23.3.19 Altinity Stable release on GitHub.

23.3.19 FIPS

Released 2024-03-21

This is a FIPS-compatible release of version 23.3.19.

See the notes for the 23.3.19 Altinity FIPS release on GitHub.

23.3.13

Released 2023-10-06

Changes compared to upstream's v23.3.13.6-lts

ARM support

Packages (.deb, .rpm, or .tgz) and container images are now available for the AARCH64 (ARM64) architecture.

Bug fixes

Performance improvements

Improvements

Build/Testing/Packaging Improvements

  • Bring back DNS tests (ClickHouse#53286 - @arthurpassos via #296)
  • Made builds and tests possible in Altinity’s infrastructure
  • Support of Aarch64 builds (#311)
    • building Aarch64 binaries
    • multi-arch docker containers (Amd64, Aarch64)
    • running stateless, stateful tests against Aarch64
    • running clickhouse-regression tests against Aarch64

Build Report

You can see the results of the build in the complete build report.

Upstream changes between v23.3.8.21-lts (1675f2264f3) and v23.3.13.6-lts (25635e27551)

Performance Improvements

  • Backported in #52213: Do not store blocks in ANY hash join if nothing is inserted. #48633 (vdimir).
  • Backported in #52826: Fixed incorrect projection analysis which invalidates primary keys. This issue only exists when query_plan_optimize_primary_key = 1, query_plan_optimize_projection = 1 . This fixes #48823, #51173, and #52308 (Amos Bird).

Build/Testing/Packaging Improvements

Bug Fixes (user-visible misbehavior in an official stable release)

NOT FOR CHANGELOG / INSIGNIFICANT

Packages

Available for both AMD64 and Aarch64 from https://builds.altinity.cloud/ as either .deb, .rpm, or .tgz

Docker images

Available for both AMD64 and Aarch64: altinity/clickhouse-server/23.3.13.7.altinitystable

ClickHouse release notes

Interim release notes are also available:

23.3.8 FIPS

Released 2023-09-25

FIPS-Compatible Altinity Stable Build for ClickHouse 23.3.8 is based on Altinity Stable for ClickHouse 23.3.8

Changes Compared to Altinity Stable 23.3.8

FIPS-related changes atop of v23.3.8.22.altinitystable

  • Using BoringSSL built in FIPS mode
  • Added FIPS_CLICKHOUSE to system.build_options
  • Modified ClickHouse Keeper to use the full range of openSSL options, just as ClickHouse does for HTTP, TCP, and other types of server connections
  • A bit of extra logging for FIPS mode at startup.