Altinity Stable® Build for ClickHouse® 23.8

Here are the detailed release notes and upgrade instructions for version 23.8.8.21. There are a lot of new features, but also many things that changed their behavior. Read these carefully before upgrading.

Major new features in 23.8 since the previous Altinity Stable release 23.3

Released 2023-12-27

Build report - 23.8.8.21 | Release page

A new release introduces a lot of changes and new functions. It is very hard to pick the most essential ones, so refer to the full list in the Appendix. The following major features are worth mentioning on the front page:

  • Few features that were experimental are now graduated as production ready:

    • Projections #49719
    • Query Results cache #47977. See our Caching Guide for examples.
    • Geographical data types: Point, Ring, Polygon, and MultiPolygon #50022
  • SQL features:

    • Overlay databases for external data:

    CREATE DATABASE files ENGINE = Filesystem('subdir/') CREATE DATABASE bucket ENGINE = S3('https://clickhouse-public-datasets.s3.amazonaws.com/')

    • SELECT ... INTO OUTFILE '...' APPEND / TRUNCATE #48880

    • PRQL query language #50686

    • Plus and minus operations for arrays: SELECT [1, 2, 3] + [4, 5, 6] #52625. Multiplication and division comes in later releases.

    • Direct import from archives (files only):

      SELECT * FROM file('path/to/archive.zip :: path/inside/archive.csv')
      
  • MergeTree improvements:

    • There were a lot of improvements in order to reduce memory usage for merges, loading parts. FINAL, index analysis and others.
    • Added setting async_insert for MergeTree tables. It has the same meaning as query-level setting async_insert and enables asynchronous inserts for a specific table. #49122.
  • Replication improvements:

    • Improve insert retries on keeper session expiration. #52688.
    • Reduced number ZooKeeper requests when selecting parts to merge and a lot of partitions do not have anything to merge. #49637. – in general, users would observe lower ZooKeeper load when upgrading from 23.3 to 23.8
  • Security & access control

    • ROW POLICY can be applied to all tables in a database. #47640. a)
    • Added server-side encryption using KMS keys with S3 tables, and the header setting with S3 disks. #48724.
    • Add server and format settings display_secrets_in_show_and_select for displaying secrets of tables, databases, table functions, and dictionaries. Add privilege displaySecretsInShowAndSelect controlling which users can view secrets. #46528.
    • Adding the grants field in the users.xml file, which allows specifying grants for users. #49381.
    • GRANT CURRENT GRANTS
  • Object storage

    • Support proxy for S3 access a)read our article for more detail
    • Native support for GCP via gcs table function
    • Native support for Azure via azureBlobStorage and azureBlobStorageCluster table functions
  • Operational:

    • New tables for introspection:
      • system.zookeeper_connection
      • system.kafka_consumers
    • Server and query level IO throttling for different operations (#48242):
      • server settings: max_remote_read_network_bandwidth_for_server, max_remote_write_network_bandwidth_for_server, max_local_read_bandwidth_for_server, max_local_write_bandwidth_for_server, max_backup_bandwidth_for_server
      • profile settings: max_remote_read_network_bandwidth, max_remote_write_network_bandwidth, max_local_read_bandwidth, max_local_write_bandwidth, max_backup_bandwidth.

As usual with ClickHouse, there are many performance and operational improvements in different server components.


a) Contributed by Altinity developers.

Major changes that require attention

  • use_environment_credentials is enabled by default since 23.4. That blocks access to anonymous S3 buckets. The recommended way accessing public buckets is NOSIGN keyword supported for both S3 table function and engine.
  • Compression of mark files and primary keys is enabled by default. It produces different binary representations of those files. This is controlled by compress_marks and compress_primary_key merge tree settings.
    • It is not possible to downgrade from 23.8 to 22.8 or earlier
    • If you upgrade from versions prior to 22.9, you should either upgrade all replicas at once or disable the compression before upgrade, or upgrade through an intermediate version, where the compressed marks are supported but not enabled by default, such as 23.3.
  • sparse columns enabled by default. It produces different binary representation, so if used downgrade to versions older than 22.1 might not be possible.. This can be disabled by ratio_of_defaults_for_sparse_serialization=1.
  • max_concurrent_queries default value has been increased from 100 to 1000. It may be too high in same cases, max_concurrent_queries * max_threads should be lower than ~8000.
  • Maximum number of parts per partition restriction (“too many parts” error) has been relaxed:
    • parts_to_delay_insert=1000 (was: 150)
    • parts_to_throw_insert=3000 (was: 300)
    • There is a new setting max_avg_part_size_for_too_many_parts=1073741824, so ‘parts_to_throw’ restriction is applied only for partitions of 1TB size or above. This may result in slower performance, so changing ‘parts_to_throw’ to smaller values is recommended.
  • Metadata cache has been removed
  • The new feature system.kafka_consumers enables collecting librdkafka statistics every 3 seconds by default. If you have Kafka tables which you don’t read from, that can lead to collecting those unprocessed stat messages in memory. Collection is disabled in Altinity.Stable Build 23.8.8 and upstream build 23.8.9. You may turn it on and off using server level Kafka setting:
<kafka>
    <statistics_interval_ms> <!-- Set non-zero value to enable -->
    0
    </statistics_interval_ms>
</kafka>

Backward Incompatible Changes

The following changes are backward incompatible and require user attention during an upgrade:

  • Using named collections requires to have USE NAMED COLLECTION grant, otherwise Not enough privileges. To execute this query it's necessary to have grant NAMED COLLECTION ON exception will be raised.
  • allow_experimental_query_cache profile setting has been removed (the feature is production ready) #52685. ClickHouse will not start if it is defined, so please removed it before an upgrade
  • Several features have been removed:
    • hashid #52449
    • do_not_evict_index_and_mark_files #51253
    • ALTER support for experimental LIVE VIEW #51287
    • in-memory data parts #49429
    • metadata cache #51303
  • The microseconds column is removed from the system.text_log, and from the system.metric_log, because they are redundant in the presence of the event_time_microseconds column. #53601

Upgrade Notes

There were several changes between versions that may affect the rolling upgrade of big clusters. Upgrading only part of the cluster is not recommended.

  • Upgrading from 20.3 and older to 22.9 and newer should be done through an intermediate version if there are any ReplicatedMergeTree tables; otherwise, the server with the new version will not start. #40641. Here is a possible upgrade path if you are upgrading from 20.3: 20.3 -> 22.8 -> 23.3 -> 23.8
  • If you upgrade from versions prior to 22.9, you should either upgrade all replicas at once or disable compress_marks and compress_primary_key merge tree settings before upgrade, or upgrade through an intermediate version, where the compressed marks are supported but not enabled by default, such as 23.3.
  • Downgrading from 23.8 to version 23.5 or below may fail due to changes in sparse columns serialization, see #55153 for possible workaround.

Known Issues in 23.8.x

The development team continues to improve the quality of the 23.8 release. The following issues still exist in the 23.8.8 version and may affect ClickHouse operation. Please inspect them carefully to decide if those are applicable to your applications.

General stability issues:

Some queries that worked in 23.3 may fail in 23.8:

There are a few performance regressions:

You may also look into a GitHub issues using v23.8-affected label.​​

Other Important Changes

Several settings have changed their defaults:

  • distributed_ddl_entry_format_version=5 (was: 3) – enables open telemetry and initial_query_id pass through. #52128
    • downgrading to versions with unprocessed DDL entries can lead to errors
  • connect_timeout_with_failover_ms=1000 (was: 50)
  • connect_timeout_with_failover_secure_ms=1000 (was: 100)
  • hedged_connection_timeout_ms=50 (was: 100)
  • http_max_field_name_size=131072 (was: 1048576)
  • http_max_field_value_size=131072 (was: 1048576)
  • http_receive_timeout=30 (was: 180)
  • http_send_timeout=30 (was: 180)

Some new ClickHouse features and optimizations are now enabled by default. It may lead to a change in behavior, so review those carefully and disable features that may affect your system:

  • merge_tree_settings.allow_vertical_merges_from_compact_to_wide_parts
  • async_query_sending_for_remote
  • check_dictionary_primary_key
  • projection metadata now is checked on table attach #52361
  • move_all_conditions_to_prewhere, enable_multiple_prewhere_read_steps, move_primary_key_columns_to_end_of_prewhere
  • enable_memory_bound_merging_of_aggregation_results – if you upgrade from version prior to 22.12, we recommend setting this flag to false until the upgrade is finished.
  • optimize_distinct_in_order
  • optimize_count_from_files
  • optimize_use_projections, optimize_use_implicit_projections

system.query_views_log now collects information about MVs that are pushed from background threads, like Kafka / Rabbit etc. #46668. You can have more (useful!) data there after the upgrade.

LZ4 & ZSTD compression libraries were upgraded, checksum mismatch when having replicas with older versions are possible

In the previous releases we recommended disabling optimize_on_insert. This recommendation stays for 23.8 as well as inserts into Summing and AggregatingMergeTree can slow down.

Changes in Altinity Stable build Compared to Upstream Build

ClickHouse Altinity Stable builds are open source and are based on the upstream LTS versions. Altinity.Stable 23.8.8 is based on upstream 23.8.8.20-lts, but we have additionally backported several fixes:

  • Fix key analysis (with set) for Merge engine #54905 via #341
  • Fix partition pruning of extra columns in set #55172 via #342
  • Fix FINAL produces invalid read ranges in a rare case #54934 via #343
  • Fix incorrect free space accounting for least_used JBOD policy #56030 via #344
  • Fix ALTER COLUMN with ALIAS #56493 via #345 (backported into 23.8.9 upstream)
  • Disable system.kafka_consumers by default due to possible live memory leak #57822 via #346 (backported into 23.8.9 upstream)

Let’s Install!

Linux packages can be found at https://packages.clickhouse.com for upstream builds, and at https://builds.altinity.cloud for Altinity Stable builds.

Note: naming schema for Altinity.Stable build packages has been changed since 21.8.x.

21.8.x 22.3.x and later versions
<package>_<ver>.altinitystable_all.deb <package>_<ver>.altinitystable_amd64.deb
<package>-<ver>.altinitystable-2.noarch.rpm <package>-<ver>.altinitystable.x86_64.rpm

Docker images for the upstream version should be referenced as clickhouse/clickhouse-server:23.8.8.20

Altinity Stable build images are available as altinity/clickhouse-server:23.8.8.21.altinitystable.

For more information on installing ClickHouse from either the Altinity Builds or the upstream builds, see the ClickHouse Altinity Stable Builds Install Guide.

Please contact us at info@altinity.com if you experience any issues with the upgrade.


Appendix

New table functions

  • azureBlobStorage, azureBlobStorageCluster
  • gcs
  • redis
  • urlCluster

New table engines

  • AzureBlobStorage
  • Redis
  • S3Queue

New functions

COMING SOON

New formats

  • One – doesn’t read any data and always returns a single row with column dummy with type UInt8 and value 0 like system.one. It can be used together with _file/_path virtual columns to list files in file/s3/url/hdfs/etc table functions without reading any data
  • ParquetMetadata – providers introspection into Parquet files, see this excellent blog article for detail
  • PrettyJSONEachRow
  • PrettyJSONLines
  • PrettyNDJSON
  • RowBinaryWithDefaults

New system tables

  • system.jemalloc_bins
  • system.kafka_consumers
  • system.user_processes – query statistics grouped by user
  • merge_tree_metadata_cache was removed

New columns in system tables

  • system.clusters: database_shard_name, database_replica_name, is_active, name (alias to cluster)
  • system.filesystem_cache: cache_name, key, file_size
  • system.functions: syntax, arguments, returned_value, examples, categories – those are populated for new functions only, but eventually may be filled in for all
  • system.merge_tree_settings/replicated_merge_tree_settings/settings/server_settings: is_obsolete
  • system.merges: partitions, total_size_uncompressed
  • system.metrics: name (alias to metric)
  • system.parts: primary_key_size, part_name (alias to name)
  • system.parts_columns: column_modification_time, part_name (alias to name)
  • system.projection_parts: part_name (alias to name)
  • system.projection_parts_columns: column_modification_time, part_name (alias to name)
  • system.query_cache: compressed
  • system.replicas: zookeeper_name, lost_part_count
  • system.storage_policies: perform_ttl_on_insert, load_balancing

New metrics and events

system.events:

Several groups of events were added:

COMING SOON

system.metrics:

COMING SOON


Also, please refer to the interim release notes from the development team available at the following URLs:

23.8.16.41

Released 2024-09-25

Build report - 23.8.16.41 | Release page

Changes compared to upstream's v23.8.16.40-lts (e143a90)

Bug fixes

Performance improvements

Improvement

Build/Testing/Packaging Improvement

  • Make builds possible on Altinity’s infrastructure
  • clickhouse-regression test suite (#338 by @MyroTk)
  • Changed “version is not official” log message on failure (#369 by @Enmk)
  • Fix kerberized_hadoop docker image build issue (#394 by @ilejn)
  • Reuse unchanged images #455 by @MyroTk
  • Pinned version of pip packet requests==2.31.0 in attempt to fix: ‘Error while fetching server API version: Not supported URL scheme http+docker’ (#409 by @Enmk)
  • Aarch64 builds with better introspection (-no-pie) (#396 by @Enmk)
  • Fixed the docker tag for docker server test image (#463 by @MyroTk)
  • Add SignAarch64 job to match SignRelease (#462 by @MyroTk)
  • Fix test_distributed_directory_monitor_split_batch_on_failure_OFF (#464 by @Enmk)
  • Fix kererberized_hadoop tests (#468, #459 by @ilejn)
  • Updated reference to clickhouse-regression suite (#467 by @MyroTk)
  • Scheduled run + CI/CD fixups (#402 by @MyroTk)

upstream ClickHouse changes between v23.8.16.40-lts (e143a90) and v23.8.11.28-lts (31879d2)

Critical Bug Fix (crash, LOGICAL_ERROR, data loss, RBAC))

Bug Fix (user-visible misbehavior in an official stable release))

  • The query cache now denies access to entries when the user is re-created or assumes another role. This improves prevents attacks where 1. a user with the same name as a dropped user may access the old user’s cache entries or 2. a user with a different role may access cache entries of a role with a different row policy. (#58611 by Robert Schulze via #61439)
  • Fix string search with constant start position which previously could lead to memory corruption. (#61547 by Antonio Andelic via #61572)
  • Fix crash in multiSearchAllPositionsCaseInsensitiveUTF8 when specifying incorrect UTF-8 sequence. Example: #61714. (#61749 by pufit via #61854)
  • Fixed a bug in zero-copy replication (an experimental feature) that could cause The specified key does not exist errors and data loss after REPLACE/MOVE PARTITION. A similar issue might happen with TTL-moves between disks. (#54193 by Alexander Tokmakov via #62898)
  • Fix the ATTACH query with the ON CLUSTER clause when the database does not exist on the initiator node. Closes #55009. (#61365 by Nikolay Degterinsky via #61964)
  • Fix data race between MOVE PARTITION query and merges resulting in intersecting parts. (#61610 by János Benjamin Antal via #62527)
  • Fix skipping escape sequence parsing errors during JSON data parsing while using input_format_allow_errors_num/ratio settings. (#61883 by Kruglov Pavel via #62238)
  • Fix segmentation fault when using Hive table engine. Reference #62154, #62560. (#62578 by Nikolay Degterinsky via #62673)
  • Setting server_name might help with recently reported SSL handshake error when connecting to MongoDB Atlas: Poco::Exception. Code: 1000, e.code() = 0, SSL Exception: error:10000438:SSL routines:OPENSSL_internal:TLSV1_ALERT_INTERNAL_ERROR. (#63122 by Alexander Gololobov via #63172)
  • The wire protocol version check for MongoDB used to try accessing “config” database, but this can fail if the user doesn’t have permissions for it. The fix is to use the database name provided by user. (#63126 by Alexander Gololobov via #63164)
  • Fixes #59989: runs init scripts when force-enabled or when no database exists, rather than the inverse. (#59991 by jktng via #64422)
  • Fix “Invalid storage definition in metadata file” for parameterized views. (#60708 by Azat Khuzhin via #64016)
  • Fix the issue where the function addDays (and similar functions) reports an error when the first parameter is DateTime64. (#61561 by Shuai li via #63456)
  • Fix crash with untuple and unresolved lambda. (#63131 by Raúl Marín via #63289)
  • Fix X-ClickHouse-Timezone header returning wrong timezone when using session_timezone as query level setting. (#63377 by Andrey Zvonov via #63512)
  • query_plan_remove_redundant_distinct can break queries with WINDOW FUNCTIONS (with allow_experimental_analyzer is on). Fixes #62820. (#63776 by Igor Nikonov via #63902)
  • Deserialize untrusted binary inputs in a safer way. (#64024 by Robert Schulze via #64104)
  • Prevent LOGICAL_ERROR on CREATE TABLE as MaterializedView. (#64174 by Raúl Marín via #64265)
  • Fixed memory possible incorrect memory tracking in several kinds of queries: queries that read any data from S3, queries via http protocol, asynchronous inserts. (#64844 by Anton Popov via #64867)
  • Fix possible abort on uncaught exception in ~WriteBufferFromFileDescriptor in StatusFile. (#64206 by Kruglov Pavel via #65351)
  • Fix crash on destroying AccessControl: add explicit shutdown. (#64993 by Vitaly Baranov via #66037)
  • Fixed bug in MergeJoin. Column in sparse serialisation might be treated as a column of its nested type though the required conversion wasn’t performed. (#65632 by Nikita Taranov via #65782)
  • For queries that read from PostgreSQL, cancel the internal PostgreSQL query if the ClickHouse query is finished. Otherwise, ClickHouse query cannot be canceled until the internal PostgreSQL query is finished. (#65771 by Maksim Kita via #65926)
  • Fix a bug in short circuit logic when old analyzer and dictGetOrDefault is used. (#65802 by jsc0218 via #65822)
  • Fixed a bug in ZooKeeper client: a session could get stuck in unusable state after receiving a hardware error from ZooKeeper. For example, this might happen due to “soft memory limit” in ClickHouse Keeper. (#66140 by Alexander Tokmakov via #66449)
  • Correctly track memory for Allocator::realloc. (#66548 by Antonio Andelic via #66717)
  • Fix bug with session closing in Keeper. (#65735 by Antonio Andelic via #65913)
  • Fix data race in S3::ClientCache. (#66644 by Konstantin Morozov via #66853)

Improvement

Build/Testing/Packaging Improvement

CI Fix or Improvement (changelog entry is not required))

NO CL CATEGORY

NOT FOR CHANGELOG / INSIGNIFICANT

Packages

Available for both AMD64 and Aarch64 from https://builds.altinity.cloud/ as either .deb, .rpm, or .tgz

Docker images

Available for both AMD64 and Aarch64: altinity/clickhouse-server/23.8.16.41.altinitystable

23.8.11.29

Released 2024-04-30

Build report - 23.8.11.29 | Release page

Changes compared to upstream's v23.8.11.28-lts

Bug fixes

  • Fixes partition pruning for extra columns in a set. (#55172 by @amosbird via #342)
  • Fixed rare bug when we produced invalid read ranges for queries with FINAL. Resulted in Cannot read out of marks range exception. (#54934 by @nickitat via #343)
  • Fix incorrect free space accounting for least_used JBOD policy (#56030 by @azat via #344)
  • Fix ALTER COLUMN with ALIAS that previously threw the NO_SUCH_COLUMN_IN_TABLE exception. (#57395 by @evillique via #345)
  • Disable system.kafka_consumers by default (due to possible live memory leak) (#57822 by @azat via #346)

Performance improvements

  • Fixed filtering by IN(...) condition for Merge table engine. (#54905 by @nickitat via #341)

Build/Testing/Packaging Improvements

  • Make builds possible on Altinity’s infrastructure
  • clickhouse-regression test suite (#338 by @MyroTk)
  • Remove hostile “non-official” log message on failure. (#369 by @Enmk)
  • Fix kerberized_hadoop docker image build issue (#394 by @ilejn)

Improvements

  • Support S3 access through AWS Private Link Interface endpoints. (#62208 by @arthurpassos via #389)
  • Fixed accounting of memory allocated before attaching thread to a query or a user (#56089 by @nickitat via #388)

Upstream ClickHouse changes between v23.8.11.28-lts (31879d2) and v23.8.8.20-lts (5e012a0)

Improvements

  • Output valid JSON/XML on exception during HTTP query execution. Add setting http_write_exception_in_output_format to enable/disable this behaviour (enabled by default). #52853
  • Fix transfer query to MySQL compatible query. Fixes #57253. Fixes #52654. Fixes #56729. #56456
  • Fetching a part waits when that part is fully committed on remote replica. It is better not send part in PreActive state. In case of zero copy this is mandatory restriction. #56808
  • Handle sigabrt case when getting PostgreSQl table structure with empty array. #57618
  • Add SYSTEM JEMALLOC PURGE for purging unused jemalloc pages, SYSTEM JEMALLOC [ ENABLE | DISABLE | FLUSH ] PROFILE for controlling jemalloc profile if the profiler is enabled. Add jemalloc-related 4LW command in Keeper: jmst for dumping jemalloc stats, jmfp, jmep, jmdp for controlling jemalloc profile if the profiler is enabled. #58665
  • Copy S3 file GCP fallback to buffer copy in case GCP returned Internal Error with GATEWAY_TIMEOUT HTTP error code. #60164
  • Update tzdata to 2024a. #60768

Build/Testing/Packaging Improvements

Bug Fixes (user-visible misbehavior in an official stable release)

  • Flatten only true Nested type if flatten_nested=1, not all Array(Tuple) #56132
  • Fix ALTER COLUMN with ALIAS #56493
  • Prevent incompatible ALTER of projection columns #56948
  • Fix segfault after ALTER UPDATE with Nullable MATERIALIZED column #57147
  • Fix incorrect JOIN plan optimization with partially materialized normal projection #57196
  • Fix ReadonlyReplica metric for all cases #57267
  • Fix working with read buffers in StreamingFormatExecutor #57438
  • bugfix: correctly parse SYSTEM STOP LISTEN TCP SECURE #57483
  • Ignore ON CLUSTER clause in grant/revoke queries for management of replicated access entities. #57538
  • Disable system.kafka_consumers by default (due to possible live memory leak) #57822
  • Fix invalid memory access in BLAKE3 (Rust) #57876
  • Normalize function names in CREATE INDEX #57906
  • Fix invalid preprocessing on Keeper #58069
  • Fix Integer overflow in Poco::UTF32Encoding #58073
  • Remove parallel parsing for JSONCompactEachRow #58181
  • Fix parallel parsing for JSONCompactEachRow #58250
  • Background merges correctly use temporary data storage in the cache #57275
  • MergeTree mutations reuse source part index granularity #57352
  • Fix double destroy call on exception throw in addBatchLookupTable8 #58745
  • Fix JSONExtract function for LowCardinality(Nullable) columns #58808
  • Fix: LIMIT BY and LIMIT in distributed query #59153
  • Fix translate() with FixedString input #59356
  • Fix error “Read beyond last offset” for AsynchronousBoundedReadBuffer #59630
  • Fix query start time on non initial queries #59662
  • Fix leftPad / rightPad function with FixedString input #59739
  • rabbitmq: fix having neither acked nor nacked messages #59775
  • Fix cosineDistance crash with Nullable #60150
  • Fix buffer overflow in CompressionCodecMultiple #60731
  • Remove nonsense from SQL/JSON #60738
  • Fix crash in arrayEnumerateRanked #60764
  • Fix crash when using input() in INSERT SELECT JOIN #60765
  • Remove recursion when reading from S3 #60849

NO CL ENTRY

  • Update PeekableWriteBuffer.cpp. #57701
  • Use the current branch test-utils to build cctools. #61276

NOT FOR CHANGELOG / INSIGNIFICANT

  • Pin alpine version of integration tests helper container #57669
  • Remove heavy rust stable toolchain #57905
  • Fix docker image for integration tests (fixes CI) #57952
  • Fix rare race in external sort/aggregation with temporary data in cache #58013
  • Fix possible race in ManyAggregatedData dtor. #58624
  • Fix 02720_row_policy_column_with_dots #59453
  • Pin python dependencies in stateless tests #59663
  • Make ZooKeeper actually sequentially consistent #59735
  • Remove broken test while we fix it #60547
  • Cancel PipelineExecutor properly in case of exception in spawnThreads #57104
  • Detect io_uring in tests #60373
  • Cancel PipelineExecutor properly in case of exception in spawnThreads #60499

Complete 23.8.11.29 Altinity Stable release notes are available in GitHub.