Altinity Stable Build for ClickHouse 23.8

Here are the detailed release notes and upgrade instructions for version 23.8.8. There are a lot of new features, but also many things that changed their behavior. Read these carefully before upgrading.

Major new features in 23.8 since the previous Altinity Stable release 23.3

Released 2023-12-27

A new release introduces a lot of changes and new functions. It is very hard to pick the most essential ones, so refer to the full list in the Appendix. The following major features are worth mentioning on the front page:

  • Few features that were experimental are now graduated as production ready:

    • Projections #49719
    • Query Results cache #47977. See our Caching Guide for examples.
    • Geographical data types: Point, Ring, Polygon, and MultiPolygon #50022
  • SQL features:

    • Overlay databases for external data:

    CREATE DATABASE files ENGINE = Filesystem('subdir/') CREATE DATABASE bucket ENGINE = S3('https://clickhouse-public-datasets.s3.amazonaws.com/')

    • SELECT ... INTO OUTFILE '...' APPEND / TRUNCATE #48880

    • PRQL query language #50686

    • Plus and minus operations for arrays: SELECT [1, 2, 3] + [4, 5, 6] #52625. Multiplication and division comes in later releases.

    • Direct import from archives (files only):

      SELECT * FROM file('path/to/archive.zip :: path/inside/archive.csv')
      
  • MergeTree improvements:

    • There were a lot of improvements in order to reduce memory usage for merges, loading parts. FINAL, index analysis and others.
    • Added setting async_insert for MergeTree tables. It has the same meaning as query-level setting async_insert and enables asynchronous inserts for a specific table. #49122.
  • Replication improvements:

    • Improve insert retries on keeper session expiration. #52688.
    • Reduced number ZooKeeper requests when selecting parts to merge and a lot of partitions do not have anything to merge. #49637. – in general, users would observe lower ZooKeeper load when upgrading from 23.3 to 23.8
  • Security & access control

    • ROW POLICY can be applied to all tables in a database. #47640. a)
    • Added server-side encryption using KMS keys with S3 tables, and the header setting with S3 disks. #48724.
    • Add server and format settings display_secrets_in_show_and_select for displaying secrets of tables, databases, table functions, and dictionaries. Add privilege displaySecretsInShowAndSelect controlling which users can view secrets. #46528.
    • Adding the grants field in the users.xml file, which allows specifying grants for users. #49381.
    • GRANT CURRENT GRANTS
  • Object storage

    • Support proxy for S3 access a)read our article for more detail
    • Native support for GCP via gcs table function
    • Native support for Azure via azureBlobStorage and azureBlobStorageCluster table functions
  • Operational:

    • New tables for introspection:
      • system.zookeeper_connection
      • system.kafka_consumers
    • Server and query level IO throttling for different operations (#48242):
      • server settings: max_remote_read_network_bandwidth_for_server, max_remote_write_network_bandwidth_for_server, max_local_read_bandwidth_for_server, max_local_write_bandwidth_for_server, max_backup_bandwidth_for_server
      • profile settings: max_remote_read_network_bandwidth, max_remote_write_network_bandwidth, max_local_read_bandwidth, max_local_write_bandwidth, max_backup_bandwidth.

As usual with ClickHouse, there are many performance and operational improvements in different server components.


a) Contributed by Altinity developers.

Major changes that require attention

  • use_environment_credentials is enabled by default since 23.4. That blocks access to anonymous S3 buckets. The recommended way accessing public buckets is NOSIGN keyword supported for both S3 table function and engine.
  • Compression of mark files and primary keys is enabled by default. It produces different binary representations of those files. This is controlled by compress_marks and compress_primary_key merge tree settings.
    • It is not possible to downgrade from 23.8 to 22.8 or earlier
    • If you upgrade from versions prior to 22.9, you should either upgrade all replicas at once or disable the compression before upgrade, or upgrade through an intermediate version, where the compressed marks are supported but not enabled by default, such as 23.3.
  • sparse columns enabled by default. It produces different binary representation, so if used downgrade to versions older than 22.1 might not be possible.. This can be disabled by ratio_of_defaults_for_sparse_serialization=1.
  • max_concurrent_queries default value has been increased from 100 to 1000. It may be too high in same cases, max_concurrent_queries * max_threads should be lower than ~8000.
  • Maximum number of parts per partition restriction (“too many parts” error) has been relaxed:
    • parts_to_delay_insert=1000 (was: 150)
    • parts_to_throw_insert=3000 (was: 300)
    • There is a new setting max_avg_part_size_for_too_many_parts=1073741824, so ‘parts_to_throw’ restriction is applied only for partitions of 1TB size or above. This may result in slower performance, so changing ‘parts_to_throw’ to smaller values is recommended.
  • Metadata cache has been removed
  • The new feature system.kafka_consumers enables collecting librdkafka statistics every 3 seconds by default. If you have Kafka tables which you don’t read from, that can lead to collecting those unprocessed stat messages in memory. Collection is disabled in Altinity.Stable Build 23.8.8 and upstream build 23.8.9. You may turn it on and off using server level Kafka setting:
<kafka>
    <statistics_interval_ms> <!-- Set non-zero value to enable -->
    0
    </statistics_interval_ms>
</kafka>

Backward Incompatible Changes

The following changes are backward incompatible and require user attention during an upgrade:

  • allow_experimental_query_cache profile setting has been removed (the feature is production ready) #52685. ClickHouse will not start if it is defined, so please removed it before an upgrade
  • Several features have been removed:
    • hashid #52449
    • do_not_evict_index_and_mark_files #51253
    • ALTER support for experimental LIVE VIEW #51287
    • in-memory data parts #49429
    • metadata cache #51303
  • The microseconds column is removed from the system.text_log, and from the system.metric_log, because they are redundant in the presence of the event_time_microseconds column. #53601

Upgrade Notes

There were several changes between versions that may affect the rolling upgrade of big clusters. Upgrading only part of the cluster is not recommended.

  • Upgrading from 20.3 and older to 22.9 and newer should be done through an intermediate version if there are any ReplicatedMergeTree tables; otherwise, the server with the new version will not start. #40641. Here is a possible upgrade path if you are upgrading from 20.3: 20.3 -> 22.8 -> 23.3 -> 23.8
  • If you upgrade from versions prior to 22.9, you should either upgrade all replicas at once or disable compress_marks and compress_primary_key merge tree settings before upgrade, or upgrade through an intermediate version, where the compressed marks are supported but not enabled by default, such as 23.3.
  • Downgrading from 23.8 to version 23.5 or below may fail due to changes in sparse columns serialization, see #55153 for possible workaround.

Known Issues in 23.8.x

The development team continues to improve the quality of the 23.8 release. The following issues still exist in the 23.8.8 version and may affect ClickHouse operation. Please inspect them carefully to decide if those are applicable to your applications.

General stability issues:

Some queries that worked in 23.3 may fail in 23.8:

There are a few performance regressions:

You may also look into a GitHub issues using v23.8-affected label.​​

Other Important Changes

Several settings have changed their defaults:

  • distributed_ddl_entry_format_version=5 (was: 3) – enables open telemetry and initial_query_id pass through. #52128
    • downgrading to versions with unprocessed DDL entries can lead to errors
  • connect_timeout_with_failover_ms=1000 (was: 50)
  • connect_timeout_with_failover_secure_ms=1000 (was: 100)
  • hedged_connection_timeout_ms=50 (was: 100)
  • http_max_field_name_size=131072 (was: 1048576)
  • http_max_field_value_size=131072 (was: 1048576)
  • http_receive_timeout=30 (was: 180)
  • http_send_timeout=30 (was: 180)

Some new ClickHouse features and optimizations are now enabled by default. It may lead to a change in behavior, so review those carefully and disable features that may affect your system:

  • merge_tree_settings.allow_vertical_merges_from_compact_to_wide_parts
  • async_query_sending_for_remote
  • check_dictionary_primary_key
  • projection metadata now is checked on table attach #52361
  • move_all_conditions_to_prewhere, enable_multiple_prewhere_read_steps, move_primary_key_columns_to_end_of_prewhere
  • enable_memory_bound_merging_of_aggregation_results – if you upgrade from version prior to 22.12, we recommend setting this flag to false until the upgrade is finished.
  • optimize_distinct_in_order
  • optimize_count_from_files
  • optimize_use_projections, optimize_use_implicit_projections

system.query_views_log now collects information about MVs that are pushed from background threads, like Kafka / Rabbit etc. #46668. You can have more (useful!) data there after the upgrade.

LZ4 & ZSTD compression libraries were upgraded, checksum mismatch when having replicas with older versions are possible

In the previous releases we recommended disabling optimize_on_insert. This recommendation stays for 23.8 as well as inserts into Summing and AggregatingMergeTree can slow down.

Changes in Altinity Stable build Compared to Upstream Build

ClickHouse Altinity Stable builds are open source and are based on the upstream LTS versions. Altinity.Stable 23.8.8 is based on upstream 23.8.8.20-lts, but we have additionally backported several fixes:

  • Fix key analysis (with set) for Merge engine #54905 via #341
  • Fix partition pruning of extra columns in set #55172 via #342
  • Fix FINAL produces invalid read ranges in a rare case #54934 via #343
  • Fix incorrect free space accounting for least_used JBOD policy #56030 via #344
  • Fix ALTER COLUMN with ALIAS #56493 via #345 (backported into 23.8.9 upstream)
  • Disable system.kafka_consumers by default due to possible live memory leak #57822 via #346 (backported into 23.8.9 upstream)

Let’s Install!

Linux packages can be found at https://packages.clickhouse.com for upstream builds, and at https://builds.altinity.cloud for Altinity Stable builds.

Note: naming schema for Altinity.Stable build packages has been changed since 21.8.x.

21.8.x 22.3.x and later versions
<package>_<ver>.altinitystable_all.deb <package>_<ver>.altinitystable_amd64.deb
<package>-<ver>.altinitystable-2.noarch.rpm <package>-<ver>.altinitystable.x86_64.rpm

Docker images for the upstream version should be referenced as clickhouse/clickhouse-server:23.8.8.20

Altinity Stable build images are available as altinity/clickhouse-server:23.8.8.21.altinitystable.

For more information on installing ClickHouse from either the Altinity Builds or the upstream builds, see the ClickHouse Altinity Stable Release Build Install Guide.

Please contact us at info@altinity.com if you experience any issues with the upgrade.


Appendix

New table functions

  • azureBlobStorage, azureBlobStorageCluster
  • gcs
  • redis
  • urlCluster

New table engines

  • AzureBlobStorage
  • Redis
  • S3Queue

New functions

COMING SOON

New formats

  • One – doesn’t read any data and always returns a single row with column dummy with type UInt8 and value 0 like system.one. It can be used together with _file/_path virtual columns to list files in file/s3/url/hdfs/etc table functions without reading any data
  • ParquetMetadata – providers introspection into Parquet files, see this excellent blog article for detail
  • PrettyJSONEachRow
  • PrettyJSONLines
  • PrettyNDJSON
  • RowBinaryWithDefaults

New system tables

  • system.jemalloc_bins
  • system.kafka_consumers
  • system.user_processes – query statistics grouped by user
  • merge_tree_metadata_cache was removed

New columns in system tables

  • system.clusters: database_shard_name, database_replica_name, is_active, name (alias to cluster)
  • system.filesystem_cache: cache_name, key, file_size
  • system.functions: syntax, arguments, returned_value, examples, categories – those are populated for new functions only, but eventually may be filled in for all
  • system.merge_tree_settings/replicated_merge_tree_settings/settings/server_settings: is_obsolete
  • system.merges: partitions, total_size_uncompressed
  • system.metrics: name (alias to metric)
  • system.parts: primary_key_size, part_name (alias to name)
  • system.parts_columns: column_modification_time, part_name (alias to name)
  • system.projection_parts: part_name (alias to name)
  • system.projection_parts_columns: column_modification_time, part_name (alias to name)
  • system.query_cache: compressed
  • system.replicas: zookeeper_name, lost_part_count
  • system.storage_policies: perform_ttl_on_insert, load_balancing

New metrics and events

system.events:

Several groups of events were added:

COMING SOON

system.metrics:

COMING SOON


Also, please refer to the interim release notes from the development team available at the following URLs: