Altinity Stable® Build for ClickHouse® 24.3

Here are the detailed release notes and upgrade instructions for version 24.3.5.47. There are a lot of new features, but also many things that changed their behavior. Read these carefully before upgrading.

Major new features in 24.3 since the previous Altinity Stable release 23.8

Released 2024-07-23

A new release introduces a lot of changes and new functions. Please refer to the full list in the following sections. The following major features are worth mentioning on the front page:

  1. Analyzer is enabled by default:
    • Many complex queries start to work properly.
      • Brings consistency and completeness of features.
        • Multiple ARRAY JOIN in a single query.
        • SAMPLE can be specified for any table expression in JOIN.
        • FINAL can be specified for any table expression in JOIN.
        • Up to 5x faster for complex queries with joins.
      • As of version 24.3, the analyzer does not support the following experimental features: window views, annoy and usearch indices, and hypothesis constraints.

If you experience failed queries that worked in earlier versions, try disabling the analyzer with the allow_experimental_analyzer=0 setting.

  1. Refreshable Materialized Views: Run the SELECT query in the background and atomically replace the table with its result #56946.
  2. Features that are not marked as experimental anymore:
  3. Variant Data Type: See allow_experimental_variant_type & #58047.

Major Changes that Require Attention

  1. Analyzer is enabled by default

    • If you experience failed queries that used to work in earlier versions, try disabling the analyzer with allow_experimental_analyzer=0 setting.
  2. Server settings changed their defaults

    • max_concurrent_queries is now 1000 (previously 100). We highly recommend keeping the old value if it worked well before, as the new default can impact server responsiveness and stability if the limit was often reached.
    • concurrent_threads_soft_limit_ratio_to_cores is now 2. While the new value is generally better, some queries may suffer from reduced parallelism.
    • background_schedule_pool_size increased from 128 to 512. This new value is generally better, especially when the number of replicated tables is high. But you can observe more thread working in the system.
    • background_fetches_pool_size increased from 8 to 16. The new default can work better, especially for scenarios with a high number of replicated tables and inserts. However, it can create more pressure on the network card if it has poor bandwidth.
    • index_mark_cache_size is now enabled and set to 5GB. This change greatly impacts query speed performance if you use skipping indexes, but it may increase ClickHouse memory usage.
  3. MergeTree settings changed their defaults

    • replicated_deduplication_window increased from 100 to 1000. This may lead to some growth in the Zookeeper data size, especially if there are many replicated tables. You can keep the old default.

Backward Incompatible Changes

  • avgWeighted aggregate function no longer supports Decimal type arguments. Convert them to Float.
  • IPv6 bloom filter indexes created prior to March 2023 are incompatible with the current version and must be rebuilt.
  • Support for experimental Meilisearch was removed due to protocol changes.
  • Non-deterministic functions in TTL expressions are now forbidden by default (see allow_suspicious_ttl_expressions = 1).
  • ReplacingMergeTree with is_deleted flag: clean_deleted_rows is deprecated. The CLEANUP keyword for OPTIMIZE is not allowed by default (unless allow_experimental_replacing_merge_with_cleanup is enabled).
  • Support for in-memory data parts was fully dropped. If you have used this before, you may need to perform extra steps before the upgrade (check #61127).
  • extract_kvp_max_pairs_per_row renamed to extract_key_value_pairs_max_pairs_per_row.
  • The order of arguments for the locate function changed to match MySQL (see function_locate_has_mysql_compatible_argument_order = 0).
  • SimpleAggregateFunction is now forbidden in the table ORDER BY (see allow_suspicious_primary_key).
  • geoDistance, greatCircleDistance, and greatCircleAngle now return Float64 instead of Float32 (see geo_distance_returns_float64_on_float64_arguments).
  • query_cache_store_results_of_queries_with_nondeterministic_functions marked obsolete and replaced by query_cache_nondeterministic_function_handling #56519.

Upgrade Notes

It is always a good idea to test a new release in a dedicated environment, or at least to carefully check the list of backward incompatibilities before installing the new release.

For a rolling upgrade, consider disabling the new analyzer (set allow_experimental_analyzer=0) to prevent distributed queries from failing .

Known Issues in 24.3.5

The following problems are known to exist in 24.3.5:

A lot of issues have been discovered in the new analyzer after it was enabled by default. Many of them have been fixed, but some are still open:

https://github.com/ClickHouse/ClickHouse/issues?q=is%3Aopen+is%3Aissue+label%3Aanalyzer

If you experience failed queries that used to work in earlier versions, try disabling the new analyzer with the allow_experimental_analyzer=0 setting.

Other Important Changes

  • The default user now has access_management (user manipulation by SQL queries) and named_collection_control (manipulation of named collections by SQL queries) settings enabled by default. The new values are more user-friendly, but consider the security effects.
  • Some safety limits have been introduced (see also Altinity KB):
    • max_projections in merge_tree_settings - default 25
    • max_database_num_to_warn in server_settings - default 1000
    • max_materialized_views_count_for_table in server_settings - default 0 (i.e., disabled)
  • Utility clickhouse-copier has been moved to a separate repository on GitHub: ClickHouse/copier. It is no longer included in the bundle but is still available as a separate download.
  • output_format_orc_compression_method and output_format_parquet_compression_method changed from lz4 to zstd #61817.
  • output_format_orc_string_as_string, output_format_parquet_string_as_string, and output_format_arrow_string_as_string are enabled #61817.
  • input_format_parquet_allow_missing_columns, input_format_orc_allow_missing_columns, and input_format_arrow_allow_missing_columns are enabled.
  • log_processors_profiles can create some extra data in system.query_log.
  • Prefetch settings changed:
    • filesystem_prefetches_limit: 0 => 200
    • filesystem_prefetch_min_bytes_for_single_read_task: 8388608 => 2097152
    • allow_prefetched_read_pool_for_remote_filesystem: 0 => 1
  • load_metadata_threads: 1 => 16. This will make the server start up faster.
  • max_execution_time is now split into two settings: max_estimated_execution_time and max_execution_time #58402.
  • The Ordinary database engine is deprecated. You will see a warning in clickhouse-client if your server is using it.
  • directory_monitor settings were renamed to distributed_background_insert, but old settings will continue to work (exist as aliases).

Let’s Install!

Linux packages can be found at https://packages.clickhouse.com for upstream builds, and at https://builds.altinity.cloud for Altinity Stable builds.

Docker images for the upstream version should be referenced as clickhouse/clickhouse-server:24.3.5.46.

Altinity Stable build images are available as altinity/clickhouse-server:24.3.5.47.altinitystable.

For more information on installing ClickHouse from either the Altinity Builds or the upstream builds, see the ClickHouse Altinity Stable Builds Install Guide.

Please contact us at info@altinity.com if you experience any issues with the upgrade.

Appendix - New Features

New Syntax

  • DROP TABLE IF EMPTY
  • Negative Positional Arguments (ORDER BY -1 DESC)
  • ORDER BY * - Sort by all selected columns.
  • SHOW FUNCTIONS / SHOW MERGES / SHOW SETTING / SHOW SETTING max_block_size
  • ALTER TABLE test ADD INDEX ix (col) TYPE minmax, MATERIALIZE INDEX ix - can be done in one query now.
  • ATTACH PARTITION ALL
  • CHECK ALL TABLES
  • PASTE JOIN - join two tables by position-wise (1st row on the left with the first row on the right, etc.).
  • ALTER TABLE {name:Identifier} ATTACH PARTITION {x:String} (now accept query parameters).
  • Plural form in intervals, i.e., INTERVAL 2 HOURS = INTERVAL 2 HOUR.
  1. ORDER BY mortonEncode(x, y) can now properly work (analyze_index_with_space_filling_curves=1) #55642.

Security Features

  • Definer for View/Materialized View: This allows executing selects/inserts from views without explicit grants for underlying tables.
  • Temporary User Credentials: IDENTIFIED BY 'foo' VALID UNTIL '2025-01-01'
  • SSH Keys for Authentication.
  • SSO for AWS S3 according to AWS_PROFILE #54347.
  • External HTTP Basic authenticator #55199.
  • Safer named collections. It is now possible to define which fields can not be overwritten in function calls #55782.
  • primary_key_lazy_load=1 enabled by default; loads the primary key on first access. It speeds up server startup #60093.
  • Memory usage for primary key is reduced. Avoid loading in RAM the columns after a cardinal one (primary_key_ratio_of_unique_prefix_values_to_skip_suffix_columns=0.9) #60255.
  • Long column names can be used now. In part Avoid ‘File name too long,’ especially for projections (replace_long_file_name_to_hash = 1).
  • Automatic conversion of merge tree tables of different kinds to replicated engine #57798.
  • Columns statistics (allow_statistic_optimize = 1) to order prewhere conditions better #53240.
  • Projections: SET force_optimize_projection_name = 'foo', preferred_optimize_projection_name = 'bar'.
  • Table function mergeTreeIndex to inspect indexes.
  • New virtual columns _part_offset and _block_number.
  • Indices on ALIAS columns.
  • Optimizations for FINAL.

Mutations

  • ATTACH PARTITION from a different disk: Similar to MOVE PARTITION TO DISK/VOLUME, but works between tables.
  • apply_deleted_mask=0: Allows seeing the rows deleted by light-weight delete.
  • APPLY DELETED MASK: Remove deleted records without OPTIMIZE query.
  • Exponential backoff logic for mutation retries #58036.

Compression Codecs

  • GCD Codec “greatest common denominator” #53149.
  • min_compress_block_size and max_compress_block_size can now be specified at the column level CREATE TABLE ... (col String SETTINGS (min_compress_block_size = 81920, max_compress_block_size = 163840). #55201.
  • Support for LZ4HC(2) (previously worked as LZ4HC(3)).
  • ZSTD_QAT codec for Intel QuickAssist Technology hardware acceleration.

Aggregate Functions

  • groupArraySorted(n)(value): Useful for “top N” queries without full sorting.
  • quantileDD (+ quantilesDD, medianDD): Based on the DDSketch #56342.
  • groupArrayIntersect.
  • approx_top_count, approx_top_k, approx_top_sum: Similar to topK/topKWeighed functions but include count/error statistics #54508.
  • ArgMin, ArgMax as combinators: Apply aggregate function to the set of values where another value is the maximum in a group #54947.
  • Settings: anyLast_respect_nulls, any_respect_nulls, any_value_respect_nulls.

Timeseries Functions

  • largestTriangleThreeBuckets (Or lttb) lttb(n)(x, y): Downsampling time-series–like data while retaining the overall shape and variability in the data.
  • seriesPeriodDetectFFT: Find the main frequency of a signal/series period.
  • seriesDecomposeSTL(): Decomposes a time series into a season, a trend, and a residual component #57078 #58961.
  • seriesOutliersDetectTukey: Detect outliers in series data using Tukey’s Fences algorithm #58632.

Array Functions

  • arrayFold: SELECT arrayFold((acc, value...) -> new_acc, initial_acc, arr...)
  • arrayRandomSample(range(1000), 10).
  • Multiply/divide on scalar #54608.
  • arrayShingles: ['ClickHouse','is','a','good','database'] -> arrayShingles(tok, 3) -> [['ClickHouse','is','a'],['is','a','good'],['a','good','database']].

Other Functions

  • SHA-512/256.
  • formatQuery, formatQuerySingleLine.
  • byteSwap.
  • SQID: Short Unique Identifiers from Numbers SELECT sqid(123456789); rDOOFW.

Strings / JSON Functions

  • jsonMergePatch.
  • Comparing strings: levenshteinDistance, damerauLevenshteinDistance, jaroSimilarity, jaroWinklerSimilarity.
  • concat & format with arbitrary types (without casting to Strings).
  • punycodeEncode, punycodeDecode, idnaEncode, idnaDecode.
  • decodeHTMLComponent.

Other

  • Non-Constant Time Zones.
  • Size-Capped Memory Tables: ENGINE = Memory SETTINGS max_bytes_to_keep = '1G' (also max_bytes_to_keep, max_rows_to_keep, min_bytes_to_keep, min_rows_to_keep).
  • Configurable dashboards: Queries for charts are now loaded using a query, which by default uses a new system.dashboards table #56771.
  • Options partial_result_update_duration_ms and max_rows_in_partial_result to show updates in real-time during query execution #48607.
  • generate_series table function.
  • Many improvements in parallel replicas functionality.
  • Support for read-only connection to ZooKeeper server #57479.
  • hostname column in all system log tables #55894.
  • Several improvements in (experimental) s3 zero copy feature.

Distributed Queries

  • distributed_insert_skip_read_only_replicas: Skip read-only replicas for INSERT into Distributed engine.
  • Settings for the Distributed table engine can now be specified in the server configuration file (similar to MergeTree settings), e.g., <distributed> <flush_on_detach>false</flush_on_detach> </distributed>.
  • Ability to override initial INSERT settings via SYSTEM FLUSH DISTRIBUTED #61832.

HTTP Protocol

  • Speed up HTTP output.
  • Separate metrics of network traffic for each server interface: InterfaceHTTPSendBytes, InterfaceHTTPReceiveBytes, InterfaceNativeSendBytes.
  • getClientHTTPHeader function.
  • Refactoring of the code around HTTP/HTTPS connections (+ introducing some limits), making HTTP connections reusable #58845.
  • Allow disabling of HEAD request before GET request #54602.

Operational / Maintenance

  • Allow overwriting max_partition_size_to_drop and max_table_size_to_drop server settings in query time #57452.
  • Asynchronous loading of tables: async_load_databases for asynchronous loading of databases and tables. See also system.async_loader, max_waiting_queries #49351 #61053.
  • Adding new disk to storage configuration without restart.
  • alter_move_to_space_execute_async allow to start moves in the background
  • max_mutations_bandwidth_for_server & max_merges_bandwidth_for_server #57877.
  • IO Scheduling support for remote disks #47009 #54618.
  • Composable configurations of virtual filesystems on top of object storages and metadata storages.
  • ALTER TABLE table FORGET PARTITION partition: Clean up ZooKeeper nodes related to a partition which will not be used.
  • volume_priority in storage_configuration.
  • Allow loading AZ info from a file #59976.

S3 / Object Storage Improvements

  • Prefetches enabled by default #53709.
  • Adaptive timeouts #56314.
  • key_template option to adjust data layout inside the bucket.
  • Introspection table system.blob_storage_log.
  • S3 Express One Zone support.

SQL & MySQL Compatibility

  • MySQL Binlog Client for MaterializedMySQL: One binlog connection for many databases #57323.
  • Allow skipping engine (default_table_engine=MergeTree), Zookeeper path & replicas (default_replica_name, default_replica_path), or ORDER BY (create_table_empty_primary_key_by_default, no index will be used).
  • Default parameters for Decimal: DECIMAL(P), DECIMAL. New functions: TO_DAYS (

toDaysSinceYearZero), addDate, subDate. Function aliases: STD (stddevPop), current_user.

  • information_schema: table_comment, table_collation, data_length, statistics.
  • columns.extra, tables.table_rows, tables.data_length, key_column_usage, referential_constraints (for QuickSight).
  • Minimal support for prepared statements (for Tableau).
  • String arguments for add/subtract date/time and toDayOfWeek.
  • date_trunc supports case-insensitive unit names.
  • Enums and strings are cast to a common type if needed (+ substring supports the Enum data type).
  • Enable group of settings when clients are connected via MySQL protocol (prefer_column_name_to_alias = 1, mysql_map_string_to_text_in_show_columns, and mysql_map_fixed_string_to_text_in_show_columns). Helps BI tools like QuickSight to work.

Formats

  • NumPy as input format: SELECT * FROM 'data.npy'.
  • Autodetect JSON/JSONEachRow.
  • Valid JSON/XML on exceptions for some formats (especially HTTP).
  • ‘Pretty’ formats look better:
    • output_format_pretty_row_numbers=1.
    • output_format_pretty_highlight_digit_groups=1.
    • output_format_pretty_single_large_number_tip_threshold=1000000.
  • Control for compression level: output_format_compression_level = 6, output_format_compression_zstd_window_log = 26.
  • Inline template format (format_template_resultset_format + format_template_row_format).
  • Automatic detection of formats when it can not be inferred from a file extension
  • Type inference for nested JSONs.
  • Support for well-known Protobuf types.

Integrations

  • _size virtual column: In external tables: s3, url, file, hdfs, azureBlobStorage, etc.
  • fileCluster for parallel query across distributed file systems.
  • Table function merge with a single argument (first database argument can be skipped).
  • Support for nats_handle_error_mode / rabbitmq_handle_error_mode / handle_error_mode similar to kafka_handle_error_mode.

Important Performance Improvements

  • Reducing lock contention for high concurrent loads, see ClickHouse Lock Contention
  • Optimizations for queries with FINAL:
    • Avoid merging data across different partitions if the table’s partition key consists of columns from the primary key.
    • Find data parts with non-intersecting ranges of the primary key to avoid merging data from those parts.
  • Condition pushdown for ORC row groups (see input_format_orc_filter_push_down).
  • Faster window functions (see query_plan_preserve_num_streams_after_window_functions = 1) #50771, #39631.
  • Parallel flush for buffer tables: Moving data from engine=buffer to the target table was sequential even if it had several layers.
  • Auto-adjustment for asynchronous insert timeouts #58486.

Keeper

  • Support for compression in Keeper protocol #54957.
  • HTTP endpoint for checking if Keeper is ready to accept traffic #55876.
  • Implementation of reconfig, sync, and exists commands for keeper-client #54201.
  • 4-letter command for yielding/resigning leadership.
  • ClickHouse Keeper reports its running availability zone at /keeper/availability-zone path when running on AWS environment.

Embedded Backup

  • backup_log #53638.
  • External Python tool to view backups #56268.
  • Backup & restore support for AzureBlobStorage #56988.
  • Backups as tar archives #59535.
  • S3 server-side for BACKUP/RESTORE and clickhouse-disks copy.
  • RESTORE TABLE ON CLUSTER preserves UUID #53765.
  • Many other fixes (~50 in total).

Changes in Altinity Stable build Compared to Upstream Build

Altinity Stable builds for ClickHouse are open source and are based on the upstream LTS versions. Altinity.Stable 24.3.5 is based on upstream 24.3.5.46-lts, but we have additionally backported several fixes:

  • Fix the crash loop when restoring from backup is blocked by creating an MV with a definer that hasn’t been restored yet. (#64595 by @pufit)
  • Fix moving partition to itself #62459 (#62524 by @helifu)
  • Fix detection of number of CPUs in containers. In the case when the ‘root’ cgroup was used (i.e. name of cgroup was empty, which is common for containers) ClickHouse was ignoring the CPU limits set for the container. (#66237 by @filimonov)

Also, please refer to the interim release notes from the development team available at the following urls:

24.3.5.48 Altinity FIPS

Released 2024-09-04

This is a FIPS-compatible release of version 24.3.5.

Changes compared to 24.3.5.47.altinitystable

Improvement

  • Building BoringSSL ver fips-20210429 (853ca1ea1168dff08011e5d42d94609cc0ca2e27) according to FIPS-140-2 Security Policy 4407, based on build scripts from Golang version go1.22.5
  • Added FIPS_CLICKHOUSE to system.build_options
  • Modified ClickHosue keeper to use full range of openSSL options (same as ClickHouse does) for Raft connections using ubuntu:22.04 as a base for clickhouse-server docker images instead of ubuntu:20.04

Build/Testing/Packaging Improvement

  • CI/CD for FIPS mode
  • Other fixes for CI/CD

Full Changelog: v24.3.5.47.altinitystable...v24.3.5.48.altinityfips

Packages

Available for AMD64 from builds.altinity.cloud as either .deb, .rpm, or .tgz

Docker images

Available for AMD64: altinity/clickhouse-server/24.3.5.48.altinityfips