Altinity Stable Build for ClickHouse 22.8.13 Release Notes

A few months ago, we certified the ClickHouse 22.3 release. It was delivered together with the Altinity Stable build for ClickHouse. Since then many things have happened to ClickHouse. On the Altinity side we continued to put in features but really focused on build process and testing. We started testing the new ClickHouse LTS release 22.8 as soon as it was out in late August. It took us several months to confirm 22.8 is ready for production use and to make sure upgrades go smoothly. As of 22.8.13 we are confident in certifying 22.3 as an Altinity Stable release.

This release is a significant upgrade since the previous Altinity Stable release. It includes more than 2000 pull requests from 280 contributors. Please look below for detailed release notes.

Major new features in 22.8 since the previous stable release 22.3

A new release introduces a lot of changes and new functions. It is very hard to pick the most essential ones, so refer to the full list in the Appendix. The following major features are worth mentioning on the front page:

  • SQL features:
    • GROUPING SETS
    • Experimental support for SQL DELETE (lightweight deletes) – partially contributed by Altinity
    • Interpolate extension for ORDER BY … WITH FILL
    • Support SELECT ... INTO OUTFILE and STDOUT.
    • INTERVAL support for all kinds of time intervals a)
    • Support for non-constant argument for LIKE, ILIKE and match functions
    • Support expressions with window functions
  • Security features:
    • Base58 encoding/decoding a)
    • Named collections were extended to cover more data sources
  • Replication and Cluster improvements:
    • Support inserts into system.zookeeper table
  • Integrations:
    • ‘mongodb’ table function to insert into and select from MongoDB databases
    • MeiliSearch engine integration (experimental)
    • New ‘direct’ join algorithm to work efficiently with key-value databases
  • Remote file system and object storage features:
    • Proper support of Google GCS for S3 table function a)
    • Write cache for remote file systems
    • Caches management commands: DESCRIBE CACHE, SHOW CACHES, SYSTEM DROP FILESYSTEM CACHE
  • Other:
    • Store metadata cache in RocksDB for faster startup times
    • Add simple chart visualization to the built-in Play interface
    • system.settings_changes table to track changes between versions (incomplete)

As usual with ClickHouse, there are many performance and operational improvements in different server components.


a) - contributed by Altinity developers.

Major changes that require attention

Ordinary database deprecation

The Ordinary database engine and the old storage definition syntax for *MergeTree tables are deprecated. It has following consequences:

  • If the system database has the Ordinary engine it will be automatically converted to Atomic on server startup.

  • default_database_engine setting is deprecated and does nothing. New databases are always created as Atomic ones if the engine is not specified.

  • By default it’s not possible to create new databases with the Ordinary engine. There is a setting to keep old behavior: allow_deprecated_database_ordinary=1.

  • Setting allow_deprecated_syntax_for_merge_tree allows to create MergeTree tables in old syntax, but these settings may be removed in future releases.

  • There is a way to convert existing databases from Ordinary to Atomic. In order to do that, create an empty convert_ordinary_to_atomic file in the flags directory and all Ordinary databases will be converted automatically on the next server start.

Background pools

Configuration of background pools have been moved from profile to server settings. However, due to the ClickHouse bug, pools need to be defined in both config.xml and users.xml.

  • If pools were modified in your ClickHouse cluster, you need to add a section in config.xml.
  • If you need to adjust pools in 22.8, you need to do in two places now

For example:

config.xml
—--------
    <background_pool_size>18</background_pool_size>
    <background_move_pool_size>18</background_move_pool_size>
    <background_schedule_pool_size>18</background_schedule_pool_size>
    <background_fetches_pool_size>18</background_fetches_pool_size>
    <background_distributed_schedule_pool_size>18</background_distributed_schedule_pool_size>
    <background_common_pool_size>18</background_common_pool_size>

And:

users.xml
—--------
<profiles>
    <default>
        <background_pool_size>18</background_pool_size>
        <background_move_pool_size>18</background_move_pool_size>
        <background_schedule_pool_size>18</background_schedule_pool_size>
        <background_fetches_pool_size>18</background_fetches_pool_size>          
        <background_distributed_schedule_pool_size>18</background_distributed_schedule_pool_size>
    </default>
</profiles>

Zero-copy replication for remote file systems

Zero-copy replication was incidentally turned on by default in 22.3. In 22.8 it is turned off. So if you were using Disk S3 on a replicated cluster, you will see an increase in S3 storage, double writes and double merges. This is still an experimental feature, it can be enabled with allow_remote_fs_zero_copy_replication setting.

Changes in s3 multipart upload

Since 22.6 uploads to s3 are executed in more threads and in more aggressive manner, which can create a higher pressure on the network and saturate some s3-compatible storages (e.g. Minio). Better control of that was added only in 22.9.

Backward Incompatible Changes

The following changes are backward incompatible and require user attention during an upgrade:

  • Extended range of Date32 and DateTime64 to support dates from the year 1900 to 2299. In previous versions, the supported interval was only from the year 1925 to 2283. The implementation is using the proleptic Gregorian calendar (which is conformant with ISO 8601:2004 (clause 3.2.1 The Gregorian calendar)) instead of accounting for historical transitions from the Julian to the Gregorian calendar. This change affects implementation-specific behavior for out-of-range arguments. E.g. if in previous versions the value of 1899-01-01 was clamped to 1925-01-01, in the new version it will be clamped to 1900-01-01. It changes the behavior of rounding with toStartOfInterval if you pass INTERVAL 3 QUARTER up to one quarter because the intervals are counted from an implementation-specific point of time.
  • Now, all relevant dictionary sources respect the remote_url_allow_hosts setting. It was already done for HTTP, Cassandra, Redis. Added ClickHouse, MongoDB, MySQL, PostgreSQL. Host is checked only for dictionaries created from DDL.
  • ClickHouse x86 binaries now require support for AVX instructions, i.e. a CPU not older than Intel Sandy Bridge / AMD Bulldozer, both released in 2011.
  • Make the remote filesystem cache composable, allow not to evict certain files (regarding idx, mrk, ..), delete the old cache version. Now it is possible to configure cache over Azure blob storage disk, over Local disk, over StaticWeb disk, etc. This PR is marked backward incompatible because cache configuration changes and in order for cache to work need to update the config file. Old cache will still be used with new configuration. The server will startup fine with the old cache configuration.
  • Remove support for octal number literals in SQL. In previous versions they were parsed as Float64.
  • Changes how settings using seconds as type are parsed to support floating point values (for example: max_execution_time=0.5). Infinity or NaN values will throw an exception.
  • Changed format of binary serialization of columns of experimental type Object. New format is more convenient to implement by third-party clients.
  • LIKE patterns with trailing escape symbol (’') are now disallowed (as mandated by the SQL standard).
  • Do not allow SETTINGS after FORMAT for INSERT queries. There is compatibility setting parser_settings_after_format_compact to accept such queries, but it is turned OFF by default.
  • Function yandexConsistentHash (consistent hashing algorithm by Konstantin “kostik” Oblakov) is renamed to kostikConsistentHash. The old name is left as an alias for compatibility. Although this change is backward compatible, we may remove the alias in subsequent releases, that’s why it’s recommended to update the usages of this function in your apps.

Upgrade Notes

There were several changes between versions that may affect the rolling upgrade of big clusters. Upgrading only part of the cluster is not recommended.

  • If you run different ClickHouse versions on a cluster with AArch64 CPU or mix AArch64 and amd64 on a cluster, and use distributed queries with GROUP BY multiple keys of fixed-size type that fit in 256 bits but don’t fit in 64 bits, and the size of the result is huge, the data will not be fully aggregated in the result of these queries during upgrade. Workaround: upgrade with downtime instead of a rolling upgrade.

Rolling upgrade from 20.4 and older is impossible because the “leader election” mechanism is removed from ReplicatedMergeTree.

Known Issues in 22.8.x

The development team continues to improve the quality of the 22.8 release. The following issues still exist in the 22.8.13 version and may affect ClickHouse operation. Please inspect them carefully to decide if those are applicable to your applications.

General stability issues (fixed in Altinity.Stable build 22.8.13):

Some queries that worked in 22.3 may fail in 22.8:

There are a few performance regressions:

You may also look into a GitHub issues using v22.8-affected label.

Other Important Changes

Several settings has changed their defaults:

  • The new default value for local_filesystem_read_method is pread_threadpool. In some cases this can lead to performance degradation (see Degraded performance with local_filesystem_read_method = pread_threadpool) if you notice that 22.8 is reading data slower, set local_filesystem_read_method = pread
  • allow_remote_fs_zero_copy_replication is now 0 (was 1)
  • format_csv_allow_single_quotes is now 0 (was 1) – if it is set to true, allow strings in single quotes.
  • distributed_ddl_entry_format_version is now 3 (was 1)
  • log_query_threads is now 0 (was 1)
  • join_algorithm is now ‘default’ (was ‘hash’) – ‘default’ means ‘hash’ or ‘direct’ if possible, so this change only applies to new ‘direct’ joins.
  • max_download_threads is now 4 – that affects number of parallel threads for URL and S3 table engines and table functions

Some new ClickHouse features and optimizations are now enabled by default. It may lead to a change in behavior, so review those carefully and disable features that may affect your system:

  • allow_experimental_geo_types – allow geo data types such as Point, Ring, Polygon, MultiPolygon
  • output_format_json_named_tuples_as_objects – it allows to serialize named tuples as JSON objects in JSON formats.
  • collect_hash_table_stats_during_aggregation – enable collecting hash table statistics to optimize memory allocation, this may lead to performance degradation
  • enable_positional_arguments – enable positional arguments in ORDER BY, GROUP BY and LIMIT BY
  • input_format_skip_unknown_fields – skip columns with unknown names from input data (it works for JSONEachRow, -WithNames, -WithNamesAndTypes and TSKV formats). Previously, ClickHouse threw an exception.
  • odbc_bridge_use_connection_pooling – use connection pooling in ODBC bridge. If set to false, a new connection is created every time
  • optimize_distinct_in_order – enable DISTINCT optimization if some columns in DISTINCT form a prefix of sorting. For example, prefix of sorting key in merge tree or ORDER BY statement
  • optimize_multiif_to_if – replace ‘multiIf’ with only one condition to ‘if’.
  • optimize_read_in_window_order – enable ORDER BY optimization in window clause for reading data in corresponding order in MergeTree tables.
  • optimize_sorting_by_input_stream_properties – optimize sorting by sorting properties of input stream
  • output_format_json_named_tuples_as_objects – serialize named tuple columns as JSON objects.

ClickHouse embedded monitoring is since 21.8. It now collects host level metrics, and stores them every second in the table system.asynchronious_metric_log. This can be visible as an increase of background writes, storage usage, etc. To return to the old rate of metrics refresh / flush, adjust those settings in config.xml:

<asynchronous_metrics_update_period_s>
    60
</asynchronous_metrics_update_period_s>
<asynchronous_metric_log>
    <flush_interval_milliseconds>
        60000
    </flush_interval_milliseconds>
</asynchronous_metric_log>

In the previous releases we recommended disabling optimize_on_insert. This recommendation stays for 22.8 as well as inserts into Summing and AggregatingMergeTree can slow down.

Changes in Altinity Stable build Compared to Community Build

ClickHouse Altinity Stable builds are based on the community LTS versions. Altinity.Stable 22.8.13.21 is based on community 22.8.13.20-lts, but we have additionally backported several fixes:

Let’s Install!

Linux packages can be found at:

Note: naming schema for Altinity.Stable build packages has been changed since 21.8.x.

21.8.x 22.3.x and later versions
<package>_<ver>.altinitystable_all.deb <package>_<ver>.altinitystable_amd64.deb
<package>-<ver>.altinitystable-2.noarch.rpm <package>-<ver>.altinitystable.x86_64.rpm

Docker images for community versions have been moved from ‘yandex’ to ‘clickhouse’ organization, and should be referenced as ‘clickhouse/clickhouse-server:22.8’.
Altinity Stable build images are available as ‘altinity/clickhouse-server:22.8.13.21.altinitystable’.

Mac users are welcome to use Homebrew Formulae. Ready-to-use bottles are available for both M1 and Intel Macs running Monterey.

For more information on installing ClickHouse from either the Altinity Builds or the Upstream Builds, see the ClickHouse Altinity Stable Release Build Install Guide.

Please contact us at info@altinity.com if you experience any issues with the upgrade.

Appendix

New table functions

  • MeiliSearch – renamed to meilisearch (lowercase) in 22.9
  • mongodb
  • viewIfPermitted

New table engines

  • MeiliSearch
  • NATS

New functions

  • L2SquaredDistance, L2SquaredNorm, distanceL2Squared, normL2Squared
  • LAST_DAY, toLastDayOfMonth
  • addMicroseconds, addMilliseconds, addNanoseconds, subtractMicroseconds, subtractMilliseconds, subtractNanoseconds, toIntervalMicrosecond, toIntervalMillisecond, toIntervalNanosecond
  • arrayFirstOrNull, arrayLastOrNull
  • base58Decode, base58Encode
  • flattenTuple
  • getTypeSerializationStreams
  • Multiple h3* geo functions
  • hashid
  • isNullable
  • makeDate, makeDate32, makeDateTime, makeDateTime64
  • meiliMatch
  • minSampleSizeContinous, minSampleSizeConversion
  • nonNegativeDerivative
  • nowInBlock
  • th_value
  • parseDateTime64BestEffortUS, parseDateTime64BestEffortUSOrNull, parseDateTime64BestEffortUSOrZero
  • parseTimeDelta
  • revision
  • showCertificate
  • toStartOfFiveMinutes, toStartOfMicrosecond, toStartOfMillisecond, toStartOfNanosecond
  • transactionID, transactionLatestSnapshot, transactionOldestSnapshot
  • wyHash64

New formats

  • JSONColumns
  • JSONColumnsWithMetadata
  • JSONCompactColumns
  • JSONLines
  • MySQLDump
  • NDJSON
  • PrettyCompactNoEscapesMonoBlock
  • PrettyMonoBlock
  • PrettyNoEscapesMonoBlock
  • PrettySpaceMonoBlock
  • PrettySpaceNoEscapesMonoBlock
  • Prometheus
  • ProtobufList
  • SQLInsert

New system tables

  • system.backups
  • system.certificates
  • system.filesystem_cache
  • system.merge_tree_metadata_cache
  • system.remote_data_paths
  • system.schema_inference_cache
  • system.settings_changes

New metrics and events

system.events

Several groups of events were added:

  • Cache introspection events: Cached*
  • Kafka engine introspection events (finally!): Kafka*
  • MergeTree metadata cache events: MergeTreeMetadataCache*
  • Keeper events: Keeper*

system.metrics

  • CacheDetachedFileSegments
  • CacheFileSegments
  • FilesystemCacheElements
  • FilesystemCacheReadBuffers
  • FilesystemCacheSize
  • KafkaAssignedPartitions
  • KafkaBackgroundReads
  • KafkaConsumers
  • KafkaConsumersInUse
  • KafkaConsumersWithAssignment
  • KafkaLibrdkafkaThreads
  • KafkaProducers
  • KafkaWrites
  • KeeperAliveConnections
  • KeeperOutstandingRequests
  • S3Requests

Changes in default configuration files https://gist.github.com/filimonov/d60f749be2581b151e6bdaa830bc8111#file-default_config_changes-txt


Also, please refer to the release notes from the development team available at the following URLs: