Clickhouse Altinity Stable Release™ 20.3.12.112

Details of the Altinity Stable Release 20.3.12.112

Description

This release is a significant step forward since the previous Altinity Stable release 19.16.19.85. It includes 1203 pull requests from 171 contributors.

Major New Features

  • TTL moves. See the “Putting Things Where They Belong Using New TTL Moves” article. While we consider this feature to be production ready, we have a roadmap to further improve it . TTL moves have been developed by Altinity.
  • Dictionary DDLs that many ClickHouse users dreamed of! See the “Clickhouse Dictionaries: Reloaded!" article that highlights this feature.
  • New DateTime64 datatype with configurable precision up to nanoseconds. The feature is in beta state and more functions for usability will follow in next releases. It has also been developed by Altinity.
  • Joins have been improved a lot, including SEMI/ANTI JOIN, experimental merge join and other changes. See the join_algorithm setting below for more info on merge joins.
  • A new compact format for MergeTree tables that store all columns in one file. It improves performance of small inserts. The usual format where every column is stored separately is now called “wide.” Compact format is disabled by default. See the min_bytes_for_wide_part and min_rows_for_wide_part settings.
  • A built-in Prometheus exporter endpoint for ClickHouse monitoring statistics
  • Porting some functionality of the H3 library — A Hexagonal Hierarchical Geospatial Indexing System
  • ALTER MODIFY/DROP are currently implemented as mutations for ReplicatedMergeTree* engine family. Now ALTER commands block only at the metadata update stage, and don’t block after that.
  • Import/export gzip-compressed files directly for file based storage engines and table functions: File, URL, HDFS and S3.
  • Server improvements across different server functions. For example, performance of the Kafka Engine has been improved, parallel INSERT is now possible (see max_insert_threads setting), etc.

Upgrade Notes

  • Replication fetch protocol has been changed in this release. If only one replica is upgraded in a shard, replication may get stuck with unknown protocol version error until all replicas are upgraded. 19.16.17.80 and 19.16.19.85 contain a special compatibility fix that allows smooth upgrade without replication downtime.
  • If enable_optimize_predicate_expression is turned on, it may result in incorrect data when only part of the cluster is upgraded. Please turn it off before the upgrade, and turn it back on afterwards.
  • There were some optimizations that are incompatible between versions, so we recommend disabling enable_scalar_subquery_optimization before the upgrade. This setting turns off new scalar subquery optimizations and was backported specifically in order to facilitate smooth upgrades.
  • There were some optimizations around distributed query execution that may result in incorrect data if part of the cluster is upgraded. We recommend disabling distributed_aggregation_memory_efficient while upgrading, and turn it on afterwards.

Backward Incompatible Changes

The following changes are backward incompatible and require user attention.

  • ALTER on ReplicatedMergeTree is not compatible with previous versions.20.3 creates a different metadata structure in ZooKeeper for ALTERs. Earlier versions do not understand the format and cannot proceed with their replication queue.
  • The format of replication log entries for mutation commands has changed. You have to wait for old mutations to process before installing the new version.
  • 20.3 requires an alias for every subquery participating in a join by default. Set joined_subquery_requires_alias=0 in order to keep the previous behavior.
  • ANY JOIN logic has been changed. To upgrade without changes in behavior, you need to add SETTINGS any_join_distinct_right_table_keys = 1 to Engine Join tables metadata or recreate these tables after upgrade.
  • Functions indexHint, findClusterValue, findClusterIndex were removed
  • Settings merge_tree_uniform_read_distribution, allow_experimental_cross_to_join_conversion, allow_experimental_multiple_joins_emulation are deprecated and ignored

Other Important Notes

  • If you encounter [Argument at index 0 for function dictGetDateTime must be a constant](https://github.com/ClickHouse/ClickHouse/issues/7798) or a similar error message after upgrading, set enable_early_constant_folding=0.
  • The new release adds a parallel parsing of input formats. While it should improve the performance of data loading in text formats, sometimes it may result in a slower load. Set input_format_parallel_parsing=0 if you experience insert performance degradation.
  • The check that a query is performed fast enough is enabled by default. Queries can get an exception like this: DB::Exception: Estimated query execution time (N seconds) is too long. Maximum: 12334.. Set timeout_before_checking_execution_speed=0 to fix this problem.
  • Mutations containing non-deterministic functions, e.g. dictGet or joinGet are disabled by default. Set allow_nondeterministic_mutations=1 to enable.

Also note that this release has a new query pipeline enabled by default (codename — Processors). This was a significant internal refactoring, and we believe all issues have been fixed. However, in the rare case that you face some weird query behavior or performance degradation, try disabling experimental_use_processors and check if the problem goes away.

Changes

New formats

  • Avro
  • AvroConfluent
  • JSONCompactEachRow
  • JSONCompactEachRowWithNamesAndTypes

New or Improved Functions

  • CRC32IEEE, CRC64
  • New JSON functions.
    • isValidJSON
    • JSONExtractArrayRaw — very useful for parsing nested JSON structures
  • New array functions.
    • arrayCompact
    • arrayFill
    • arrayReverseFill
    • arrayZip
  • New geo functions.
    • geoDistance
    • greatCircleDistance
  • H3 functions.
    • h3EdgeAngle
    • h3EdgeLengthM
    • h3GetBaseCell
    • h3GetResolution
    • h3HexAreaM2
    • h3IndexesAreNeighbors
    • h3IsValid
    • h3ToChildren
    • h3ToParent
    • h3ToString
    • h3kRing
    • stringToH3
  • Functions for DateTime64.
    • now64
    • parseDateTime64BestEffort, parseDateTime64BestEffortOrNull, parseDateTime64BestEffortOrZero
    • toDateTime64, toDateTime64OrNull, toDateTime64OrZero
  • New aggregate functions:
    • avgWeighted
    • categoricalInformationValue
    • groupArraySample
  • Other.
    • bitCount
    • ifNotFinit.
    • isConstant
    • javaHashUTF16LE
    • moduloOrZero
    • randomPrintableASCII
    • roundBankers

New table functions

  • clusterAllReplicas — similar to cluster function but queries all replicas in the cluster
  • generateRandom — allows users to generate random dataset with defined structure
  • zeros, zeros_mt

New metrics and events

system.metrics:

  • BackgroundMovePoolTask
  • CacheDictionaryUpdateQueueBatches
  • CacheDictionaryUpdateQueueKeys
  • MemoryTrackingInBackgroundMoveProcessingPool
  • MySQLConnection
  • SendScalars

New system tables

  • metric_log — stores system.metrics and system.events history
  • quotas, quota_usage — introspection to quotes
  • row_policies — introspection to row security policies
  • trace_log, stack_trace — for low-level debugging purpose
  • zeros, zeros_mt — zero generators, like system.numbers, but faster and returns zeros

New columns in system tables

  • mutations: parts_to_do_names
  • parts, parts_columns: part_type
  • processes: thread_ids replaces old columns thread_numbers and os_thread_ids
  • query_log: exception_code, thread_ids replaces old columns thread_numbers and os_thread_ids
  • replicas: zookeeper_exception
  • settings: min, max, readonly
  • table_engines: supports_settings, supports_skipping_indices, supports_sort_order, supports_ttl, supports_replication, supports_deduplication

system.merge_tree_settings added/changed

Settings Old value New value Description
disable_background_merges   0 Disable background merges
merge_max_block_size   8192 How many rows in blocks should be formed for merge operations
min_bytes_for_wide_part   0 Minimal uncompressed size in bytes to create part in wide format instead of compact
min_rows_for_wide_part   0 Minimal number of rows to create part in wide format instead of compact
use_minimalistic_part_header_in_zookeeper 0 1 Store part header (checksums and columns) in a compact format and a single part znode instead of separate znodes. This setting was available for a year already, and many users have it enabled.

system.settings added/changed

Setting Old value New value Description
allow_experimental_alter_materialized_view_structure   0 Allow atomic alter on Materialized views. Work in progress
allow_experimental_data_skipping_indices 0 1 Data skipping indices are now enabled by default
allow_nondeterministic_mutations   0 Allow non-deterministic functions in ALTER UPDATE/ALTER DELETE statements, e.g. with dictGet() functions
background_move_pool_size   8 Number of threads performing background moves for tables. Only has meaning at server startup
default_max_bytes_in_join 100000000 1000000000 Maximum size of right-side table if limit is required but max_bytes_in_join is not set
enable_early_constant_folding 0 1 Enable query optimization where we analyze function and subqueries results and rewrite query if there’re constants there
enable_scalar_subquery_optimization 0 1 If it is set to true, prevent scalar subqueries from (de)serializing large scalar values and possibly avoid running the same subquery more than once
experimental_use_processors 0 1 Processors pipeline is now enabled by default
force_optimize_skip_unused_shards   1 Throw an exception if unused shards cannot be skipped (1 – throw only if the table has the sharding key, 2 – always throw)
format_avro_schema_registry_url     For AvroConfluent format: Confluent Schema Registry URL
input_format_parallel_parsing   1 Enable parallel parsing for some data formats
input_format_values_deduce_templates_of_expressions 0 1 For Values format: if the field could not be parsed by streaming parser, run SQL parser, deduce template of the SQL expression, try to parse all rows using template and then interpret expression for all rows
join_algorithm   hash Specify join algorithm: auto, hash, partial_merge, prefer_partial_merge. auto tries to change HashJoin to MergeJoin on the fly to avoid out of memory
joined_subquery_requires_alias 0 1 Force joined subqueries and table functions to have aliases for correct name qualification
max_insert_threads   0 The maximum number of threads to execute the INSERT SELECT query. Values 0 or 1 means that INSERT SELECT is not run in parallel. Higher values will lead to higher memory usage. Parallel INSERT SELECT has effect only if the SELECT part is run on parallel, see max_threads setting
max_joined_block_size_rows   65536 Maximum block size for JOIN result (if join algorithm supports it). 0 means unlimited
max_parser_depth   1000 Maximum parser depth
memory_profiler_step   0 Every number of bytes the memory profiler will collect the allocating stack trace. The minimal effective step is 4 MiB (less values will work as clamped to 4 MiB). Zero means disabled memory profiler
min_bytes_to_use_mmap_io   0 The minimum number of bytes for reading the data with the mmap option during SELECT queries execution. 0 – disabled
min_chunk_bytes_for_parallel_parsing   1048576 The minimum chunk size in bytes, which each thread will parse in parallel
mutations_sync   0 Wait for synchronous execution of ALTER TABLE UPDATE/DELETE queries (mutations). 0 – execute asynchronously. 1 – wait current server. 2 – wait all replicas if they exist
optimize_if_chain_to_miltiif   0 Replace if(cond1, then1, if(cond2, …)) chains to multiIf
optimize_trivial_count_query   1 Process trivial SELECT count() FROM table query from metadata
output_format_avro_codec     Compression codec used for output. Possible values: null, deflate, snappy
output_format_avro_sync_interval   16384 Sync interval in bytes
output_format_csv_crlf_end_of_line     If it is set true, end of line in CSV format will be \r\n instead of \n
output_format_tsv_crlf_end_of_line     If it is set true, end of line in TSV format will be \r\n instead of \n
timeout_before_checking_execution_speed 0 10 Check that the speed is not too low after the specified time has elapsed
use_compact_format_in_distributed_parts_names   0 Changes format of directories names for distributed table insert parts

References

RPM packages for Altinity Stable Release can be found in the Altinity Stable Repository. The ClickHouse repository can be used as well but check version numbers carefully.