Altinity Stable Release for ClickHouse V20 and older

Here are the release notes from V20 and older. You can choose any of the following links to go directly to a particular release:

V20.8

Point releases:

Altinity Stable Release for ClickHouse 20.8.7.15

Released 2020-12-02

Description

This release provides several changes and new features from the previous release.

Major New Features

  • Security features:
    • RBAC and SQL management for users, roles, grants etc.
    • LDAP authentication
    • Kerberos authentication in Kafka Engine
  • Dictionary features:
    • direct and ssd_cache layouts
    • Redis, MongoDB, Cassandra sources
    • Automatic query rewrite for joins (see an example here)
    • Experimental polygon dictionaries
  • MergeTree features:
    • S3 disk for MergeTree tables! It is still an experimental feature, see our article for more detail.
    • New TTL types: TTL DELETE WHERE and TTL GROUP BY (see the example after this list).
    • In-memory MergeTree parts
  • Integrations:
    • PostgreSQL wire protocol
    • RabbitMQ and MongoDB storage engines
    • Tencent Cloud Object Storage (COS)
    • Highly experimental MaterializeMySQL engine that implements MySQL replica in ClickHouse
  • SQL compatibility:
    • New Int128, (U)Int256, Decimal256 extended precision data types
    • Aliases for standard SQL types
    • EXPLAIN statement!
    • Merge join improvements
  • Custom HTTP handlers
  • clickhouse-copier underwent extensive updates and improvements
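
As an illustration of the new TTL types, here is a minimal sketch of a MergeTree table that rolls up old rows with TTL GROUP BY; the table and column names are made up for the example, and the commented-out line shows the equivalent TTL DELETE WHERE form.

    CREATE TABLE ttl_demo
    (
        event_date Date,
        device_id  UInt32,
        cost       UInt64
    )
    ENGINE = MergeTree
    ORDER BY (device_id, event_date)
    -- Roll up rows older than one month into one row per device with the summed cost
    TTL event_date + INTERVAL 1 MONTH GROUP BY device_id SET cost = sum(cost);

    -- A conditional delete rule would look like this instead:
    -- TTL event_date + INTERVAL 6 MONTH DELETE WHERE cost = 0

Note that the GROUP BY expression in a TTL clause has to be a prefix of the table's primary key.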

Upgrade Notes

Backward Incompatible Changes

The following changes are backward incompatible and require user attention during an upgrade:

  • Aggregate function states with Nullable arguments may produce different / incompatible types.
  • Gorilla, Delta and DoubleDelta codecs can no longer be used on data types of variable size (such as strings).
  • System tables (e.g. system.query_log, system.trace_log, system.metric_log) are using compact data part format for parts smaller than 10 MiB in size (this is almost always the case). Compact data part format is supported since version 20.3.
  • WARNING: If you have to downgrade to a version prior to 20.3, you should manually delete table data for system logs in /var/lib/clickhouse/data/system/.
  • The setting input_format_with_names_use_header is enabled by default. It will affect parsing of input formats -WithNames and -WithNamesAndTypes.
  • Special printing of zero Date/DateTime values as '0000-00-00' and '0000-00-00 00:00:00' is deprecated. They are now printed as '1970-01-01' and '1970-01-01 00:00:00' respectively.

There were several changes between versions that may affect the rolling upgrade of big clusters. Upgrading only part of the cluster is not recommended. Note the following:

  • 20.3 two-level aggregation is not compatible with 20.4+.
    Data will not be fully aggregated for queries that are processed using the two-level aggregation algorithm. This algorithm should be disabled before upgrading if the risk is high in your environment. See the group_by_two_level_threshold and group_by_two_level_threshold_bytes settings and the sketch after this list.
  • zstd library has been updated in 20.5. While it is not a problem for ClickHouse in general, it may result in inconsistent parts when several replicas merge parts independently, and will force ClickHouse to download merged parts to make sure they are byte-identical (which will lead to extra traffic between nodes). The first node to complete the merge will register the part in ZooKeeper, and the other nodes will download the part if their checksums are different. There will be no data loss; conflicts will disappear once all replicas are upgraded.
  • The following settings lead to incompatibility in distributed queries when only a subset of shards are upgraded and others are not:
    • optimize_move_functions_out_of_any
    • optimize_arithmetic_operations_in_aggregate_functions
    • optimize_injective_functions_inside_uniq
    • optimize_if_transform_strings_to_enum
  • When pre-20.5 and 20.5+ versions run as replicas, "Part ... intersects previous part" errors are possible due to a change in the leadership selection protocol. If you need to run pre-20.5 and 20.5+ versions in the same cluster, make sure the old version cannot become a leader. This can be configured via the replicated_can_become_leader merge tree setting, globally or on a table level.
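
As a hedged sketch of the two mitigations above: setting both two-level aggregation thresholds to 0 disables that algorithm for a session or profile, and replicated_can_become_leader can be switched off per table on the old-version replicas (the table name below is a placeholder; the same MergeTree setting can also be set globally in the server configuration).

    -- Disable two-level aggregation before a mixed-version upgrade
    SET group_by_two_level_threshold = 0, group_by_two_level_threshold_bytes = 0;

    -- Prevent a replica from becoming a leader for a specific table
    ALTER TABLE db.events_local MODIFY SETTING replicated_can_become_leader = 0;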

Changes

  • All replicas are now 'leaders'. This allows multiple replicas to assign merges, mutations, partition drop, move and replace concurrently. Now system.replicas.is_leader is 1 for all tables on all nodes. If you rely on this value for some operations, your processes must be revised. The LeaderElection and LeaderReplica metrics were removed.
  • New setting max_server_memory_usage limits total memory usage of the server. The setting max_memory_usage_for_all_queries is now obsolete and does nothing. You might see an exception 'Memory limit (total) exceeded'. Increasing the limit requires a restart.
  • The log_queries setting is now enabled by default. You might want to disable this setting for some profiles if you don’t want their queries logged into the system.query_log table.
  • Several new optimizations are enabled by default. While they typically improve performance, regressions are possible in corner cases:
    • optimize_aggregators_of_group_by_keys
    • optimize_arithmetic_operations_in_aggregate_functions
    • optimize_duplicate_order_by_and_distinct
    • optimize_group_by_function_keys
    • optimize_injective_functions_inside_uniq
    • optimize_move_functions_out_of_any
    • optimize_monotonous_functions_in_order_by
    • optimize_redundant_functions_in_order_by
    • optimize_trivial_insert_select
    • partial_merge_join_optimizations
New data types
  • All standard SQL data types, like BIGINT, VARCHAR, etc. are mapped to ClickHouse data types. See the system.data_type_families table.
  • Experimental data types Point, Ring, Polygon, MultiPolygon
  • Int128, Int256, UInt256, Decimal256 extended precision data types (see the sketch below)
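
A small sketch of the extended precision types and the standard SQL aliases; in 20.8 the big integer and Decimal256 types are still gated by the allow_experimental_bigint_types setting, and the table name is illustrative.

    SET allow_experimental_bigint_types = 1;

    CREATE TABLE balances (id UInt64, amount Decimal256(10)) ENGINE = MergeTree ORDER BY id;

    -- Standard SQL aliases map to native ClickHouse types (BIGINT resolves to Int64)
    SELECT toTypeName(CAST(1 AS BIGINT)), toTypeName(toInt256('170141183460469231731687303715884105728'));
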
New formats
  • Arrow
  • ArrowStream
  • JSONAsString – allows storing the full JSON row unparsed and processing it later using the ClickHouse JSONExtract functions (see the sketch after this list)
  • Markdown
  • MsgPack
  • PostgreSQLWire
  • Regexp – allows parsing non-standard text formats (for example, logs) by applying a regular expression to every line.
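
A quick sketch of the JSONAsString format: each input row is stored as a single unparsed JSON string and unpacked later with the JSONExtract* functions. Table and field names are illustrative.

    CREATE TABLE raw_json (json String) ENGINE = MergeTree ORDER BY tuple();

    INSERT INTO raw_json FORMAT JSONAsString {"user": "alice", "clicks": 7}

    SELECT JSONExtractString(json, 'user') AS user,
           JSONExtractUInt(json, 'clicks') AS clicks
    FROM raw_json;
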
New functions
  • DateTime functions:
    • fromUnixTimestamp / FROM_UNIXTIME
    • dateTrunc / date_trunc
    • fromUnixTimestamp64Micro
    • fromUnixTimestamp64Milli
    • fromUnixTimestamp64Nano
    • toUnixTimestamp64Micro
    • toUnixTimestamp64Milli
    • toUnixTimestamp64Nano
  • String functions:
    • extractGroups, extractAllGroupsHorizontal, extractAllGroupsVertical (alias to extractAllGroups)
    • ilike, notILike (a new SQL operator ILIKE has also been added; see the sketch after this list)
  • Array functions:
    • arrayReduceInRanges
    • hasSubstr
  • Machine-learning and statistics:
    • arrayAUC -- Area Under the ROC Curve
    • bayesAB -- Bayesian A/B testing calculator
    • medianExactHigh, medianExactLow
    • quantileExactHigh, quantileExactLow, quantilesExactHigh, quantilesExactLow
  • JSON
    • JSONExtractKeysAndValuesRaw
  • URL functions
    • port
    • netloc
  • Aggregate functions
    • -Distinct – new aggregate function combinator
    • initializeAggregation – allows initialization of an AggregateFunction state
  • Functions to work with key-value pairs (aka maps):
    • mapAdd, mapSubtract, maxMap, minMap (extending the API of sumMap).
    • SimpleAggregateFunction now supports minMap, maxMap, sumMap.
    • A new Map data type is currently in development and should make this even more intuitive.
  • MySQL integration
    • DATABASE (alias for currentDatabase)
    • globalVariable (stub)
  • New types related:
    • toDecimal256, toDecimal256OrNull, toDecimal256OrZero
    • toInt128, toInt128OrNull, toInt128OrZero
    • toInt256, toInt256OrNull, toInt256OrZero
    • toUInt256, toUInt256OrNull, toUInt256OrZero
  • engine=Join
    • joinGetOrNull
  • Random:
    • fuzzBits (used for CI)
    • rand32 (alias for rand)
    • randomFixedString, randomString, randomStringUTF8
  • Serialize columns to some text format
    • formatRow, formatRowNoNewline
  • Settings & custom settings:
    • getSetting
  • Check types
    • isDecimalOverflow
    • defaultValueOfTypeName
    • isZeroOrNull
  • Helper functions to analyze query_log
    • normalizeQuery, normalizedQueryHash
  • Other:
    • countDigits
    • mod (alias for modulo)
  • Special (used in CI / by ClickHouse developers):
    • hasThreadFuzzer
    • buildId
  • Related to the internals of execution of the IN / GLOBAL IN operator (also see the transform_null_in setting)
    • notNullIn, notNullInIgnoreSet, nullIn, nullInIgnoreSet, inIgnoreSet, notInIgnoreSet
    • globalNotNullIn, globalNotNullInIgnoreSet, globalNullIn, globalNullInIgnoreSet, globalInIgnoreSet, globalNotInIgnoreSet
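
A few of the new functions in action; this is an informal sketch with literal values chosen purely for illustration.

    -- DateTime helpers
    SELECT dateTrunc('month', now()) AS month_start,
           fromUnixTimestamp(1609459200) AS from_epoch;

    -- The new ILIKE operator for case-insensitive matching
    SELECT 'ClickHouse' ILIKE '%house' AS matched;

    -- Serialize arbitrary values through an output format
    SELECT formatRow('CSV', number, toString(number)) FROM numbers(2);
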
New table functions
  • cosn – integration with Tencent Cloud Object Storage (COS)
New table engines
  • MongoDB
  • RabbitMQ
  • COSN
New metrics and events

system.metrics:

  • BackgroundBufferFlushSchedulePoolTask
  • BackgroundDistributedSchedulePoolTask
  • MemoryTrackingInBackgroundBufferFlushSchedulePool
  • MemoryTrackingInBackgroundDistributedSchedulePool
  • PostgreSQLConnection

system.events:

  • OSCPUVirtualTimeMicroseconds
  • OSCPUWaitMicroseconds
  • OSReadBytes, OSReadChars
  • OSWriteBytes, OSWriteChars
  • QueryTimeMicroseconds
New system tables (mostly for RBAC introspection)
  • current_roles
  • distribution_queue
  • enabled_roles
  • grants
  • licenses
  • privileges
  • quota_limits
  • quotas_usage
  • row_policies
  • role_grants
  • roles
  • settings_profile_elements
  • settings_profiles
  • time_zones
  • user_directories
  • users
New columns in system tables
  • columns.position, parts_columns.position
  • databases.uuid, dictionaries.uuid, tables.uuid
  • disks.type
  • merge_tree_settings.type, settings.type
  • parts: delete_ttl_info_min, delete_ttl_info_max, move_ttl_info_min, move_ttl_info_max, move_ttl_info.expression
  • query_log.current_database
  • storage_policies.volume_type
  • tables: total_rows, total_bytes, lifetime_rows, lifetime_bytes
  • trace_log: trace_type, size, timestamp_ns
system.merge_tree_settings added/changed
Setting Old value New value Description
allow_nullable_key 0 Allow Nullable types as primary keys.
always_fetch_merged_part 0 If true, replicas never merge parts and always download merged parts from other replicas.
disable_background_merges 0 REMOVED
enable_mixed_granularity_parts 0 1 Enable parts with adaptive and non-adaptive granularity at the same time
in_memory_parts_enable_wal 1 Whether to write blocks in Native format to the write-ahead log before creating an in-memory part
lock_acquire_timeout_for_background_operations 120 For background operations like merges, mutations etc. How many seconds before failing to acquire table locks.
max_part_loading_threads auto The number of threads to load data parts at startup.
max_part_removal_threads auto The number of threads for concurrent removal of inactive data parts. One is usually enough, but in ‘Google Compute Environment SSD Persistent Disks’ file removal (unlink) operation is extraordinarily slow and you probably have to increase this number (recommended is up to 16).
max_replicated_logs_to_keep 10000 1000 How many records may be in log, if there is an inactive replica.
min_replicated_logs_to_keep 100 10 Keep about this number of last records in ZooKeeper log, even if they are obsolete. It doesn’t affect work of tables: used only to diagnose ZooKeeper log before cleaning.
min_bytes_for_compact_part 0 Minimal uncompressed size in bytes to create part in compact format instead of saving it in RAM. If non-zero enables in-memory parts.
min_rows_for_compact_part 0 Minimal number of rows to create part in compact format instead of saving it in RAM. If non-zero enables in-memory parts.
min_index_granularity_bytes 1024 Minimum amount of bytes in single granule.
min_relative_delay_to_measure 120 Calculate relative replica delay only if absolute delay is not less than this value.
write_ahead_log_max_bytes 1073741824 Rotate WAL, if it exceeds that amount of bytes
system.settings added/changed
Setting Old value New value Description
allow_experimental_bigint_types 0 Allow Int128, Int256, UInt256 and Decimal256 types
allow_experimental_database_materialize_mysql 1 Allow database creation with Engine=MaterializeMySQL(…) (Highly experimental yet)
allow_experimental_geo_types 0 Allow geo data types such as Point, Ring, Polygon, MultiPolygon
allow_non_metadata_alters 1 Allow to execute alters which affects table’s metadata and data on disk.
allow_push_predicate_when_subquery_contains_with 1 Allows push predicate when subquery contains WITH clause
allow_suspicious_codecs 0 If it is set to true, allow specifying meaningless compression codecs.
alter_partition_verbose_result 0 Output information about affected parts. Currently works only for FREEZE and ATTACH commands.
background_buffer_flush_schedule_pool_size 16 Number of threads performing background flush for tables with Buffer engine. Only has meaning at server startup.
background_distributed_schedule_pool_size 16 Number of threads performing background tasks for distributed sends. Only has meaning at server startup.
cast_keep_nullable 0 CAST operator keep Nullable for result data type
data_type_default_nullable 0 Data types declared without explicit NULL or NOT NULL will be treated as Nullable
default_database_engine Ordinary Default database engine
distributed_replica_max_ignored_errors 0 Number of errors that will be ignored while choosing replicas
force_optimize_skip_unused_shards_nesting 0 Same as force_optimize_skip_unused_shards, but accept nesting level until which it will work
format_regexp Regular expression (for Regexp format)
format_regexp_escaping_rule Escaped Field escaping rule (for Regexp format)
format_regexp_skip_unmatched 0 Skip lines unmatched by regular expression (for Regexp format)
function_implementation Choose function implementation for specific target or variant (experimental). If empty, enable all of them.
input_format_avro_allow_missing_fields 0 For Avro/AvroConfluent format: when field is not found in schema use default value instead of error
input_format_with_names_use_header 0 1 For TSVWithNames and CSVWithNames input formats this controls whether the format parser is to assume that column data appear in the input exactly as they are specified in the header.
insert_in_memory_parts_timeout 600000
join_on_disk_max_files_to_merge 64 For MergeJoin on disk, set how many files are allowed to sort simultaneously. The larger the value the more memory is used and less disk I/O needed. Minimum is 2.
lock_acquire_timeout 120 How long locking request should wait before failing
log_queries_min_type QUERY_START Minimal type in query_log to log, possible values (from low to high): QUERY_START, QUERY_FINISH, EXCEPTION_BEFORE_START, EXCEPTION_WHILE_PROCESSING.
materialize_ttl_after_modify 1 Apply TTL for old data, after ALTER MODIFY TTL query
max_block_size 65536 65505 Maximum block size for reading
max_final_threads 16 The maximum number of threads to read from the table with FINAL.
max_insert_block_size 1048576 1048545 The maximum block size for insertion, if we control the creation of blocks for insertion
max_joined_block_size_rows 65536 65505 Maximum block size for JOIN result (if join algorithm supports it). 0 means unlimited.
max_untracked_memory 4194304 Small allocations and deallocations are grouped in thread local variables and tracked or profiled only when the amount (in absolute value) becomes larger than specified value. If the value is higher than ‘memory_profiler_step’ it will be effectively lowered to ‘memory_profiler_step’.
memory_profiler_sample_probability 0 Collect random allocations and deallocations and write them into system.trace_log with 'MemorySample' trace_type. The probability is for every alloc/free regardless of the size of the allocation. Note that sampling happens only when the amount of untracked memory exceeds ‘max_untracked_memory’. You may want to set ‘max_untracked_memory’ to 0 for extra fine-grained sampling.
metrics_perf_events_enabled 0 If enabled, some of the perf events will be measured throughout queries’ execution.
metrics_perf_events_list Comma separated list of perf metrics that will be measured throughout queries’ execution. Empty means all events.
min_chunk_bytes_for_parallel_parsing 1048576 10485760 The minimum chunk size in bytes, which each thread will parse in parallel.
min_insert_block_size_bytes 268435456 268427520 Squash blocks passed to INSERT query to specified size in bytes, if blocks are not big enough.
min_insert_block_size_bytes_for_materialized_views 0 Like min_insert_block_size_bytes, but applied only during pushing to MATERIALIZED VIEW (default: min_insert_block_size_bytes)
min_insert_block_size_rows 1048576 1048545 Squash blocks passed to INSERT query to specified size in rows, if blocks are not big enough.
min_insert_block_size_rows_for_materialized_views 0 Like min_insert_block_size_rows, but applied only during pushing to MATERIALIZED VIEW (default: min_insert_block_size_rows)
multiple_joins_rewriter_version 2 1 or 2. The second rewriter version knows about table columns and keeps non-clashing names as is.
optimize_aggregation_in_order 0 Enable GROUP BY optimization for aggregating data in corresponding order in MergeTree tables.
optimize_aggregators_of_group_by_keys 1 Eliminates min/max/any/anyLast aggregators of GROUP BY keys in SELECT section
optimize_arithmetic_operations_in_aggregate_functions 1 Move arithmetic operations out of aggregation functions
optimize_distributed_group_by_sharding_key 0 Optimize GROUP BY sharding_key queries (by avoiding costly aggregation on the initiator server)
optimize_duplicate_order_by_and_distinct 1 Remove duplicate ORDER BY and DISTINCT if it’s possible
optimize_group_by_function_keys 1 Eliminates functions of other keys in GROUP BY section
optimize_if_chain_to_multiif 0 Replace if(cond1, then1, if(cond2, …)) chains to multiIf. Currently it’s not beneficial for numeric types.
optimize_if_transform_strings_to_enum 0 Replaces string-type arguments in If and Transform with enums. Disabled by default because it could make an inconsistent change in a distributed query and cause it to fail
optimize_injective_functions_inside_uniq 1 Delete injective functions of one argument inside uniq*() functions
optimize_monotonous_functions_in_order_by 1 Replace monotonous function with its argument in ORDER BY
optimize_move_functions_out_of_any 1 Move functions out of aggregate functions ‘any’, ‘anyLast’
optimize_redundant_functions_in_order_by 1 Remove a function from ORDER BY if its argument is also in ORDER BY
optimize_skip_unused_shards_nesting 0 Same as optimize_skip_unused_shards, but accept nesting level until which it will work
optimize_trivial_insert_select 1 Optimize trivial ‘INSERT INTO table SELECT … FROM TABLES’ query
output_format_enable_streaming 0 Enable streaming in output formats that support it
output_format_pretty_grid_charset UTF-8 Charset for printing grid borders. Available charsets: ASCII, UTF-8 (default)
output_format_pretty_max_value_width 10000 Maximum width of value to display in Pretty formats. If greater - it will be cut.
parallel_distributed_insert_select 0 Process a distributed INSERT SELECT query in the same cluster on local tables on every shard: if 1, SELECT is executed on each shard; if 2, both SELECT and INSERT are executed on each shard
partial_merge_join_left_table_buffer_bytes 32000000 If not 0, group left table blocks in bigger ones for the left-side table in partial merge join. It uses up to 2x of specified memory per joining thread. The current version works only with ‘partial_merge_join_optimizations = 1’.
partial_merge_join_optimizations 0 1 Enable optimizations in partial merge join
partial_merge_join_rows_in_right_blocks 10000 65536 Split right-hand joining data in blocks of specified size. It’s a portion of data indexed by min-max values and possibly unloaded on disk.
rabbitmq_max_wait_ms 5000 The wait time for reading from RabbitMQ before retry.
read_in_order_two_level_merge_threshold 100 Minimal number of parts to read to run preliminary merge step during multithread reading in order of primary key.
send_logs_level none fatal Send server text logs with specified minimum level to client. Valid values: ’trace’, ‘debug’, ‘information’, ‘warning’, ’error’, ‘fatal’, ’none’
show_table_uuid_in_table_create_query_if_not_nil 0 For tables in databases with Engine=Atomic show UUID of the table in its CREATE query.
temporary_files_codec LZ4 Set compression codec for temporary files (sort and join on disk). I.e. LZ4, NONE.
transform_null_in 0 If enabled, NULL values will be matched with ‘IN’ operator as if they are considered equal.
validate_polygons 1 Throw exception if polygon is invalid in function pointInPolygon (e.g. self-tangent, self-intersecting). If the setting is false, the function will accept invalid polygons but may silently return wrong result.

References

The ClickHouse Altinity Stable release is based on the community version. It can be downloaded from repo.clickhouse.tech, and RPM packages are available from the Altinity Stable Repository.

Altinity Stable Release for ClickHouse 20.8.12.2

Released 2021-02-03

Description

Bug fixes.

Bug Fixes
  • Fixed a rare bug that may result in corrupted data when merging from wide to compact parts. We recommend that all 20.8 users who use compact parts upgrade to this release.

Altinity Stable Release for ClickHouse 20.8.11.17

Released 2020-12-25

Description

This release provides bug fixes and general stability improvements.

V20.3

Point releases:

Altinity Stable Release for ClickHouse 20.3.12.112

Released 2020-06-24

Description

This release is a significant step forward since the previous Altinity Stable release 19.16.19.85. It includes 1203 pull requests from 171 contributors.

Major New Features

  • TTL moves. See the Putting Things Where They Belong Using New TTL Moves article. While we consider this feature to be production ready, we have a roadmap to further improve it. TTL moves have been developed by Altinity.
  • Dictionary DDLs that many ClickHouse users dreamed of! See the ClickHouse Dictionaries: Reloaded! article that highlights this feature. A short example also follows this list.
  • New DateTime64 datatype with configurable precision up to nanoseconds. The feature is in beta state and more functions for usability will follow in next releases. It has also been developed by Altinity.
  • Joins have been improved a lot, including SEMI/ANTI JOIN, experimental merge join and other changes. See the join_algorithm setting below for more info on merge joins.
  • A new compact format for MergeTree tables that store all columns in one file. It improves performance of small inserts. The usual format where every column is stored separately is now called “wide.” Compact format is disabled by default. See the min_bytes_for_wide_part and min_rows_for_wide_part settings.
  • A built-in Prometheus exporter endpoint for ClickHouse monitoring statistics
  • Porting some functionality of the H3 library — A Hexagonal Hierarchical Geospatial Indexing System
  • ALTER MODIFY/DROP are currently implemented as mutations for ReplicatedMergeTree* engine family. Now ALTER commands block only at the metadata update stage, and don’t block after that.
  • Import/export gzip-compressed files directly for file based storage engines and table functions: File, URL, HDFS and S3.
  • Server improvements across different server functions. For example, performance of the Kafka Engine has been improved, parallel INSERT is now possible (see max_insert_threads setting), etc.
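
Here is a short sketch of the new dictionary DDL; the dictionary name, source table and connection details are placeholders.

    CREATE DICTIONARY default.country_dict
    (
        id   UInt64,
        name String
    )
    PRIMARY KEY id
    SOURCE(CLICKHOUSE(HOST 'localhost' PORT 9000 USER 'default' DB 'default' TABLE 'countries'))
    LAYOUT(HASHED())
    LIFETIME(MIN 300 MAX 600);

    SELECT dictGetString('default.country_dict', 'name', toUInt64(42));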

Upgrade Notes

  • Replication fetch protocol has been changed in this release. If only one replica is upgraded in a shard, replication may get stuck with unknown protocol version error until all replicas are upgraded. 19.16.17.80 and 19.16.19.85 contain a special compatibility fix that allows smooth upgrade without replication downtime.
  • If enable_optimize_predicate_expression is turned on, it may result in incorrect data when only part of the cluster is upgraded. Please turn it off before the upgrade, and turn it back on afterwards.
  • There were some optimizations that are incompatible between versions, so we recommend disabling enable_scalar_subquery_optimization before the upgrade. This setting turns off new scalar subquery optimizations and was backported specifically in order to facilitate smooth upgrades.
  • There were some optimizations around distributed query execution that may result in incorrect data if part of the cluster is upgraded. We recommend disabling distributed_aggregation_memory_efficient while upgrading and turning it back on afterwards (see the sketch after this list).
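
For illustration, the settings mentioned above can be disabled per session as shown below; for a real rolling upgrade they would normally be set in the default user profile so that they apply to every query.

    SET enable_optimize_predicate_expression = 0;
    SET enable_scalar_subquery_optimization = 0;
    SET distributed_aggregation_memory_efficient = 0;
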
Backward Incompatible Changes

The following changes are backward incompatible and require user attention.

  • ALTER on ReplicatedMergeTree is not compatible with previous versions. 20.3 creates a different metadata structure in ZooKeeper for ALTERs. Earlier versions do not understand the format and cannot proceed with their replication queue.
  • The format of replication log entries for mutation commands has changed. You have to wait for old mutations to process before installing the new version.
  • 20.3 requires an alias for every subquery participating in a join by default. Set joined_subquery_requires_alias=0 in order to keep the previous behavior.
  • ANY JOIN logic has been changed. To upgrade without changes in behavior, you need to add SETTINGS any_join_distinct_right_table_keys = 1 to Engine Join tables metadata or recreate these tables after upgrade.
  • Functions indexHint, findClusterValue, findClusterIndex were removed
  • Settings merge_tree_uniform_read_distribution, allow_experimental_cross_to_join_conversion, allow_experimental_multiple_joins_emulation are deprecated and ignored
Other Important Notes
  • If you encounter the error 'Argument at index 0 for function dictGetDateTime must be a constant' (https://github.com/ClickHouse/ClickHouse/issues/7798) or a similar error message after upgrading, set enable_early_constant_folding=0.
  • The new release adds parallel parsing of input formats. While it should improve the performance of data loading in text formats, it may sometimes result in a slower load. Set input_format_parallel_parsing=0 if you experience insert performance degradation.
  • The check that a query executes fast enough is enabled by default. Queries can get an exception like this: DB::Exception: Estimated query execution time (N seconds) is too long. Maximum: 12334. Set timeout_before_checking_execution_speed=0 to fix this problem.
  • Mutations containing non-deterministic functions, e.g. dictGet or joinGet are disabled by default. Set allow_nondeterministic_mutations=1 to enable.

Also note that this release has a new query pipeline enabled by default (codename — Processors). This was a significant internal refactoring, and we believe all issues have been fixed. However, in the rare case that you face some weird query behavior or performance degradation, try disabling experimental_use_processors and check if the problem goes away.

Changes

New formats
  • Avro
  • AvroConfluent
  • JSONCompactEachRow
  • JSONCompactEachRowWithNamesAndTypes
New or Improved Functions
  • CRC32IEEE, CRC64
  • New JSON functions.
    • isValidJSON
    • JSONExtractArrayRaw — very useful for parsing nested JSON structures
  • New array functions.
    • arrayCompact
    • arrayFill
    • arrayReverseFill
    • arrayZip
  • New geo functions.
    • geoDistance
    • greatCircleDistance
  • H3 functions.
    • h3EdgeAngle
    • h3EdgeLengthM
    • h3GetBaseCell
    • h3GetResolution
    • h3HexAreaM2
    • h3IndexesAreNeighbors
    • h3IsValid
    • h3ToChildren
    • h3ToParent
    • h3ToString
    • h3kRing
    • stringToH3
  • Functions for DateTime64.
    • now64
    • parseDateTime64BestEffort, parseDateTime64BestEffortOrNull, parseDateTime64BestEffortOrZero
    • toDateTime64, toDateTime64OrNull, toDateTime64OrZero
  • New aggregate functions:
    • avgWeighted
    • categoricalInformationValue
    • groupArraySample
  • Other.
    • bitCount
    • ifNotFinite
    • isConstant
    • javaHashUTF16LE
    • moduloOrZero
    • randomPrintableASCII
    • roundBankers

New table functions

  • clusterAllReplicas — similar to the cluster table function, but queries all replicas in the cluster
  • generateRandom — allows generating a random dataset with a defined structure (see the sketch after this list)
  • zeros, zeros_mt
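
A brief sketch of two of the new table functions; the cluster name below is a placeholder.

    -- Random data with a declared structure (seed 1, maximum string length 10)
    SELECT * FROM generateRandom('id UInt64, name String', 1, 10) LIMIT 5;

    -- Query every replica of every shard of a cluster
    SELECT hostName(), count() FROM clusterAllReplicas('my_cluster', system.one) GROUP BY hostName();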

New metrics and events

system.metrics:

  • BackgroundMovePoolTask
  • CacheDictionaryUpdateQueueBatches
  • CacheDictionaryUpdateQueueKeys
  • MemoryTrackingInBackgroundMoveProcessingPool
  • MySQLConnection
  • SendScalars

New system tables

  • metric_log — stores system.metrics and system.events history
  • quotas, quota_usage — introspection into quotas
  • row_policies — introspection into row security policies
  • trace_log, stack_trace — for low-level debugging purposes
  • zeros, zeros_mt — zero generators, like system.numbers, but faster and returning zeros

New columns in system tables

  • mutations: parts_to_do_names
  • parts, parts_columns: part_type
  • processes: thread_ids replaces old columns thread_numbers and os_thread_ids
  • query_log: exception_code, thread_ids replaces old columns thread_numbers and os_thread_ids
  • replicas: zookeeper_exception
  • settings: min, max, readonly
  • table_engines: supports_settings, supports_skipping_indices, supports_sort_order, supports_ttl, supports_replication, supports_deduplication

system.merge_tree_settings added/changed

Settings Old value New value Description
disable_background_merges   0 Disable background merges
merge_max_block_size   8192 How many rows in blocks should be formed for merge operations
min_bytes_for_wide_part   0 Minimal uncompressed size in bytes to create part in wide format instead of compact
min_rows_for_wide_part   0 Minimal number of rows to create part in wide format instead of compact
use_minimalistic_part_header_in_zookeeper 0 1 Store part header (checksums and columns) in a compact format and a single part znode instead of separate znodes. This setting was available for a year already, and many users have it enabled.

system.settings added/changed

Setting Old value New value Description
allow_experimental_alter_materialized_view_structure   0 Allow atomic alter on Materialized views. Work in progress
allow_experimental_data_skipping_indices 0 1 Data skipping indices are now enabled by default
allow_nondeterministic_mutations   0 Allow non-deterministic functions in ALTER UPDATE/ALTER DELETE statements, e.g. with dictGet() functions
background_move_pool_size   8 Number of threads performing background moves for tables. Only has meaning at server startup
default_max_bytes_in_join 100000000 1000000000 Maximum size of right-side table if limit is required but max_bytes_in_join is not set
enable_early_constant_folding 0 1 Enable query optimization where we analyze function and subqueries results and rewrite query if there’re constants there
enable_scalar_subquery_optimization 0 1 If it is set to true, prevent scalar subqueries from (de)serializing large scalar values and possibly avoid running the same subquery more than once
experimental_use_processors 0 1 Processors pipeline is now enabled by default
force_optimize_skip_unused_shards   1 Throw an exception if unused shards cannot be skipped (1 – throw only if the table has the sharding key, 2 – always throw)
format_avro_schema_registry_url     For AvroConfluent format: Confluent Schema Registry URL
input_format_parallel_parsing   1 Enable parallel parsing for some data formats
input_format_values_deduce_templates_of_expressions 0 1 For Values format: if the field could not be parsed by streaming parser, run SQL parser, deduce template of the SQL expression, try to parse all rows using template and then interpret expression for all rows
join_algorithm   hash Specify join algorithm: auto, hash, partial_merge, prefer_partial_merge. auto tries to change HashJoin to MergeJoin on the fly to avoid out of memory
joined_subquery_requires_alias 0 1 Force joined subqueries and table functions to have aliases for correct name qualification
max_insert_threads   0 The maximum number of threads to execute the INSERT SELECT query. Values 0 or 1 means that INSERT SELECT is not run in parallel. Higher values will lead to higher memory usage. Parallel INSERT SELECT has effect only if the SELECT part is run on parallel, see max_threads setting
max_joined_block_size_rows   65536 Maximum block size for JOIN result (if join algorithm supports it). 0 means unlimited
max_parser_depth   1000 Maximum parser depth
memory_profiler_step   0 Every number of bytes the memory profiler will collect the allocating stack trace. The minimal effective step is 4 MiB (less values will work as clamped to 4 MiB). Zero means disabled memory profiler
min_bytes_to_use_mmap_io   0 The minimum number of bytes for reading the data with the mmap option during SELECT queries execution. 0 – disabled
min_chunk_bytes_for_parallel_parsing   1048576 The minimum chunk size in bytes, which each thread will parse in parallel
mutations_sync   0 Wait for synchronous execution of ALTER TABLE UPDATE/DELETE queries (mutations). 0 – execute asynchronously. 1 – wait current server. 2 – wait all replicas if they exist
optimize_if_chain_to_multiif   0 Replace if(cond1, then1, if(cond2, …)) chains to multiIf
optimize_trivial_count_query   1 Process trivial SELECT count() FROM table query from metadata
output_format_avro_codec     Compression codec used for output. Possible values: null, deflate, snappy
output_format_avro_sync_interval   16384 Sync interval in bytes
output_format_csv_crlf_end_of_line     If it is set true, end of line in CSV format will be \r\n instead of \n
output_format_tsv_crlf_end_of_line     If it is set true, end of line in TSV format will be \r\n instead of \n
timeout_before_checking_execution_speed 0 10 Check that the speed is not too low after the specified time has elapsed
use_compact_format_in_distributed_parts_names   0 Changes format of directories names for distributed table insert parts

References

RPM packages for Altinity Stable Release can be found in the Altinity Stable Repository. The ClickHouse repository can be used as well but check version numbers carefully.

Altinity Stable Release for ClickHouse 20.3.19.4

Released 2020-09-23

Description

The release includes several dozen bug fixes from the previous release.

Major New Features
  • Server startup time has been reduced by parsing metadata in parallel.
  • Improved performance of primary key analysis for LowCardinality columns. That was a performance regression introduced in 20.3.
  • Improved performance of queries with large tuples and tuples used in primary key.
Bug Fixes
  • Fixed a bug that prevented attaching Materialized Views to system tables.
  • Fixed incorrect behavior of if function with NULLs.
  • Fixed segfaults in rare cases.
  • Fixed a bug that prevented predicate pushdown for queries using WITH clause.
  • Fixed SIGSEGV in Kafka engine when broker is unavailable.
  • Fixed a bug leading to block structure mismatch error for queries with UNION and JOIN.
  • Fixed TTL processing logic to process all partitions in one run.
  • Fixed a bug of parsing of row policies from configuration files that could result in missing policies sometimes.
  • Fixed a bug with ALTER TABLE UPDATE could produce incorrect results when updating Nullable columns.
  • Fixed a bug with codecs not working properly for MergeTree compact parts.

References

Updated RPM packages for Altinity Stable Release can be found in the Altinity Stable Repository. The ClickHouse repository can be used for DEB packages but check version numbers carefully.

V19.16

Point releases:

Altinity Stable Release for ClickHouse 19.16.10.44

Released 2020-01-20

Description

This release has major updates: 2744 commits in 528 pull requests from 113 contributors.

Major New Features

  • Tiered storage: multiple storage volumes for MergeTree tables. It’s possible to store fresh data on SSD and automatically move old data to HDD. We already discussed this functionality in our blog.

  • LIVE VIEW tables that we also described in our blog. The support in this release is not complete, but new improvements are coming quickly. Feel free to try it out.

  • WITH FILL modifier of ORDER BY for data gap filling. It allows filling in missing data to produce uniform reports. For example, you can fill missing dates for time series so that every day is shown even if some days have no data. This feature is not documented yet, so here is an example of how missing dates can be filled:

    SELECT arrayJoin([today()-10, today()]) AS d ORDER BY d ASC WITH FILL

  • Table constraints that are checked at insert.

  • Sensitive data masking for query_log, server logs, process list with regexp-based rules.

  • Table function input() for reading incoming data in an INSERT SELECT query. A very useful feature when you need to preprocess data just before inserting (see the sketch after this feature list). We are going to showcase it in our planned articles around log processing use cases.

  • Cascaded materialized views. It is an important feature for many use cases including Kafka integration. In particular, it allows you to load raw data from Kafka using an MV and then aggregate it using another MV. That was not possible in previous ClickHouse releases.

  • Kafka: Altinity took over support for the Kafka engine a few months ago. Kafka functionality and stability has been improved in this release, in particular:

    • ClickHouse can act as a Kafka producer: not just reading from Kafka, but also sending data back with an insert statement.
    • Atomic parsing of each message: kafka_row_delimiter is now obsolete (ignored)
    • More reliable commit logic
    • Virtual columns _partition and _timestamp for Kafka engine table.
    • Parsing of most formats now works properly
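
Here is a minimal sketch of the input() table function mentioned above; the target table and column names are made up for the example.

    CREATE TABLE events (day Date, user_id UInt64) ENGINE = MergeTree ORDER BY day;

    -- Preprocess incoming rows on the fly: keep only the date part of the timestamp
    INSERT INTO events
    SELECT toDate(ts) AS day, user_id
    FROM input('ts DateTime, user_id UInt64')
    FORMAT CSV
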
Known issues
  • In case of topic rebalancing on the Kafka side, duplicates are possible. We are going to address it in 2020/Q1. https://github.com/ClickHouse/ClickHouse/issues/7259.
  • Select from mysql() table function is broken in this release. Two workarounds are possible:
    • Use MySQL table engine.
    • create table as mysql(...) and then select from this table — this is actually a shortcut to MySQL table engine.
  • Some constant-like functions can misbehave when queried through Distributed tables: https://github.com/ClickHouse/ClickHouse/issues/7666.
  • ‘system reload dictionary’ does not reload dictionaries with ‘update_field’: https://github.com/ClickHouse/ClickHouse/issues/7440.
  • Complicated WHERE conditions involving fields of UInt8 type with values > 1 can return unexpected results (workaround: instead of a and b and c, use a <> 0 and b <> 0 and c <> 0). https://github.com/ClickHouse/ClickHouse/issues/7772.
  • If you want to preserve the old ANY JOIN behavior while upgrading from a version before 19.13, you may need to install 19.13.7 first, change the any_join_distinct_right_table_keys setting there, and after that upgrade to 19.16. But we recommend reviewing your queries and rewriting them without this join type. In future releases it will be available again, but under a different name (SEMI JOIN).

Upgrade Notes

Backward Incompatible Changes
  • count() supports only a single argument.
  • Legacy asterisk_left_columns_only setting has been removed (it was disabled by default).
  • Numeric values for Enums can now be used directly in IN section of the query.
  • Changed serialization format of bitmap* aggregate function states to improve performance. Serialized states of bitmap* from previous versions cannot be read. If you happen to use bitmap aggregate functions, please contact us before upgrading.
  • The type column in system.query_log was changed from UInt8 to Enum8.
  • ANY RIGHT/FULL/INNER JOIN is disabled by default. Set any_join_distinct_right_table_keys setting to enable them.

Changes

New Formats
  • ORC format.
  • Template/TemplateIgnoreSpaces format. It allows parsing / generating data in custom text formats. For example, you can generate HTML directly from ClickHouse, turning ClickHouse into a web server.
  • CustomSeparated/CustomSeparatedIgnoreSpaces format. Supports custom escaping and delimiter rules.
  • JSONEachRowWithProgress
  • Parse unquoted NULL literal as NULL (enabled by setting format_csv_unquoted_null_literal_as_null).
  • Initialize null fields with default values if the data type of this field is not nullable (enabled by setting input_format_null_as_default).
New or improved functions
  • Aggregate function combinators which fill null or default value when there is nothing to aggregate: -OrDefault, -OrNull

  • Introduce uniqCombined64() to get sane results for cardinality > UINT_MAX

  • quantileExactExclusive and quantileExactInclusive aggregate functions

  • hasToken/hasTokenCaseInsensitive (look for the token in a string the same way as the token_bf secondary index)

  • multiFuzzyMatchAllIndices, multiMatchAllIndices (return the Array of all matched indices in multiMatch family functions)

  • repeat function for strings

  • sigmoid and tanh functions for ML applications

  • Roaring Bitmaps:

    • Changes CRoaring serialization functions (you will not be able to read Bitmaps created by earlier versions)
    • bitmapSubsetInRange,
    • bitmapMin, bitmapMax,
    • bitmapSubsetLimit(bitmap, range_start, limit),
    • groupBitmapAnd, groupBitmapOr, groupBitmapXor
  • geohashesInBox(longitude_min, latitude_min, longitude_max, latitude_max, precision) which creates an array of precision-long strings of geohash boxes covering the provided area.

  • Support for wildcards in paths of table functions file and hdfs. If the path contains wildcards, the table will be readonly:

    SELECT * FROM hdfs('hdfs://hdfs1:9000/some_dir/another_dir/*/file', 'Parquet', 'col1 String')

  • New function neighbour(value, offset[, default_value]). Allows reaching the previous/next row within the column.

  • Optimize queries with an ORDER BY expressions clause where the expressions have a coinciding prefix with the sorting key in MergeTree tables. This optimization is controlled by the optimize_read_in_order setting.

  • New functions arraySplit and arrayReverseSplit, which can split an array by “cut off” conditions. They are useful in time sequence handling.

  • Table function values (the name is case-insensitive). It allows creating a table with some data inline.

    SELECT * FROM VALUES('a UInt64, s String', (1, 'one'), (2, 'two'), (3, 'three'))

  • fullHostName (alias FQDN)

  • numbers_mt() — multithreaded version of numbers().

  • currentUser() (and alias user()), returning login of authorized user.

  • S3 engine and table function. Partial support in this release (no authentication), complete version is expected in 19.18.x and later

  • WITH TIES modifier for LIMIT

New dictionaries features
  • Redis as an external dictionary source.
  • <sparse_hashed> dictionary layout that is functionally equivalent to the <hashed> layout but is more memory efficient. It uses about half as much memory at the cost of slower value retrieval.
  • allow_dictionaries user setting that works similar to allow_databases.
  • HTTP source new attributes: credentials and http-headers.
Operations / Monitoring
  • system.metric_log table which stores values of system.events and system.metrics with specified time interval.

  • system.text_log table that stores ClickHouse server logs in ClickHouse itself

  • Support for detached parts removal:

    ALTER TABLE <table_name> DROP DETACHED PART '<part_id>'

  • MergeTree now has an additional option ttl_only_drop_parts (disabled by default) to avoid partial pruning of parts, so that they are dropped completely when all the rows in a part have expired (see the sketch after this list).

  • Added miscellaneous function getMacro(name) that returns String with the value of corresponding <macros> from configuration file on current server where the function is executed.
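
Two small usage sketches for the items above; the table name and the 'replica' macro name are assumptions based on typical configurations.

    -- Drop expired parts as a whole instead of pruning rows inside them
    ALTER TABLE events MODIFY SETTING ttl_only_drop_parts = 1;

    -- Read a value from the <macros> section of the server configuration
    SELECT getMacro('replica');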

New metrics & events
  • system.metrics: DistributedFilesToInsert, GlobalThread, GlobalThreadActive, LocalThread, LocalThreadActive
  • system.events: Merge
New system tables
  • system.disks
  • system.storage_policies
New columns in system tables
  • system.clusters: errors_count, estimated_recovery_time
  • system.collations: language
  • system.parts: disk_name
  • system.parts_columns: disk_name
  • system.processes: os_thread_ids
  • system.query_log: os_thread_ids
  • system.tables: storage_policy
system.merge_tree_settings changed/added
Name Default Description
concurrent_part_removal_threshold 100 Activate concurrent part removal (see ‘max_part_removal_threads’) only if the number of inactive data parts is at least this.
max_part_loading_threads auto(6) The number of threads to load data parts at startup.
max_part_removal_threads auto(6) The number of threads for concurrent removal of inactive data parts. One is usually enough, but in ‘Google Compute Environment SSD Persistent Disks’ file removal (unlink) operation is extraordinarily slow and you probably have to increase this number (recommended is up to 16).
storage_policy default Name of storage disk policy
ttl_only_drop_parts 0 Only drop expired parts as a whole instead of partially pruning them.
system.settings changed/added
Name Default Description
allow_drop_detached 0 Allow ALTER TABLE … DROP DETACHED PARTITION … queries
allow_experimental_live_view 0 Enable LIVE VIEW. Not mature enough.
allow_introspection_functions 0 Allow functions for introspection of ELF and DWARF for query profiling. These functions are slow and may impose security considerations.
any_join_distinct_right_table_keys 0 (was 1) Enable old ANY JOIN logic with many-to-one left-to-right table keys mapping for all ANY JOINs. It leads to confusing not equal results for ’t1 ANY LEFT JOIN t2’ and ’t2 ANY RIGHT JOIN t1’. ANY RIGHT JOIN needs one-to-many keys mapping to be consistent with LEFT one.
asterisk_left_columns_only removed
connection_pool_max_wait_ms 0 The wait time when connection pool is full.
default_max_bytes_in_join 100000000 Maximum size of right-side table if limit’s required but max_bytes_in_join is not set.
distributed_directory_monitor_max_sleep_time_ms 30000 Maximum sleep time for StorageDistributed DirectoryMonitors, it limits exponential growth too.
distributed_replica_error_cap 1000 Max number of errors per replica, prevents piling up incredible amount of errors if replica was offline for some time and allows it to be reconsidered in a shorter amount of time.
distributed_replica_error_half_life 60 Time period reduces replica error counter by 2 times.
format_custom_escaping_rule Escaped Field escaping rule (for CustomSeparated format)
format_custom_field_delimiter Delimiter between fields (for CustomSeparated format)
format_custom_result_after_delimiter Suffix after result set (for CustomSeparated format)
format_custom_result_before_delimiter Prefix before result set (for CustomSeparated format)
format_custom_row_after_delimiter Delimiter after field of the last column (for CustomSeparated format)
format_template_resultset Path to file which contains format string for result set (for Template format)
format_template_row Path to file which contains format string for rows (for Template format)
format_template_rows_between_delimiter Delimiter between rows (for Template format)
input_format_csv_unquoted_null_literal_as_null 0 Consider unquoted NULL literal as \N
input_format_null_as_default 0 For text input formats initialize null fields with default values if data type of this field is not nullable
input_format_tsv_empty_as_default 0 Treat empty fields in TSV input as default values.
input_format_values_accurate_types_of_literals 1 For Values format: when parsing and interpreting expressions using template, check actual type of literal to avoid possible overflow and precision issues.
input_format_values_deduce_templates_of_expressions 0 For Values format: if field could not be parsed by streaming parser, run SQL parser, deduce template of the SQL expression, try to parse all rows using template and then interpret expression for all rows.
joined_subquery_requires_alias 0 Force joined subqueries to have aliases for correct name qualification.
kafka_max_wait_ms 5000 The wait time for reading from Kafka before retry.
live_view_heartbeat_interval 15 The heartbeat interval in seconds to indicate live query is alive.
max_http_get_redirects 0 Max number of HTTP GET redirect hops allowed. Make sure additional security measures are in place to prevent a malicious server from redirecting your requests to unexpected services.
max_live_view_insert_blocks_before_refresh 64 Limit maximum number of inserted blocks after which mergeable blocks are dropped and query is re-executed.
min_free_disk_space_for_temporary_data 0 The minimum disk space to keep while writing temporary data used in external sorting and aggregation.
optimize_read_in_order 1 Enable ORDER BY optimization for reading data in corresponding order in MergeTree tables.
partial_merge_join 0 Use partial merge join instead of hash join for LEFT and INNER JOINs.
partial_merge_join_optimizations 0 Enable optimizations in partial merge join
partial_merge_join_rows_in_left_blocks 10000 Group left-hand joining data in bigger blocks. Setting it to a bigger value increase JOIN performance and memory usage.
partial_merge_join_rows_in_right_blocks 10000 Split right-hand joining data in blocks of specified size. It’s a portion of data indexed by min-max values and possibly unloaded on disk.
query_profiler_cpu_time_period_ns 1000000000 (was 0) Highly experimental. Period for CPU clock timer of query profiler (in nanoseconds). Set 0 value to turn off CPU clock query profiler. Recommended value is at least 10000000 (100 times a second) for single queries or 1000000000 (once a second) for cluster-wide profiling.
query_profiler_real_time_period_ns 1000000000 (was 0) Highly experimental. Period for real clock timer of query profiler (in nanoseconds). Set 0 value to turn off real clock query profiler. Recommended value is at least 10000000 (100 times a second) for single queries or 1000000000 (once a second) for cluster-wide profiling.
replace_running_query_max_wait_ms 5000 The wait time for running query with the same query_id to finish when setting ‘replace_running_query’ is active.
s3_min_upload_part_size 536870912 The minimum size of part to upload during multipart upload to S3.
temporary_live_view_timeout 5 Timeout after which temporary live view is deleted.

Altinity Stable Release for ClickHouse 19.16.19.85

Released 2020-04-20

Description

This is a combined release of 19.16.19.65, upgraded to 19.16.19.85 after review.

Major New Features

The enable_scalar_subquery_optimization setting is now disabled by default. This is required in order to perform rolling upgrades to 20.x versions.

Bug Fixes
  • Fixed a bug with non-Date/DateTime columns not being allowed in TTL expressions
  • Fixed a bug in the Kafka engine that sometimes prevented committing a message back to the broker if the broker was temporarily unavailable
  • Fixed a bug in which the boolean functions ‘or’ and ‘and’ might return incorrect results when called on more than 10 arguments if some of them are NULL
  • Fixed server side certificates in docker image
  • Fixed a possible race condition between insert quorum and drop partition
  • Volume selection in merge process could move data back to the first volume in storage policy in some cases
  • Mutations that did not work if insert_quorum has been used for the table
  • Deduplication logic that ignored dependent materialized views if a duplicate for the main table has been detected. The fix is turned off by default and can be enabled with a new deduplicate_blocks_in_dependent_materialized_views setting.
  • Kafka engine could result in data losses when ZooKeeper is temporarily not available
  • Kafka engine did not allow users to drop a table with incorrect Kafka topic
  • Kafka engine did not allow users to use subqueries in attached materialized views
  • Fixed ALTER TABLE MODIFY CODEC when a default expression and a codec are specified

Altinity Stable Release for ClickHouse 19.16.12.49

Released 2020-02-07

Description

Improved Kafka engine reliability, performance and usability.

Bug Fixes
  • When there is a rebalance of partitions between consumers, occasional data duplicates and losses were possible.
  • When data was polled from several partitions with one poll and committed partially, occasional data losses were possible.
  • Block size threshold (‘kafka_max_block_size’ setting) now triggers block flush correctly, which reduces the chance of data loss under high load.
  • When new columns are added to the Kafka pipeline (Kafka engine -> Materialized View -> Table) it was previously not possible to alter the destination tables first with default value and then alter MV in order to minimize downtime. The problem was not related directly to Kafka, but general implementation of materialized views. It is fixed now.
  • A few other minor problems have been addressed as well.

Since ClickHouse now respects the ‘kafka_max_block_size’ setting that defaults to 65535, we recommend increasing it to a bigger value for high-volume streaming. Setting it to 524288 or 1048576 may increase consumer throughput by up to 20%.

Known Issues

The ‘kafka_max_block_size’ setting defaults to 65535. We recommend increasing it to a bigger value for high-volume streaming. Setting it to 524288 or 1048576 may increase consumer throughput by up to 20%.

References

V19.13

Altinity Stable Release for ClickHouse 19.13.7.57

Released 2019-11-28

Description

This is a minor upgrade over the previous release.

Major New Features
  • CREATE TABLE AS TABLE FUNCTION statement that allows creating tables from external data automatically, e.g. from a MySQL table (see the sketch after this list).
  • COLUMN('regexp') macro function as a generalization of SELECT * queries.
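
A sketch of the CREATE TABLE AS TABLE FUNCTION statement; the MySQL connection parameters are placeholders.

    CREATE TABLE mysql_users AS mysql('mysql-host:3306', 'shop', 'users', 'reader', 'secret');

    SELECT count() FROM mysql_users;
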
Bug Fixes
  • Fixed security vulnerability in url() function.
  • Fixed security vulnerability related to Zookeeper.
  • Fixed vulnerabilities in compression codecs.
  • Fixed corner case overflows in DoubleDelta codec.
Known Issues
  • ANY INNER|RIGHT|FULL JOIN is deprecated! While we like how it worked before, there are plans to change the behavior completely. We are still discussing with the development team how we can preserve the old behavior. For now, in order to preserve the old behavior and safely upgrade to newer versions in the future, one needs to enable the any_join_distinct_right_table_keys setting!
  • The setting input_format_defaults_for_omitted_fields is on by default. It enables calculation of complex default expressions for omitted fields in JSONEachRow and CSV* formats. Inserts to Distributed tables need this setting to be the same across the cluster.
    • You need to set input_format_defaults_for_omitted_fields across the cluster before the upgrade if rolling upgrade is performed.

References

V19.11

Altinity Stable Release for ClickHouse 19.11.8

Released 2019-09-03

Description

Updates including improved Kafka support.

Major New Features
  • Multi-table JOINs with standard SQL syntax (some limitations are left: only equi-JOIN, limited condition pushdown support)
  • Secondary indices - allow skipping data blocks without reading data: min/max filter, n-grams or token-based bloom filter
  • TTL expressions for columns and tables to remove old data automatically
  • Unique ASOF JOIN - join to the most recent value known
  • Per-column custom compression codecs (Delta, T64, DoubleDelta, Gorilla): fine-tune compression for particular columns (a combined sketch follows this feature list)
  • Roaring Bitmaps - use bitmaps for different set operations (calculate cardinality, unions, intersections, etc.)
  • Machine-learning functions: evalMLMethod, simpleLinearRegression, stochasticLinearRegression, stochasticLogisticRegression
  • Predefined per-row filter expressions for tables, which enable simple row-level security rules
  • Adaptive index granularity (setting index_granularity_bytes) - useful for tables having rows with relatively big size (>1280 bytes)
  • Advanced text processing functions
    • Multiple substring search: multiSearch(Any|FirstPosition|FirstIndex|AllPositions)(CaseInsensitive)?(UTF8)?
    • Multiple regex search (hyperscan-powered): multiMatch(Any|AnyIndex)
    • Fuzzy regex match (hyperscan-powered): multiFuzzyMatch(Any|AnyIndex)
    • N-gram distance for fuzzy string comparison and search (similar to q-gram metrics in R language): ngramDistance(CaseInsensitive)?(UTF8)?, ngramSearch(CaseInsensitive)?(UTF8)?
  • HDFS read/write access
  • New JSON processing functions - high performance & compliant with JSON standard
  • IPv4 and IPv6 data types
  • New formats
    • Protobuf — now fully supported with input and output plus nested data structures
    • Parquet
    • RowBinaryWithNamesAndTypes
    • JSONEachRow and TSKV - now support default expressions for missing fields (Check input_format_defaults_for_omitted_fields)
    • TSVWithNames/CSVWithNames - column order can now be determined from file header (Check input_format_with_names_use_header parameter).
  • SQL statements with bind parameters
  • Improved MySQL integration:
    • new database engine to access all the tables in a remote MySQL server
    • support for the MySQL wire protocol, allowing connections to ClickHouse from MySQL clients (a combined example of several features above follows this list).
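
A combined, hedged sketch of several features above (the MySQL database engine, per-column codecs, a table TTL, a skipping index and a multi-table JOIN); all connection parameters, table and column names are placeholders:

    -- Expose all tables of a remote MySQL database through the new database engine
    CREATE DATABASE mysql_shop ENGINE = MySQL('mysql-host:3306', 'shop', 'user', 'password');

    -- Skipping indices are experimental in 19.11 and have to be enabled explicitly
    SET allow_experimental_data_skipping_indices = 1;

    CREATE TABLE hits
    (
        dt      DateTime CODEC(DoubleDelta, LZ4),   -- per-column codecs
        user_id UInt64   CODEC(T64, LZ4),
        url     String,
        INDEX url_ngrams url TYPE ngrambf_v1(4, 1024, 1, 0) GRANULARITY 4
    )
    ENGINE = MergeTree
    ORDER BY (user_id, dt)
    TTL dt + INTERVAL 6 MONTH;                      -- old rows are removed automatically

    -- Multi-table equi-JOIN with standard syntax
    SELECT h.url, u.name, o.amount
    FROM hits AS h
    JOIN mysql_shop.users  AS u ON u.id = h.user_id
    JOIN mysql_shop.orders AS o ON o.user_id = u.id;
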
Known issues
  • All released 19.x versions have had some problems with the Kafka engine due to a full rewrite of Kafka support. In 19.11.8 Kafka works much better than in previous 19.x releases. However, there are still some corner cases that can lead to data duplication in certain scenarios, for example after a ClickHouse server restart. Those issues will be addressed soon.

  • Adaptive granularity is enabled by default, and index_granularity_bytes is set to 10 MB. This feature uses a different data format, and interoperability between the old and new formats has some issues. So if you’re upgrading your cluster from an older version, consider disabling it before the upgrade by putting the following fragment in your config.xml:

    <merge_tree>
      <index_granularity_bytes>0</index_granularity_bytes>
    </merge_tree>
    

    After the upgrade, you can turn adaptive granularity on at any convenient time. It is especially useful when your tables have rows larger than about 1 KB. If you are doing a new installation, leave the default setting value as is.

  • enable_optimize_predicate_expression is now enabled by default. It is possible that pushing a condition down into a subselect leads to suboptimal or undesired effects. If this happens, please report the issue and disable the feature for that SELECT or globally (see the example after this list).

  • Secondary indices are maturing but are still considered experimental. There is at least one severe bug: when a mutation is executed with a condition on a column with a secondary index, it can affect more rows than expected. Please be careful with DELETE, or upgrade to 19.13 (see issue #6224).

  • Some users have reported problems with ODBC data sources after the upgrade. In most cases these were misconfigurations. Nevertheless, please do canary/staging updates and check how your ODBC connections work before moving to production.
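
If predicate pushdown misbehaves for a particular query, it can be disabled for just that query; a minimal sketch:

    SELECT *
    FROM (SELECT number, rand() AS r FROM system.numbers LIMIT 10)
    WHERE number < 5
    SETTINGS enable_optimize_predicate_expression = 0;  -- per-query override; SET ... = 0 disables it globally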

Upgrade Notes

Backward compatibility issues
  • Due to an update of the LZ4 library, the new ClickHouse version writes parts that are not binary-identical to those written by older versions. That makes it problematic to update only one replica. Leaving the cluster in such a state for a long period of time will work, but may lead to excessive copying of parts between nodes due to checksum mismatches.
  • There is a new setting max_partitions_per_insert_block with default value 100. If the inserted block contains a larger number of partitions, an exception is thrown. Set it to 0 if you want to remove the limit (not recommended).
  • If you are using unusual low-cardinality combinations like LowCardinality(UInt8), the new version will prevent you from doing so. If you really know what you are doing, set allow_suspicious_low_cardinality_types to 1 (see the example after this list).
  • This release adds the max_parts_in_total setting for the MergeTree family of tables (default: 100 000). We hope your number of parts is much lower than this limit. If necessary, you can raise the value.
  • The system.dictionaries table has been changed. If you used it for monitoring purposes, you may need to change your scripts.
  • Dictionaries are loaded lazily by default. It means they have status NOT_LOADED until the first access.
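
A hedged sketch of adjusting the new safety limits per session; the same values can go into the default user profile instead:

    SET max_partitions_per_insert_block = 0;         -- lift the 100-partition limit on a single INSERT (not recommended)
    SET allow_suspicious_low_cardinality_types = 1;  -- allow combinations such as LowCardinality(UInt8)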

Changes

New functions
  • New hash functions: xxHash32, xxHash64, gccMurmurHash, hiveHash, javaHash, CRC32
  • JSON processing functions: JSONExtract, JSONExtract(Bool|Float|Int|KeysAndValues|Raw|String|UInt), JSONHas, JSONKey, JSONLength, JSONType (a short example follows this list)
  • Geospatial processing: geoToH3, geohashEncode, geohashDecode
  • IP address handling: IPv4CIDRtoIPv4Range, IPv6CIDRtoIPv6Range, toIPv4, toIPv6
  • New statistical aggregate functions: skewPop, skewSamp, kurtPop, kurtSamp
  • String functions: isValidUTF8, regexpQuoteMeta, trimBoth, trimLeft, trimRight, format, toValidUTF8
  • Encoding: tryBase64Decode, base64Decode, base64Encode
  • Array processing: arrayEnumerateDenseRanked, arrayEnumerateUniqRanked, flatten, arrayFlatten, arrayWithConstant
  • Date/Time processing: toStartOfInterval, addQuarters, subtractQuarters, toIntervalQuarter, toStartOfTenMinutes
  • Numerical: roundDown, toDecimalOrZero, toDecimalOrNull
  • Dictionary: dictGet, dictGetOrDefault
  • Roaring Bitmaps: bitmapAndCardinality, bitmapAndnot, bitmapAndnotCardinality, bitmapBuild, bitmapCardinality, bitmapContains, bitmapHasAll, bitmapHasAny, bitmapOr, bitmapOrCardinality, bitmapToArray, bitmapXor, bitmapXorCardinality, groupBitmap
  • OS level introspection: filesystemAvailable, filesystemCapacity, filesystemFree, basename
  • New aggregate functions: boundingRatio, entropy, groupBitmap, sumMap(Filtered|FilteredWithOverflow|WithOverflow), topKWeighted, groupArrayMovingAvg, groupArrayMovingSum, timeSeriesGroupRateSum, timeSeriesGroupSum
  • SQL standard compatibility aliases added: left, right, trim, ltrim, rtrim, timestampadd, timestampsub, power, replace, ln, locate, mid
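
A few of the new functions in a single query; the input values are made up purely for illustration:

    SELECT
        JSONExtractString('{"user":"alice","age":30}', 'user')          AS user_name,
        multiSearchAny('clickhouse release notes', ['release', 'beta']) AS has_keyword,
        bitmapCardinality(bitmapBuild([1, 2, 3, 3]))                    AS distinct_ids,
        trimBoth('  padded  ')                                          AS trimmed;
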
Other notable changes
  • Setting constraints which limit the possible range of setting value per user profile.
  • KILL MUTATION added
  • New aggregate function combinators: -Resample
  • Added new data type SimpleAggregateFunction - light aggregation for simple functions like any, anyLast, sum, min, max
  • Ability to use a different sorting key (ORDER BY) and index (PRIMARY KEY). The sorting key can be longer than the index, and ORDER BY can be altered when adding or removing a column (see the example after this list).
  • HTTP interface: brotli compression support, X-ClickHouse-Query-Id and X-ClickHouse-Summary headers in response, ability to cancel query on disconnect (check cancel_http_readonly_queries_on_client_close)
  • DFA-based implementation for functions sequenceMatch and sequenceCount in case the pattern doesn’t contain time
  • Back up all partitions at once with ALTER TABLE … FREEZE
  • Comments for a column in the table description
  • Join engine: new options join_use_nulls, max_rows_in_join, max_bytes_in_join, join_any_take_last_row and join_overflow_mode + joinGet function that allows you to use a Join type table like a dictionary.
  • /docker-entrypoint-initdb.d for database initialization in docker
  • Graphite rollup rules reworked.
  • Query settings in asynchronous INSERTs into Distributed tables are respected now
  • Hints in the command-line client when a user makes a typo in a function or data type name
  • system.detached_parts table containing information about detached parts of MergeTree tables
  • Table function remoteSecure
  • Ability to write zookeeper part data in more compact form (use_minimalistic_part_header_in_zookeeper)
  • Ability to close MySQL connections after their usage in external dictionaries
  • Support RENAME operation for Materialized View
  • Non-blocking loading of external dictionaries
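
A hedged sketch combining SimpleAggregateFunction columns with a PRIMARY KEY that is shorter than the sorting key; the table and column names are hypothetical:

    CREATE TABLE user_state
    (
        user_id    UInt64,
        dt         Date,
        last_seen  SimpleAggregateFunction(max, DateTime),  -- light aggregation, no -State/-Merge needed
        total_hits SimpleAggregateFunction(sum, UInt64)
    )
    ENGINE = AggregatingMergeTree
    ORDER BY (user_id, dt)      -- sorting key
    PRIMARY KEY user_id;        -- index is a prefix of the sorting key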

Kafka now supports SASL SCRAM authentication, new virtual columns _topic, _offset and _key are available, and there are many other improvements (see the sketch below).
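
A minimal sketch of capturing the new Kafka virtual columns through a materialized view; kafka_events stands for any Kafka engine table with a message String column:

    CREATE MATERIALIZED VIEW kafka_audit
    ENGINE = MergeTree ORDER BY (kafka_topic, kafka_offset)
    AS
    SELECT
        _topic  AS kafka_topic,   -- virtual columns exposed by the Kafka engine
        _offset AS kafka_offset,
        _key    AS kafka_key,
        message
    FROM kafka_events;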

Settings changed/added
Name | Default | Description
allow_experimental_cross_to_join_conversion | 1 | Convert CROSS JOIN to INNER JOIN if possible
allow_experimental_data_skipping_indices | 0 | If it is set to true, data skipping indices can be used in CREATE TABLE/ALTER TABLE queries.
allow_experimental_low_cardinality_type | 1 (was 0) | Obsolete setting, does nothing. Will be removed after 2019-08-13
allow_experimental_multiple_joins_emulation | 1 | Emulate multiple joins using subselects
allow_hyperscan | 1 | Allow functions that use Hyperscan library. Disable to avoid potentially long compilation times and excessive resource usage.
allow_simdjson | 1 | Allow using simdjson library in ‘JSON*’ functions if AVX2 instructions are available. If disabled rapidjson will be used.
allow_suspicious_low_cardinality_types | 0 | In CREATE TABLE statement allows specifying LowCardinality modifier for types of small fixed size (8 or less). Enabling this may increase merge times and memory consumption.
cancel_http_readonly_queries_on_client_close | 0 | Cancel HTTP readonly queries when a client closes the connection without waiting for response.
check_query_single_value_result | 1 | Return check query result as single 1/0 value
enable_conditional_computation | (was 0) | DELETED
enable_optimize_predicate_expression | 1 (was 0) | If it is set to true, optimize predicates to subqueries.
enable_unaligned_array_join | 0 | Allow ARRAY JOIN with multiple arrays that have different sizes. When this setting is enabled, arrays will be resized to the longest one.
external_table_functions_use_nulls | 1 | If it is set to true, external table functions will implicitly use Nullable type if needed. Otherwise NULLs will be substituted with default values. Currently supported only for ‘mysql’ table function.
idle_connection_timeout | 3600 | Close idle TCP connections after specified number of seconds.
input_format_defaults_for_omitted_fields | 0 | For input data calculate default expressions for omitted fields (it works for JSONEachRow format).
input_format_with_names_use_header | 0 | For TSVWithNames and CSVWithNames input formats this controls whether format parser is to assume that column data appear in the input exactly as they are specified in the header.
join_any_take_last_row | 0 | When disabled (default) ANY JOIN will take the first found row for a key. When enabled, it will take the last row seen if there are multiple rows for the same key. Allows you to overwrite old values in table with Engine=Join.
join_default_strictness | ALL | Set default strictness in JOIN query. Possible values: empty string, ‘ANY’, ‘ALL’. If empty, query without strictness will throw exception.
low_cardinality_allow_in_native_format | 1 | Use LowCardinality type in Native format. Otherwise, convert LowCardinality columns to ordinary for select query, and convert ordinary columns to required LowCardinality for insert query.
max_alter_threads | auto | The maximum number of threads to execute the ALTER requests. By default, it is determined automatically.
max_execution_speed | 0 | Maximum number of execution rows per second.
max_execution_speed_bytes | 0 | Maximum number of execution bytes per second.
max_partitions_per_insert_block | 100 | Limit maximum number of partitions in single INSERTed block. Zero means unlimited. Throw exception if the block contains too many partitions. This setting is a safety threshold, because using large number of partitions is a common misconception.
max_streams_multiplier_for_merge_tables | 5 | Request more streams when reading from Merge table. Streams will be spread across tables that Merge table will use. This allows more even distribution of work across threads and is especially helpful when merged tables differ in size.
max_threads | auto | The maximum number of threads to execute the request. By default, it is determined automatically.
merge_tree_max_bytes_to_use_cache | 2013265920 | The maximum number of bytes per request, to use the cache of uncompressed data. If the request is large, the cache is not used. (For large queries not to flush out the cache.)
merge_tree_min_bytes_for_concurrent_read | 251658240 | If at least as many bytes are read from one file, the reading can be parallelized.
merge_tree_min_bytes_for_seek | 0 | You can skip reading more than that number of bytes at the price of one seek per file.
min_count_to_compile_expression | 3 | The number of identical expressions before they are JIT-compiled
min_execution_speed_bytes | 0 | Minimum number of execution bytes per second.
network_compression_method | LZ4 | Allows you to select the method of data compression when writing.
optimize_skip_unused_shards | 0 | Assumes that data is distributed by sharding_key. Optimization to skip unused shards if SELECT query filters by sharding_key.
os_thread_priority | 0 | If non-zero, set corresponding ‘nice’ value for query processing threads. Can be used to adjust query priority for OS scheduler.
output_format_parquet_row_group_size | 1000000 | Row group size in rows.
stream_poll_timeout_ms | 500 | Timeout for polling data from streaming storages.
tcp_keep_alive_timeout | 0 | The time (in seconds) the connection needs to remain idle before TCP starts sending keepalive probes
MergeTree settings
Name | Default | Description
enable_mixed_granularity_parts | 0 | Enable parts with adaptive and non-adaptive granularity
index_granularity_bytes | 10485760 | Approximate amount of bytes in single granule (0 - disabled).
write_final_mark | 1 | Write final mark after end of column (0 - disabled, do nothing if index_granularity_bytes=0)
max_parts_in_total | 100000 | If more than this number of active parts in all partitions in total, throw ‘Too many parts …’ exception.
merge_with_ttl_timeout | 86400 | Minimal time in seconds before a merge with TTL can be repeated.
min_merge_bytes_to_use_direct_io | 10737418240 (was 0) | Minimal amount of bytes to enable O_DIRECT in merge (0 - disabled).
replicated_max_parallel_fetches_for_host | 15 | Limit parallel fetches from endpoint (actually pool size).
use_minimalistic_part_header_in_zookeeper | 0 | Store part header (checksums and columns) in a compact format and a single part znode instead of separate znodes (/columns and /checksums). This can dramatically reduce snapshot size in ZooKeeper. Before enabling, check that all replicas support the new format.
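
The last MergeTree setting above can be tried on a single table before enabling it globally in the <merge_tree> section of config.xml; a hedged sketch (the ZooKeeper path and macros are placeholders):

    CREATE TABLE events_replicated
    (
        dt Date,
        id UInt64
    )
    ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events', '{replica}')
    ORDER BY id
    SETTINGS use_minimalistic_part_header_in_zookeeper = 1;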

V18.14

Altinity Stable Release for ClickHouse 18.14.19

Released 2018-12-31

Description

Altinity Stable Release 18.14.19 is a minor bug-fix release following the previous 18.14.15. Overall ClickHouse memory consumption is decreased with this release.

Bug fixes

  • Fixed a bug with range_hashed dictionaries returning wrong results.
  • Fixed an error that caused messages “netlink: ‘…’: attribute type 1 has an invalid length” to be printed in Linux kernel log in some recent versions of Linux kernel.
  • Fixed segfault in function ’empty’ for FixedString arguments.
  • Fixed excessive memory allocation when using a large value of max_query_size.
  • Fixed cases when the ODBC bridge process did not terminate with the main server process.
  • Fixed synchronous insertion into the Distributed table with a columns list that differs from the column list of the remote table.
  • Fixed a rare race condition that could lead to a crash when dropping a MergeTree table.
  • Fixed a query deadlock in a case when query thread creation fails with the ‘Resource temporarily unavailable’ error.
  • Fixed parsing of the ENGINE clause when the CREATE AS table syntax was used and the ENGINE clause was specified before the AS table (the error resulted in ignoring the specified engine).
  • Fixed a segfault if the ‘max_temporary_non_const_columns’ limit was exceeded.
  • Fixed a bug with databases not being correctly specified when executing DDL ON CLUSTER queries.

References