Altinity Stable for ClickHouse 20.3
1 - Altinity Stable for ClickHouse 20.3.19.4
Description
This release includes several dozen bug fixes on top of the previous Altinity Stable release.
Major New Features
- Server startup time has been reduced by parsing metadata in parallel.
- Improved performance of primary key analysis for LowCardinality columns. This fixes a performance regression introduced in 20.3.
- Improved performance of queries with large tuples and tuples used in primary key.
Bug Fixes
- Fixed a bug that prevented attaching Materialized Views to system tables.
- Fixed incorrect behavior of the `if` function with NULLs.
- Fixed segfaults in rare cases.
- Fixed a bug that prevented predicate pushdown for queries using the `WITH` clause.
- Fixed a `SIGSEGV` in the Kafka engine when the broker is unavailable.
- Fixed a bug leading to a `block structure mismatch` error for queries with `UNION` and `JOIN`.
- Fixed TTL processing logic to process all partitions in one run.
- Fixed a bug in parsing row policies from configuration files that could sometimes result in missing policies.
- Fixed a bug where `ALTER TABLE UPDATE` could produce incorrect results when updating Nullable columns.
- Fixed a bug with codecs not working properly for MergeTree compact parts.
References
Updated RPM packages for Altinity Stable Release can be found in the Altinity Stable Repository. The ClickHouse repository can be used for DEB packages but check version numbers carefully.
2 - Altinity Stable for ClickHouse 20.3.12.112
Description
This release is a significant step forward since the previous Altinity Stable release 19.16.19.85. It includes 1203 pull requests from 171 contributors.
Major New Features
- TTL moves. See the Putting Things Where They Belong Using New TTL Moves article, and the SQL sketch after this list. While we consider this feature to be production ready, we have a roadmap to further improve it. TTL moves have been developed by Altinity.
- Dictionary DDLs that many ClickHouse users dreamed of! See the ClickHouse Dictionaries: Reloaded! article that highlights this feature.
- New DateTime64 datatype with configurable precision up to nanoseconds. The feature is in a beta state, and more usability functions will follow in upcoming releases. It has also been developed by Altinity.
- Joins have been improved a lot, including SEMI/ANTI JOIN, experimental merge join and other changes. See the join_algorithm setting below for more info on merge joins.
- A new compact format for MergeTree tables that stores all columns in one file. It improves the performance of small inserts. The usual format, where every column is stored separately, is now called "wide." The compact format is disabled by default. See the min_bytes_for_wide_part and min_rows_for_wide_part settings.
- A built-in Prometheus exporter endpoint for ClickHouse monitoring statistics.
- Ported some functionality of the H3 library, a hexagonal hierarchical geospatial indexing system.
- ALTER MODIFY/DROP commands are now implemented as mutations for the ReplicatedMergeTree* engine family. ALTER commands now block only at the metadata update stage and don't block after that.
- Import/export gzip-compressed files directly for file based storage engines and table functions: File, URL, HDFS and S3.
- Server improvements across different server functions. For example, performance of the Kafka Engine has been improved, parallel `INSERT` is now possible (see the max_insert_threads setting), etc.
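To make the first two items concrete, here is a minimal SQL sketch. The table, volume, and dictionary names are illustrative, and the 'cold' volume is assumed to be defined in the server's storage configuration:

```sql
-- A TTL move: parts with data older than 30 days are moved to the
-- (assumed) 'cold' volume from the storage configuration.
CREATE TABLE events
(
    event_date Date,
    event_type String,
    value Float64
)
ENGINE = MergeTree
ORDER BY (event_type, event_date)
TTL event_date + INTERVAL 30 DAY TO VOLUME 'cold';

-- A dictionary created via DDL instead of XML configuration;
-- the source table default.countries is hypothetical.
CREATE DICTIONARY country_dict
(
    id UInt64,
    name String
)
PRIMARY KEY id
SOURCE(CLICKHOUSE(HOST 'localhost' PORT 9000 USER 'default' PASSWORD '' DB 'default' TABLE 'countries'))
LIFETIME(MIN 300 MAX 600)
LAYOUT(HASHED());
```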
Upgrade Notes
- The replication fetch protocol has changed in this release. If only one replica in a shard is upgraded, replication may get stuck with an `unknown protocol version` error until all replicas are upgraded. Versions 19.16.17.80 and 19.16.19.85 contain a special compatibility fix that allows a smooth upgrade without replication downtime.
- If enable_optimize_predicate_expression is turned on, it may produce incorrect data when only part of the cluster is upgraded. Please turn it off before the upgrade, and turn it back on afterwards.
- There were some optimizations that are incompatible between versions, so we recommend disabling enable_scalar_subquery_optimization before the upgrade. This setting turns off new scalar subquery optimizations and was backported specifically in order to facilitate smooth upgrades.
- There were some optimizations around distributed query execution that may produce incorrect data if only part of the cluster is upgraded. We recommend disabling distributed_aggregation_memory_efficient while upgrading and turning it back on afterwards. The sketch below collects these upgrade-time settings.
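A minimal sketch of the per-session form of these workarounds; the same settings can also be placed in a user profile so they apply to all queries during the rolling upgrade:

```sql
-- Disable optimizations that can misbehave while replicas run mixed
-- versions; re-enable them once every replica is on the new release.
SET enable_optimize_predicate_expression = 0;
SET enable_scalar_subquery_optimization = 0;
SET distributed_aggregation_memory_efficient = 0;
```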
Backward Incompatible Changes
The following changes are backward incompatible and require user attention.
- ALTER on ReplicatedMergeTree is not compatible with previous versions. 20.3 creates a different metadata structure in ZooKeeper for ALTERs; earlier versions do not understand the new format and cannot proceed with their replication queue.
- The format of replication log entries for mutation commands has changed. You have to wait for old mutations to process before installing the new version.
- 20.3 requires an alias for every subquery participating in a join by default. Set `joined_subquery_requires_alias=0` to keep the previous behavior (see the sketch after this list).
- ANY JOIN logic has changed. To upgrade without changes in behavior, add `SETTINGS any_join_distinct_right_table_keys = 1` to Engine Join tables metadata or recreate these tables after the upgrade.
- The functions indexHint, findClusterValue, and findClusterIndex were removed.
- The settings merge_tree_uniform_read_distribution, allow_experimental_cross_to_join_conversion, and allow_experimental_multiple_joins_emulation are deprecated and ignored.
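As a sketch of the subquery alias change, the first query below is the 20.3-compliant form, and the SET statements restore the pre-20.3 behavior where needed:

```sql
-- 20.3 rejects unaliased joined subqueries by default;
-- the aliases AS t and AS u satisfy the new requirement.
SELECT t.x
FROM (SELECT 1 AS x) AS t
INNER JOIN (SELECT 1 AS x) AS u USING (x);

-- Or keep the old behavior for existing workloads:
SET joined_subquery_requires_alias = 0;
SET any_join_distinct_right_table_keys = 1;
```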
Other Important Notes
- If you encounter an [Argument at index 0 for function dictGetDateTime must be a constant](https://github.com/ClickHouse/ClickHouse/issues/7798) or similar error message after upgrading, set `enable_early_constant_folding=0`.
- The new release adds parallel parsing of input formats. While it should improve the performance of loading data in text formats, it may sometimes result in a slower load. Set `input_format_parallel_parsing=0` if you experience insert performance degradation.
- A check that a query is performed fast enough is enabled by default. Queries can fail with an exception like `DB::Exception: Estimated query execution time (N seconds) is too long. Maximum: 12334.` Set `timeout_before_checking_execution_speed=0` to fix this problem.
- Mutations containing non-deterministic functions, e.g. dictGet or joinGet, are disabled by default. Set `allow_nondeterministic_mutations=1` to enable them.
Also note that this release enables a new query pipeline by default (codename: Processors). This was a significant internal refactoring, and we believe all issues have been fixed. However, in the rare case that you face weird query behavior or performance degradation, try disabling `experimental_use_processors` and check whether the problem goes away. The sketch below collects these workaround settings.
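Collected in one place, the workaround settings from these notes look like this; apply only the ones you actually need:

```sql
SET enable_early_constant_folding = 0;            -- dictGetDateTime "must be a constant" errors
SET input_format_parallel_parsing = 0;            -- slower inserts in text formats
SET timeout_before_checking_execution_speed = 0;  -- "Estimated query execution time ... is too long"
SET allow_nondeterministic_mutations = 1;         -- mutations using dictGet or joinGet
SET experimental_use_processors = 0;              -- fall back to the old query pipeline
```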
Changes
New formats
- Avro
- AvroConfluent
- JSONCompactEachRow
- JSONCompactEachRowWithNamesAndTypes
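The new formats plug into queries the usual way; for example, JSONCompactEachRow emits one compact JSON array per row:

```sql
SELECT number, toString(number) AS s
FROM system.numbers
LIMIT 3
FORMAT JSONCompactEachRow;
```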
New or Improved Functions
- CRC32IEEE, CRC64
- New JSON functions.
- isValidJSON
- JSONExtractArrayRaw — very useful for parsing nested JSON structures
- New array functions.
- arrayCompact
- arrayFill
- arrayReverseFill
- arrayZip
- New geo functions.
- geoDistance
- greatCircleDistance
- H3 functions.
- h3EdgeAngle
- h3EdgeLengthM
- h3GetBaseCell
- h3GetResolution
- h3HexAreaM2
- h3IndexesAreNeighbors
- h3IsValid
- h3ToChildren
- h3ToParent
- h3ToString
- h3kRing
- stringToH3
- Functions for DateTime64.
- now64
- parseDateTime64BestEffort, parseDateTime64BestEffortOrNull, parseDateTime64BestEffortOrZero
- toDateTime64, toDateTime64OrNull, toDateTime64OrZero
- New aggregate functions:
- avgWeighted
- categoricalInformationValue
- groupArraySample
- Other.
- bitCount
- ifNotFinite
- isConstant
- javaHashUTF16LE
- moduloOrZero
- randomPrintableASCII
- roundBankers
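A few of the new functions in action; the literals are arbitrary and the expected results are shown in comments:

```sql
-- DateTime64 values with millisecond precision.
SELECT now64(3),
       parseDateTime64BestEffort('2020-06-01 12:00:00.123', 3);

-- arrayCompact removes consecutive duplicates; arrayZip pairs arrays.
SELECT arrayCompact([1, 1, 2, 2, 1]) AS compacted,  -- [1, 2, 1]
       arrayZip(['a', 'b'], [1, 2]) AS zipped;      -- [('a', 1), ('b', 2)]
```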
New table functions
- clusterAllReplicas — similar to cluster function but queries all replicas in the cluster
- generateRandom — allows users to generate a random dataset with a defined structure
- zeros, zeros_mt
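A short sketch of the new table functions; the cluster name 'my_cluster' is illustrative and must exist in the remote_servers configuration:

```sql
-- Query every replica of every shard, not just one replica per shard.
SELECT hostName() AS host, count()
FROM clusterAllReplicas('my_cluster', system.one)
GROUP BY host;

-- Generate random rows for an arbitrary structure
-- (arguments: structure, random seed, max string length).
SELECT * FROM generateRandom('id UInt64, name String', 1, 10) LIMIT 5;
```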
New metrics and events
system.metrics:
- BackgroundMovePoolTask
- CacheDictionaryUpdateQueueBatches
- CacheDictionaryUpdateQueueKeys
- MemoryTrackingInBackgroundMoveProcessingPool
- MySQLConnection
- SendScalars
New system tables
- `metric_log` — stores system.metrics and system.events history
- `quotas`, `quota_usage` — introspection into quotas
- `row_policies` — introspection into row security policies
- `trace_log`, `stack_trace` — for low-level debugging purposes
- `zeros`, `zeros_mt` — zero generators, like system.numbers but faster; they return zeros
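For example, the new metric_log table can be queried directly; CurrentMetric_MemoryTracking is one of the CurrentMetric_* columns it exposes:

```sql
-- Recent snapshots of system.metrics, one row per collection interval.
SELECT event_time, CurrentMetric_MemoryTracking
FROM system.metric_log
ORDER BY event_time DESC
LIMIT 10;
```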
New columns in system tables
- `mutations`: parts_to_do_names
- `parts`, `parts_columns`: part_type
- `processes`: thread_ids replaces the old columns thread_numbers and os_thread_ids
- `query_log`: exception_code; thread_ids replaces the old columns thread_numbers and os_thread_ids
- `replicas`: zookeeper_exception
- `settings`: min, max, readonly
- `table_engines`: supports_settings, supports_skipping_indices, supports_sort_order, supports_ttl, supports_replication, supports_deduplication
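A quick way to inspect one of the new columns, part_type, whose values are 'Wide' or 'Compact':

```sql
SELECT table, name, part_type
FROM system.parts
WHERE active
LIMIT 10;
```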
system.merge_tree_settings added/changed
| Setting | Old value | New value | Description |
|---|---|---|---|
| disable_background_merges | | 0 | Disable background merges |
| merge_max_block_size | | 8192 | How many rows in blocks should be formed for merge operations |
| min_bytes_for_wide_part | | 0 | Minimal uncompressed size in bytes to create a part in wide format instead of compact |
| min_rows_for_wide_part | | 0 | Minimal number of rows to create a part in wide format instead of compact |
| use_minimalistic_part_header_in_zookeeper | 0 | 1 | Store part header (checksums and columns) in a compact format and a single part znode instead of separate znodes. This setting has been available for a year already, and many users have it enabled. |
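A minimal sketch of opting a table into compact parts; the table name and the 10 MB threshold are arbitrary choices:

```sql
-- Parts below ~10 MB uncompressed are written in compact format
-- (all columns in one file), which speeds up frequent small inserts.
CREATE TABLE small_inserts
(
    d Date,
    payload String
)
ENGINE = MergeTree
ORDER BY d
SETTINGS min_bytes_for_wide_part = 10485760;
```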
system.settings added/changed
| Setting | Old value | New value | Description |
|---|---|---|---|
| allow_experimental_alter_materialized_view_structure | | 0 | Allow atomic ALTER on Materialized Views. Work in progress |
| allow_experimental_data_skipping_indices | 0 | 1 | Data skipping indices are now enabled by default |
| allow_nondeterministic_mutations | | 0 | Allow non-deterministic functions in ALTER UPDATE/ALTER DELETE statements, e.g. with dictGet() functions |
| background_move_pool_size | | 8 | Number of threads performing background moves for tables. Only has meaning at server startup |
| default_max_bytes_in_join | 100000000 | 1000000000 | Maximum size of the right-side table if a limit is required but max_bytes_in_join is not set |
| enable_early_constant_folding | 0 | 1 | Enable query optimization that analyzes function and subquery results and rewrites the query if constants are found |
| enable_scalar_subquery_optimization | 0 | 1 | If set to true, prevents scalar subqueries from (de)serializing large scalar values and possibly avoids running the same subquery more than once |
| experimental_use_processors | 0 | 1 | Processors pipeline is now enabled by default |
| force_optimize_skip_unused_shards | | 1 | Throw an exception if unused shards cannot be skipped (1 – throw only if the table has the sharding key, 2 – always throw) |
| format_avro_schema_registry_url | | | For AvroConfluent format: Confluent Schema Registry URL |
| input_format_parallel_parsing | | 1 | Enable parallel parsing for some data formats |
| input_format_values_deduce_templates_of_expressions | 0 | 1 | For Values format: if a field could not be parsed by the streaming parser, run the SQL parser, deduce a template of the SQL expression, try to parse all rows using the template, and then interpret the expression for all rows |
| join_algorithm | | hash | Specify the join algorithm: `auto`, `hash`, `partial_merge`, `prefer_partial_merge`. `auto` tries to change HashJoin to MergeJoin on the fly to avoid running out of memory |
| joined_subquery_requires_alias | 0 | 1 | Force joined subqueries and table functions to have aliases for correct name qualification |
| max_insert_threads | | 0 | The maximum number of threads to execute the INSERT SELECT query. Values 0 or 1 mean that INSERT SELECT is not run in parallel. Higher values lead to higher memory usage. Parallel INSERT SELECT has effect only if the SELECT part runs in parallel; see the max_threads setting |
| max_joined_block_size_rows | | 65536 | Maximum block size for the JOIN result (if the join algorithm supports it). 0 means unlimited |
| max_parser_depth | | 1000 | Maximum parser depth |
| memory_profiler_step | | 0 | The number of bytes between allocations at which the memory profiler collects an allocating stack trace. The minimal effective step is 4 MiB (smaller values are clamped to 4 MiB). Zero means the memory profiler is disabled |
| min_bytes_to_use_mmap_io | | 0 | The minimum number of bytes for reading data with the mmap option during SELECT query execution. 0 – disabled |
| min_chunk_bytes_for_parallel_parsing | | 1048576 | The minimum chunk size in bytes that each thread will parse in parallel |
| mutations_sync | | 0 | Wait for synchronous execution of ALTER TABLE UPDATE/DELETE queries (mutations). 0 – execute asynchronously. 1 – wait for the current server. 2 – wait for all replicas if they exist |
| optimize_if_chain_to_multiif | | 0 | Replace if(cond1, then1, if(cond2, …)) chains with multiIf |
| optimize_trivial_count_query | | 1 | Process trivial SELECT count() FROM table queries from metadata |
| output_format_avro_codec | | | Compression codec used for output. Possible values: `null`, `deflate`, `snappy` |
| output_format_avro_sync_interval | | 16384 | Sync interval in bytes |
| output_format_csv_crlf_end_of_line | | | If set to true, the end of line in CSV format will be \r\n instead of \n |
| output_format_tsv_crlf_end_of_line | | | If set to true, the end of line in TSV format will be \r\n instead of \n |
| timeout_before_checking_execution_speed | 0 | 10 | Check that the speed is not too low after the specified time has elapsed |
| use_compact_format_in_distributed_parts_names | | 0 | Changes the format of directory names for distributed table insert parts |
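As one example from the table above, the merge join fallback can be enabled per session; 'auto' switches from HashJoin to MergeJoin on the fly when the right-side table grows too large:

```sql
SET join_algorithm = 'auto';
SET default_max_bytes_in_join = 1000000000;  -- limit used when max_bytes_in_join is not set
```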
References
RPM packages for Altinity Stable Release can be found in the Altinity Stable Repository. The ClickHouse repository can be used as well but check version numbers carefully.