Altinity Stable for ClickHouse 19.11.8
Description
Updates including improved Kafka support.
Major New Features
- Multi-table JOINs with standard SQL syntax (some limitations remain: only equi-JOIN, limited condition pushdown support)
- Secondary indices - allow skipping data blocks without reading data: min/max filter, n-gram or token-based bloom filters (see the sketch after this list)
- TTL expressions for columns and tables to remove old data automatically
- Unique ASOF JOIN - join to the most recent value known
- Per-column custom compression codecs (Delta, T64, DoubleDelta, Gorilla): fine-tune compression for particular columns
- Roaring Bitmaps - use bitmaps for various set operations (calculate cardinality, unions, intersections, etc.)
- Machine-learning functions: evalMLMethod, simpleLinearRegression, stochasticLinearRegression, stochasticLogisticRegression
- Predefined per-row filter expressions for tables, which enable simple row-level security rules
- Adaptive index granularity (setting index_granularity_bytes) - useful for tables with relatively large rows (>1280 bytes)
- Advanced text processing functions:
- Multiple substring search: multiSearch(Any|FirstPosition|FirstIndex|AllPositions)(CaseInsensitive)?(UTF8)?
- Multiple regex search (hyperscan-powered): multiMatch(Any|AnyIndex)
- Fuzzy regex match (hyperscan-powered): multiFuzzyMatch(Any|AnyIndex)
- N-gram distance for fuzzy string comparison and search (similar to q-gram metrics in R language): ngramDistance(CaseInsensitive)?(UTF8)?, ngramSearch(CaseInsensitive)?(UTF8)?
- HDFS read/write access
- New JSON processing functions - high performance & compliant with JSON standard
- IPv4 and IPv6 data types
- New formats
- Protobuf — now fully supported with input and output plus nested data structures
- Parquet
- RowBinaryWithNamesAndTypes
- JSONEachRow and TSKV - now support default expressions for missing fields (check input_format_defaults_for_omitted_fields)
- TSVWithNames/CSVWithNames - column order can now be determined from the file header (check the input_format_with_names_use_header parameter)
- SQL statements with bind parameters
- Improved MySQL integration:
- new database engine to access all the tables in remote MySQL server
- support for the MySQL wire protocol, allowing connections to ClickHouse from MySQL clients.
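As a rough illustration of several items above, here is a minimal sketch of a table that combines per-column codecs, a token-based bloom filter skipping index, and a table-level TTL. All table and column names are hypothetical, and data skipping indices still require the experimental flag in this release (see the settings table below).

```sql
-- Sketch only: per-column codecs, a secondary (data skipping) index and a
-- table TTL in one hypothetical table; names and values are illustrative.
SET allow_experimental_data_skipping_indices = 1;  -- indices are still experimental in 19.11

CREATE TABLE example_events
(
    event_date Date,
    event_time DateTime CODEC(DoubleDelta, LZ4),
    user_id    UInt64   CODEC(T64, LZ4),
    message    String,
    INDEX message_tokens message TYPE tokenbf_v1(4096, 3, 0) GRANULARITY 4
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_date, user_id)
TTL event_date + INTERVAL 3 MONTH;  -- remove rows older than three months automatically
```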
Known issues
- All released 19.x versions have had some problems with the Kafka engine implementation due to a full rewrite of Kafka support. In 19.11.8 Kafka works much better than in previous 19.x releases. However, there are still some corner cases that can lead to data duplication in certain scenarios, for example, in the event of a ClickHouse server restart. Those issues will be addressed soon.
- Adaptive granularity is enabled by default, and index_granularity_bytes is set to 10MB. This feature uses a different data format, and interoperability between the old and new formats has some issues. If you are upgrading your cluster from an older version, consider disabling it before the upgrade by putting the following fragment in your config.xml:

```xml
<merge_tree>
    <index_granularity_bytes>0</index_granularity_bytes>
</merge_tree>
```

After the upgrade, you can turn adaptive granularity on at any convenient time. It is especially useful when your tables have rows larger than 1KB. If you are doing a new installation, leave the default setting value as is.
- enable_optimize_predicate_expression is now enabled by default. It's possible you may have some issues when a condition passed to a subselect leads to suboptimal or undesired effects. If this happens, please report the issue and disable the feature for that SELECT (or globally); see the sketch after this list.
- Secondary indices are maturing but are still considered experimental. There is at least one severe bug: when a mutation is executed with a condition on a column with a secondary index, it can affect more rows than expected. Please be careful with DELETE, or upgrade to 19.13 (see issue #6224).
- Some users have reported problems with ODBC data sources after the upgrade. In most cases these were misconfigurations. Nevertheless, please do canary/staging updates and check how your ODBC connections work before moving to production.
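As a sketch of the per-query workaround mentioned above for predicate pushdown (the table, columns and threshold are made up for illustration), the setting can be disabled for a single SELECT:

```sql
-- Minimal sketch: disable predicate pushdown for one query only
-- (example_events and its columns are hypothetical).
SELECT user_id, c
FROM
(
    SELECT user_id, count() AS c
    FROM example_events
    GROUP BY user_id
)
WHERE c > 10
SETTINGS enable_optimize_predicate_expression = 0;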
Upgrade Notes
Backward compatibility issues
- Due to an update of the LZ4 library, the new ClickHouse version writes parts which are not binary equivalent to those written with older versions. That makes it problematic to update only one replica. Leaving the cluster in such a state for a long period of time will work, but may lead to excessive parts copying between nodes due to checksum mismatches.
- There is a new setting max_partitions_per_insert_block with a default value of 100. If the inserted block contains a larger number of partitions, an exception is thrown. Set it to 0 if you want to remove the limit (not recommended); see the sketch after this list.
- If you are using unexpected LowCardinality combinations like LowCardinality(UInt8), the new version will prevent you from doing so. If you really know what you are doing, check allow_suspicious_low_cardinality_types and set it to 1.
- This release adds the max_parts_in_total setting for the MergeTree family of tables (default: 100 000). We hope your number of parts is much lower than this limit. If necessary you can raise the value.
- The system.dictionaries table has been changed. If you used it for monitoring purposes, you may need to change your scripts.
- Dictionaries are loaded lazily by default. This means they have status NOT_LOADED until the first access.
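If you do need to relax the new safety checks mentioned above, both are ordinary settings that can be changed for the current session; a minimal sketch (values are illustrative, and removing the limits is not recommended as a default):

```sql
-- Sketch: relaxing the new safety limits for the current session only.
SET max_partitions_per_insert_block = 0;         -- 0 removes the partitions-per-block limit
SET allow_suspicious_low_cardinality_types = 1;  -- permit e.g. LowCardinality(UInt8)
```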
Changes
New functions
- New hash functions: xxHash32, xxHash64, gccMurmurHash, hiveHash, javaHash, CRC32
- JSON processing functions: JSONExtract, JSONExtract(Bool|Float|Int|KeysAndValues|Raw|String|UInt), JSONHas, JSONKey, JSONLength, JSONType (see the sketch after this list)
- Geospatial processing: geoToH3, geohashEncode, geohashDecode
- IP address handling: IPv4CIDRtoIPv4Range, IPv6CIDRtoIPv6Range, toIPv4, toIPv6
- New statistical aggregate functions: skewPop, skewSamp, kurtPop, kurtSamp
- String functions: isValidUTF8, regexpQuoteMeta, trimBoth, trimLeft, trimRight, format, toValidUTF8
- Encoding: tryBase64Decode, base64Decode, base64Encode
- Array processing: arrayEnumerateDenseRanked, arrayEnumerateUniqRanked, flatten, arrayFlatten, arrayWithConstant
- Date/Time processing: toStartOfInterval, addQuarters, subtractQuarters, toIntervalQuarter, toStartOfTenMinutes
- Numerical: roundDown, toDecimalOrZero, toDecimalOrNull
- Dictionary: dictGet, dictGetOrDefault
- Roaring Bitmaps: bitmapAndCardinality, bitmapAndnot, bitmapAndnotCardinality, bitmapBuild, bitmapCardinality, bitmapContains, bitmapHasAll, bitmapHasAny, bitmapOr, bitmapOrCardinality, bitmapToArray, bitmapXor, bitmapXorCardinality, groupBitmap
- OS level introspection: filesystemAvailable, filesystemCapacity, filesystemFree, basename
- New aggregate functions: boundingRatio, entropy, groupBitmap, sumMap(Filtered|FilteredWithOverflow|WithOverflow), topKWeighted, groupArrayMovingAvg, groupArrayMovingSum, timeSeriesGroupRateSum, timeSeriesGroupSum
- SQL standard compatibility aliases added: left, right, trim, ltrim, rtrim, timestampadd, timestampsub, power, replace, ln, locate, mid
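A small sketch exercising a few of the functions listed above; the JSON literal and values are made up for illustration:

```sql
-- Illustrative only: one new function from several of the groups above.
SELECT
    JSONExtractString('{"user": {"name": "alice"}}', 'user', 'name') AS name,         -- JSON processing
    xxHash64('alice')                                                AS h,            -- new hash function
    bitmapCardinality(bitmapBuild([1, 2, 3, 3]))                     AS distinct_ids, -- Roaring Bitmaps
    trimBoth('  hello  ')                                            AS trimmed;      -- string function
```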
Other notable changes
- Setting constraints, which limit the possible range of a setting's value per user profile.
- KILL MUTATION added
- New aggregate function combinators: -Resample
- Added new data type SimpleAggregateFunction - light aggregation for simple functions like any, anyLast, sum, min, max
- Ability to use a different sorting key (ORDER BY) and index (PRIMARY KEY). The sorting key can be longer than the index. You can alter ORDER BY when adding or removing a column (see the sketch after this list)
- HTTP interface: brotli compression support, X-ClickHouse-Query-Id and X-ClickHouse-Summary headers in the response, ability to cancel a query on disconnect (check cancel_http_readonly_queries_on_client_close)
- DFA-based implementation for the functions sequenceMatch and sequenceCount in case the pattern doesn't contain time
- Back up all partitions at once with ALTER TABLE … FREEZE
- Comments for a column in the table description
- Join engine: new options join_use_nulls, max_rows_in_join, max_bytes_in_join, join_any_take_last_row and join_overflow_mode, plus a joinGet function that allows you to use a Join type table like a dictionary
- /docker-entrypoint-initdb.d for database initialization in Docker
- Graphite rollup rules reworked
- Query settings in asynchronous INSERTs into Distributed tables are respected now
- Hints when the user makes a typo in a function name or type in the command-line client
- system.detached_parts table containing information about detached parts of MergeTree tables
- Table function remoteSecure
- Ability to write ZooKeeper part data in a more compact form (use_minimalistic_part_header_in_zookeeper)
- Ability to close MySQL connections after their usage in external dictionaries
- Support RENAME operation for Materialized View
- Non-blocking loading of external dictionaries
- Kafka now supports SASL SCRAM authentication, new virtual columns _topic, _offset and _key are available, and there are a lot of other improvements.
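Two of the items above can be shown in a single hypothetical table: a SimpleAggregateFunction column and a sorting key (ORDER BY) that is longer than the primary index (PRIMARY KEY). Names are illustrative only.

```sql
-- Sketch only: SimpleAggregateFunction plus a PRIMARY KEY that is a prefix
-- of a longer ORDER BY (table and column names are hypothetical).
CREATE TABLE example_daily_stats
(
    day      Date,
    user_id  UInt64,
    max_seen SimpleAggregateFunction(max, UInt64)  -- light aggregation, no aggregate state stored
)
ENGINE = AggregatingMergeTree()
ORDER BY (day, user_id)   -- sorting key
PRIMARY KEY day;          -- index: a prefix of the sorting key
```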
Settings changed/added
Name | Default | Description |
---|---|---|
allow_experimental_cross_to_join_conversion | 1 | Convert CROSS JOIN to INNER JOIN if possible |
allow_experimental_data_skipping_indices | 0 | If it is set to true, data skipping indices can be used in CREATE TABLE/ALTER TABLE queries. |
allow_experimental_low_cardinality_type | 1 (was 0) | Obsolete setting, does nothing. Will be removed after 2019-08-13 |
allow_experimental_multiple_joins_emulation | 1 | Emulate multiple joins using subselects |
allow_hyperscan | 1 | Allow functions that use Hyperscan library. Disable to avoid potentially long compilation times and excessive resource usage. |
allow_simdjson | 1 | Allow using simdjson library in ‘JSON*’ functions if AVX2 instructions are available. If disabled rapidjson will be used. |
allow_suspicious_low_cardinality_types | 0 | In CREATE TABLE statement allows specifying LowCardinality modifier for types of small fixed size (8 or less). Enabling this may increase merge times and memory consumption. |
cancel_http_readonly_queries_on_client_close | 0 | Cancel HTTP readonly queries when a client closes the connection without waiting for response. |
check_query_single_value_result | 1 | Return check query result as single 1/0 value |
enable_conditional_computation | (was 0) | DELETED |
enable_optimize_predicate_expression | 1 (was 0) | If it is set to true, optimize predicates to subqueries. |
enable_unaligned_array_join | 0 | Allow ARRAY JOIN with multiple arrays that have different sizes. When this setting is enabled, arrays will be resized to the longest one. |
external_table_functions_use_nulls | 1 | If it is set to true, external table functions will implicitly use Nullable type if needed. Otherwise NULLs will be substituted with default values. Currently supported only for ‘mysql’ table function. |
idle_connection_timeout | 3600 | Close idle TCP connections after specified number of seconds. |
input_format_defaults_for_omitted_fields | 0 | For input data calculate default expressions for omitted fields (it works for JSONEachRow format). |
input_format_with_names_use_header | 0 | For TSVWithNames and CSVWithNames input formats this controls whether format parser is to assume that column data appear in the input exactly as they are specified in the header. |
join_any_take_last_row | 0 | When disabled (default) ANY JOIN will take the first found row for a key. When enabled, it will take the last row seen if there are multiple rows for the same key. Allows you to overwrite old values in table with Engine=Join. |
join_default_strictness | ALL | Set default strictness in JOIN query. Possible values: empty string, ‘ANY’, ‘ALL’. If empty, query without strictness will throw exception. |
low_cardinality_allow_in_native_format | 1 | Use LowCardinality type in Native format. Otherwise, convert LowCardinality columns to ordinary for select query, and convert ordinary columns to required LowCardinality for insert query. |
max_alter_threads | auto | The maximum number of threads to execute the ALTER requests. By default, it is determined automatically. |
max_execution_speed | 0 | Maximum number of execution rows per second. |
max_execution_speed_bytes | 0 | Maximum number of execution bytes per second. |
max_partitions_per_insert_block | 100 | Limit maximum number of partitions in single INSERTed block. Zero means unlimited. Throw exception if the block contains too many partitions. This setting is a safety threshold, because using large number of partitions is a common misconception. |
max_streams_multiplier_for_merge_tables | 5 | Request more streams when reading from Merge table. Streams will be spread across tables that Merge table will use. This allows more even distribution of work across threads and is especially helpful when merged tables differ in size. |
max_threads | auto | The maximum number of threads to execute the request. By default, it is determined automatically. |
merge_tree_max_bytes_to_use_cache | 2013265920 | The maximum request size in bytes for which the cache of uncompressed data is used. If the request is larger, the cache is not used. (For large queries not to flush out the cache.) |
merge_tree_min_bytes_for_concurrent_read | 251658240 | If at least as many bytes are read from one file, the reading can be parallelized. |
merge_tree_min_bytes_for_seek | 0 | You can skip reading more than that number of bytes at the price of one seek per file. |
min_count_to_compile_expression | 3 | The number of identical expressions before they are JIT-compiled |
min_execution_speed_bytes | 0 | Minimum number of execution bytes per second. |
network_compression_method | LZ4 | Allows you to select the method of data compression when writing. |
optimize_skip_unused_shards | 0 | Assumes that data is distributed by sharding_key. Optimization to skip unused shards if SELECT query filters by sharding_key. |
os_thread_priority | 0 | If non zero - set corresponding ‘nice’ value for query processing threads. Can be used to adjust query priority for OS scheduler. |
output_format_parquet_row_group_size | 1000000 | Row group size in rows. |
stream_poll_timeout_ms | 500 | Timeout for polling data from streaming storages. |
tcp_keep_alive_timeout | 0 | The time (in seconds) the connection needs to remain idle before TCP starts sending keepalive probes |
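The settings above can be changed per session with SET (or passed per query). For example, a sketch of enabling the CSV/TSV header handling described earlier in the new formats section:

```sql
-- Sketch: enable header-driven column order and default expressions for
-- omitted fields when loading CSVWithNames / TSVWithNames data.
SET input_format_with_names_use_header = 1;
SET input_format_defaults_for_omitted_fields = 1;
```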
MergeTree settings
Name | Default | Description |
---|---|---|
enable_mixed_granularity_parts | 0 | Enable parts with adaptive and non adaptive granularity |
index_granularity_bytes | 10485760 | Approximate amount of bytes in single granule (0 - disabled). |
write_final_mark | 1 | Write final mark after end of column (0 - disabled, do nothing if index_granularity_bytes=0) |
max_parts_in_total | 100000 | If the total number of active parts across all partitions exceeds this value, throw a ‘Too many parts …’ exception. |
merge_with_ttl_timeout | 86400 | Minimal time in seconds before a merge with TTL can be repeated. |
min_merge_bytes_to_use_direct_io | 10737418240 (was 0) | Minimal amount of bytes to enable O_DIRECT in merge (0 - disabled). |
replicated_max_parallel_fetches_for_host | 15 | Limit parallel fetches from endpoint (actually pool size). |
use_minimalistic_part_header_in_zookeeper | 0 | Store part header (checksums and columns) in a compact format and a single part znode instead of separate znodes (/columns and /checksums). This can dramatically reduce snapshot size in ZooKeeper. Before enabling check that all replicas support new format. |
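MergeTree settings from the table above can also be set per table in the SETTINGS clause of CREATE TABLE; a sketch with a hypothetical table that keeps the old, non-adaptive granularity format:

```sql
-- Illustrative only: per-table MergeTree settings.
CREATE TABLE example_metrics
(
    ts    DateTime,
    value Float64
)
ENGINE = MergeTree()
ORDER BY ts
SETTINGS index_granularity_bytes = 0;  -- 0 disables adaptive granularity for this table
```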