Project Antalya Quick Start Guide

Getting up to speed with Project Antalya

Project Antalya delivers new features that make ClickHouse® even more powerful than before. In this guide we’ll show you how to make the most of those features.

EXCITING NEW FEATURE

As of version 26.3, Antalya has GA-level support for OAuth. This lets you manage access to your ClickHouse servers and data in a centralized way, coordinating access to ClickHouse and applications that use it without giving up the full power of ClickHouse’s RBAC.

See the OAuth documentation to learn about Antalya’s OAuth support and how to configure it. Now back to our regularly scheduled Quick Start Guide…

There are three concepts we’ll deal with in this guide:

Swarms - Swarms are pools of stateless ClickHouse clusters. With Project Antalya, ClickHouse can use a swarm to distribute the processing load of a query, which can make queries much faster. Swarms can be spun up or down as needed, and they register (and unregister) themselves with Keeper automatically. They can also cut your compute costs significantly by running on spot instances, which Amazon says can be up to 90% cheaper than On-Demand instances.
Data Lakes - Project Antalya implements data lakes that use Iceberg as their table format, store data in the columnar Parquet format, and host everything on inexpensive, S3-compatible storage. Most importantly, Project Antalya’s data lakes can be used by multiple applications. Analytics workloads with ClickHouse, AI applications, and batch jobs can all use the same Iceberg catalogs, eliminating silos of data and greatly reducing your storage costs.
Hybrid Tables - Project Antalya delivers the Hybrid table engine, which allows you to divide a dataset between block storage and object storage. Putting your lesser-used data into object storage can yield significant cost savings. And even though your data is stored in different places, Hybrid tables let you analyze all of it with a single query.
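
To make the Hybrid table idea concrete, here is a minimal conceptual sketch in plain Python. It is not Antalya's API or SQL syntax: the `HybridTable` class, the `cutoff` date, and the two in-memory lists standing in for block storage and object storage are all hypothetical. It only illustrates how rows might be divided by a date boundary and still be read with a single query.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class HybridTable:
    """Toy model: rows on or after `cutoff` are hot, older rows are cold."""
    cutoff: date
    hot: list = field(default_factory=list)   # stands in for block storage
    cold: list = field(default_factory=list)  # stands in for object storage

    def insert(self, day: date, value: float) -> None:
        # Route each row to exactly one segment based on the date boundary.
        target = self.hot if day >= self.cutoff else self.cold
        target.append((day, value))

    def total(self, start: date, end: date) -> float:
        # One query spans both segments; the caller never picks a segment.
        rows = self.hot + self.cold
        return sum(v for d, v in rows if start <= d <= end)


table = HybridTable(cutoff=date(2024, 1, 1))
table.insert(date(2023, 6, 1), 10.0)  # older than the cutoff, so cold
table.insert(date(2024, 3, 1), 5.0)   # on or after the cutoff, so hot
table.insert(date(2024, 4, 1), 2.5)   # on or after the cutoff, so hot
grand_total = table.total(date(2023, 1, 1), date(2024, 12, 31))
```

The point of the sketch is that the caller of `total` never needs to know which segment a row lives in; only the boundary decides where data is stored.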

If you’d like a more in-depth look at these topics, see the Project Antalya concepts guide.

Throughout this guide we’ll look at two different datasets:

The AWS Public Blockchain dataset - This has 15+ years of data, with thousands of Parquet files, one for each day. It’s a great way to show the benefits of swarm clusters, since we can use multiple threads to read and process those thousands of files in parallel.
The New York Taxi and Limousine Commission dataset - This has more than fifteen years’ worth of data on taxi rides. It’s a great dataset to illustrate the power of Hybrid tables. Analytics against this data tend to focus on time-based queries. If there’s a clear line between hot data and cold data, the ability to move cold data to much cheaper object storage yet still query all our data with a single SQL statement has substantial benefits.
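
As a rough illustration of why a dataset made of one file per day parallelizes well, the following Python sketch splits a list of daily files across a few workers and processes the batches concurrently. It is a simplified model, not a description of how Antalya schedules work across a swarm: the file naming scheme, the round-robin split, and the `process` placeholder are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta


def daily_files(start: date, days: int) -> list[str]:
    # Hypothetical naming scheme: one Parquet file per day.
    return [f"date={start + timedelta(days=i)}/data.parquet" for i in range(days)]


def split_round_robin(files: list[str], workers: int) -> list[list[str]]:
    # Worker i gets files i, i + workers, i + 2 * workers, ...
    return [files[i::workers] for i in range(workers)]


def process(batch: list[str]) -> int:
    # Placeholder for real work (reading and aggregating each file);
    # here we only count the files in the batch.
    return len(batch)


files = daily_files(date(2024, 1, 1), days=10)
batches = split_round_robin(files, workers=3)
with ThreadPoolExecutor(max_workers=3) as pool:
    counts = list(pool.map(process, batches))
```

Because each daily file can be handled independently, adding workers shrinks the share of files each one has to read.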

Creating Swarm Clusters

Querying Data with Swarms

Working with Data Lakes

Working with Hybrid Tables

Bringing it all Together
