Configuring Data Lake Catalogs

Working with Data Lakes in ClickHouse®

The Altinity Cloud Manager (ACM) makes it easy to work with data lakes in ClickHouse®. The Data Lake Catalogs menu item lets you create a data lake in Altinity.Cloud or connect to an AWS Glue data lake.

Defining a database with an IcebergS3 database engine

The first time you try to create a data lake, you’ll need to set the allow_experimental_database_iceberg property:

Figure 1 - Iceberg database support not enabled

Clicking the button enables the property, although it may take a short while:

Figure 2 - Iceberg database support being enabled
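If you have SQL access to the cluster, one way to confirm that the change has taken effect is to query the system.settings table. This is only a sketch and not part of the ACM workflow. It assumes that your ClickHouse version uses the setting names mentioned on this page and that the ACM applies them to the profile your session runs under.

```sql
-- Check whether the experimental data lake settings are on for this session.
-- A value of 1 means enabled. An empty result means this ClickHouse version
-- does not know the setting under this name.
SELECT name, value, changed
FROM system.settings
WHERE name IN (
    'allow_experimental_database_iceberg',
    'allow_experimental_database_glue_catalog'
);
```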

You can click the button until support is enabled. Then you’ll see this dialog:

Figure 3 - Creating a database with an IcebergS3 engine

In this example, we’re creating a new database named maddie with an IcebergS3 engine. The text area at the bottom of the dialog changes to reflect your choices. Click the button to create your new database. After the connection is created, you’ll get a message listing the data the ACM found in your data lake catalog:

Figure 4 - Catalog is connected, and data from the catalog is now available

(If there isn’t any data in the catalog, you’ll get the message “No tables found in catalog so far.”)

When the catalog is connected, you can look at the Schema tab of the Cluster Explorer and see the tables in the ClickHouse cluster:

Figure 5 - The database table created from our data lake

We created a database named maddie in Figure 3 above; here we can see the table in that database.
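The same information is available from any SQL client connected to the cluster. The statements below are a minimal sketch that assumes the database was created with the name maddie, as in Figure 3. The output depends on what your catalog contains.

```sql
-- List the tables that the catalog exposes through the maddie database.
SHOW TABLES FROM maddie;

-- Show the statement the database was created with, for comparison with
-- the text area in the dialog.
SHOW CREATE DATABASE maddie;
```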

Connecting to an AWS Glue data lake

The first time you try to work with an AWS Glue data lake, you’ll need to set the allow_experimental_database_glue_catalog property:

Figure 6 - Glue catalog support not enabled

Clicking the button enables the property, although it may take a short while:

Figure 7 - Glue catalog support being enabled

You can click the button until support is enabled. Then you’ll see this dialog:

Figure 8 - Connecting to an AWS Glue data lake

In Figure 8, we’re connecting to an AWS Glue data lake and creating a database named sales that will give us access to the Glue catalog. In addition to the ClickHouse database name, select an AWS region and enter your access key and secret key. The text area at the bottom of the dialog changes to reflect your choices.
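For reference, the choices in Figure 8 map onto a CREATE DATABASE statement. The sketch below follows the DataLakeCatalog database engine syntax in open-source ClickHouse. The exact statement the ACM generates may differ, so treat the text area in the dialog as authoritative. The region is an example value, and the two keys are placeholders.

```sql
-- Hypothetical equivalent of the dialog in Figure 8; replace the placeholders.
-- Requires allow_experimental_database_glue_catalog to be enabled.
CREATE DATABASE sales
ENGINE = DataLakeCatalog
SETTINGS
    catalog_type = 'glue',
    region = 'us-east-1',                    -- example region
    aws_access_key_id = '<access-key>',      -- placeholder
    aws_secret_access_key = '<secret-key>';  -- placeholder
```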

Click the button to create the connection and database. The ACM uses the metadata in the Glue catalog to create new tables; queries against those ClickHouse tables return data from the data lake that the Glue catalog describes. When the connection is complete, you’ll see a success message and a list of tables in your database:

Figure 9 - The tables created from the connection to the Glue catalog

Going to the Schema tab of the Cluster Explorer shows the tables created in the sales database based on the metadata in the Glue catalog:

Figure 10 - Tables available through the Glue catalog
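Once the tables appear, you can query them like any other ClickHouse table. In the sketch below, my_glue_db.orders is a hypothetical table name. Tables that come from a catalog are typically named after the catalog database (or namespace) and the table, joined by a dot, so the name needs backticks.

```sql
-- List the tables available through the Glue catalog.
SHOW TABLES FROM sales;

-- Read a few rows from one of them. The table name is hypothetical;
-- substitute a name returned by SHOW TABLES.
SELECT *
FROM sales.`my_glue_db.orders`
LIMIT 10;
```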