This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Integrating Superset with ClickHouse

How to connect Apache Superset with ClickHouse

Apache Superset is a powerful and convenient way to create dashboards to display information for your organization. This presents information that allows for quick visual understand of trends, where events are happening, and keeps your team on top of the massive amounts of data coming into your organization.

When combined with ClickHouse and its capacity to analyze terabytes of information in milliseconds, the integration of Superset and ClickHouse can reduce the sea of seemingly disconnected data flowing through into an understandable whole.

Integrating Superset with ClickHouse is broken down into three major steps:

  1. Install Superset.
  2. Connect Superset to ClickHouse.
  3. Connect Superset dashboard elements to ClickHouse queries.

1 - Install Superset

Superset installation instructions

Three methods of installing Superset are provided here:

  1. Direct installation: Directly download the Superset Python modules into an operating system.
  2. Docker based installation: Use Docker containers to provide Superset, whether as stand alone containers or part of the Kubernetes environment.
  3. Managed installation: Use managed services to provide Superset, and connect it to your ClickHouse.

Direct Installation

The following instructions are based an Ubuntu 20.04 environment. For information on how to install Superset on other operating systems or platforms like Docker or Kubernetes, see the Superset Introduction for links to different ways to install Superset.

Direct Installation Prerequisites

Before installing Superset on Ubuntu 20.04, the following packages must be installed, and will require administrative access on the machine.

Note that Superset requires Python, specifically Python 3.7.9 and above as of this document. These instructions are modified from the original to include installing python3-venv as part of the other prerequisites.

sudo apt-get install build-essential libssl-dev libffi-dev python3-dev python3-pip libsasl2-dev libldap2-dev python3-venv

Direct Installation Instructions

To install Superset directly into the Ubuntu 20.04 operating system:

  1. A virtual environment is highly recommended to keep all of the Superset python requirements contained. Use the following commands to create your Superset virtual environment, activate it, then upgrade pip:

    python3 -m venv clickhouse-sqlalchemy
    . clickhouse-sqlalchemy/bin/activate
    pip install --upgrade pip
    
  2. Install Superset with the following in your virtual environment. The command superset fab create-admin sets the admin user password, which will be used to login and update administrative level Superset settings:

    export FLASK_APP=superset
    pip install apache-superset
    superset db upgrade
    superset fab create-admin
    superset load_examples
    superset init 
    
  3. Install the ClickHouse SQLAlchemy - this allows Superset to communicate with ClickHouse. There are two drivers currently in development. As of this time, we recommend the clickhouse-sqlalchemy driver. To install clickhouse-sqlalchemy, use the pip command:

    pip install clickhouse-sqlalchemy
    
  4. For those who want to enable TLS communications with services like Altinity.Cloud, verify the versions:

    1. The clickhouse-driver version should be greater than 0.2.0.

    2. The clickhouse-sqlalchemy driver should be 0.1.6 or greater.

      pip freeze | grep clickhouse
      clickhouse-driver==0.2.0
      clickhouse-sqlalchemy==0.1.6
      

Starting Superset From Direct Install

Once Superset is installed, start it with the following command:

superset run -p 8088 --with-threads --reload --debugger

Access the Superset web interface with a browser at the host name, port 8088. For example: http://localhost:8088. The default administrative account will be admin, with the password set from the command superset fab create-admin as listed above.

Installation References

  • Details of the most common ways to install and run Superset are available on the Superset Introduction page.

Docker Based Installation

For organizations that prefer Docker based installations, or want to add Superset to a Kubernetes environment along with ClickHouse on Kubernetes, Superset can be installed with a few Docker commands.

The following instructions use the existing Apache Superset docker-compose file from the Github repository.

Docker Prerequisites

docker 19 and above, with either docker-compose version 1.29 and above, or the built in docker compose command that’s part of Docker 20.10 and above.

Docker Based Installation Instructions

To install Superset with a Docker Container:

  1. Download the Superset configuration files from the GitHub repository:

    git clone https://github.com/apache/superset
    
  2. Enter the superset directory and set the clickhouse-driver and the clickhouse-sqlalchemy version requirements:

    cd superset
    touch ./docker/requirements-local.txt
    echo "clickhouse-driver>=0.2.0" >> ./docker/requirements-local.txt
    echo "clickhouse-sqlalchemy>=0.1.6" >> ./docker/requirements-local.txt
    
  3. Run either docker-compose or docker compose to download the required images and start the Superset Docker containers:

    docker-compose -f docker-compose-non-dev.yml up
    

Starting Superset From Docker

Installing and starting Superset from Docker is the same docker-compose or docker compose installation command:

docker-compose -f docker-compose-non-dev.yml up

Access the Superset web interface with a browser at the server’s host name or IP address on port 8088. For example: http://localhost:8088. The default administrative account will be admin, with the password admin.

Preset Cloud

For organizations that prefer managed services, Preset Cloud offers Superset with with clickhouse-sqlalchemy. This can then be connected to an Altinity.Cloud account.

Production Installation Tips

The examples provided above are useful for development and testing environments. For full production environments installation and deployments, refer to the Superset Documentation site.

Some recommendations for organizations that want to install Superset in a production environment:

2 - Connect Superset to ClickHouse

How to connect Superset to ClickHouse

After Superset has been setup and the ClickHouse SQLAlchemy drivers installed, the two systems can be connected together. The following details how to connect Superset to an existing ClickHouse database.

For details on how to install ClickHouse, see either Altinity ClickHouse Stable Release Quick Start Guide, or use Altinity.Cloud.

Superset to ClickHouse Connection Instructions

Superset and ClickHouse Connection PreRequisites

  • Superset installed
  • clickhouse-driver 0.2.0 or above installed
  • clickhouse-sqlalchemy 0.1.6 or above installed

For more information, see the Install Superset page.

To connect Superset to ClickHouse once all prerequisites have been met:

  1. Login as a Superset administrative user. The default is admin.

  2. From the top menu, select Data->Databases to add a new database connection.

    Select Data -> Databases
  3. Select + Database.

    Select + Database
  4. Enter the following under the Connection tab:

    1. Database Name: The name your organization assigns the database. This does not have to match the actual name of the ClickHouse database.
    2. SQLAlchemy URI: The URI of the database connection in the SQLAlchemy format. The URI will be in the following format:
    {SQLAlchemy Connection Type}://{username}:{password}@{url}?{options}
    

    For example, to connect to a ClickHouse server, the following URI will be used:

    clickhouse+native://demo:demo@github.demo.trial.altinity.cloud/default?secure=true
    

    The SQLAlchemy format uses the following elements:

    1. SQLAlchemy Connection Type: Sets the driver to use to connect.
    2. username: The database account to use for authentication.
    3. password: The password for the database account being used.
    4. url: The full hostname to the database. This does not include tags such as HTTP or HTTPS - just the host name.
    5. options: Any additional options. For example, if the database connection is encrypted, then the option secure=true will be required. For more information on setting up ClickHouse network encryption, see the Network Hardening section in the ClickHouse Operations Guide.
  5. Select Test Connection to verify the connection.

    Select Test Connection
  6. If the connection works, then click Add.

    Select Add
  7. Once finished, the new database connection to ClickHouse will be available for use.

    Database connection added.

Connection Options

The clickhouse-driver supports different connection options. For a full list, see the clickhouse-driver Connection site. The following are recommended settings:

  • secure: Required if the connection uses encryption.
  • port: Used if the connection uses a non-standard port.
  • alt_hosts: Specifies alternate connection host names.
  • verify: Default to true. If true, then the certificate used for encryption must be validated. Set to false if using a self-signed certificate. For more information on ClickHouse certificates, see the Network Hardening section in the ClickHouse Operations Guide.

Connection FAQ

What ports are used for the Superset to ClickHouse Connection?

The Superset connection to ClickHouse uses the TCP connection over the following ports:

  • Encrypted: 9440
  • Unencrypted: 9000

If using the encrypted connection, then the option secure=yes is required.

3 - Create Charts from ClickHouse Data

Build stunning dashboards populated by ClickHouse data.

Once Superset has been installed and connected to ClickHouse, new charts can be created that draw their visualizations from the ClickHouse data.

For complete details on Superset dashboards and charts, see the Superset Creating Your First Dashboard site.

The following procedures demonstrate two methods of adding charts connected to ClickHouse data:

  1. From a physical dataset tied to a specific database table.
  2. From a virtual dataset tied to a query saved in SQL Lab.

Create a Chart from a Physical Dataset

Physical Dataset Chart Prerequisites

Before adding a new chart in Superset from ClickHouse data, Superset must be connected to ClickHouse. See Connect Superset to ClickHouse for more information.

Add a Physical Dataset Steps

To add a new Physical Dataset:

  1. From the top menu, select Data->Datasets.

    Select Datasets
  2. Select + Dataset.

    Add Dataset
  3. Set the following:

    Add Dataset
    1. Datasource: The connected ClickHouse database server to use.
    2. Schema: The database to use on the ClickHouse server.
    3. Table: The table to be queried.

Once complete, the physical dataset will be available for building new charts.

Chart from Standard Dataset Steps

To create a chart from a physical dataset:

  1. Select Charts from the top menu.

    Select Charts
  2. Select + Chart.

    Select + Chart
  3. Set the following:

    1. Dataset: Select the dataset to be used. In this case, a dataset tied to a specific table.
    2. Visualization Type: Select the chart type, such as line chart, time series, etc.
  4. When ready, select Create New Chart.

    New chart settings
  5. Depending on the chart created, the following options may be set to improve the chart’s display:

    1. Group By: Select a column to group data by. This is highly useful when dealing with multiple rows of the same unique source.
    2. Time Grain: Set the interval of time to measure when dealing with time series or similar charts.
    3. Time Range: Select the period of time of interest for the chart to cover.
  6. After each setting change, select Run to view the updated chart (A).

  7. Set the name by selecting the section above the chart, titled “- untitled” by default (B).

    New chart settings
  8. When finished, select + Save to save the chart.

    New chart
    1. Provide the chart name.
    2. Select a Dashboard to add the chart to, or enter a new Dashboard name and a title.
    3. Select Save when finished, or Save and Go To Dashboard to view the new chart.

Create a Chart from a Virtual Dataset

A Virtual Dataset is created from a query that is saved in the Superset SQL Labs. This allows for charts that are build from specific queries that include joins, specific filters, and other criteria.

Virtual Dataset Chart Prerequisites

Before adding a new chart in Superset from ClickHouse data, Superset must be connected to ClickHouse. See Connect Superset to ClickHouse for more information.

Create a Virtual Dataset Steps

To create a chart from a Virtual Dataset:

  1. From the top menu, select + -> SQL Query.

    Select New SQL query
  2. Enter the following:

    1. Database: Select the ClickHouse database connected to Superset.
    2. Schema: Typically this will be default.
    3. Table: Select a table or table type. If building from just the query, this can be left blank.
  3. In the Query field, enter the ClickHouse SQL query to use. Select Run to verify the results.

  4. Select Save or Save As to save the query into SQL Lab.

  5. To save the query as a Virtual Dataset, select Explore.

    New SQL query settings
  6. Set the name of the Virtual Dataset, then select Save & Explore.

    Save query to SQL Lab

At this point, a Chart can be created from the Virtual Dataset. See the section Create Chart from an Existing Virtual Dataset Steps for information on setting up the chart.

Create Chart from an Existing Virtual Dataset Steps

To create a chart from an existing Virtual Dataset:

  1. Either create a new chart by either:

    1. Selecting Charts -> +Charts, then selecting the Physical Dataset tied to the Virtual Dataset, and Visualization Type to use.
    2. Or from creating a Virtual Dataset as detailed in the process Create a Virtual Dataset Steps.
  2. To change the Virtual Dataset to use, select the in the upper right under the Dataset heading, then select Change Dataset.

    Change dataset
    1. Select the Virtual Dataset to use, then click Proceed to acknowledge that changing the chart’s dataset may break the chart.
  3. Set the following required options:

    Virtual dataset chart settings
    1. Visualization Type: How the data will be displayed in the chart (A).
    2. Name: This is above the chart. Select it once, then set the name (B).
  4. Depending on the Visualization Type selected, complete the other options.

  5. When finished, select + Save to save the chart.

    1. Provide the chart name.
    2. Select a Dashboard to add the chart to, or enter a new Dashboard name and a title.
    3. Select Save when finished, or Save and Go To Dashboard to view the new chart.

Adding Charts FAQ

When I Add a Deck.gl Chart, I Get the Message ‘No Token Warning’. How do I fix it?

The NO_TOKEN_WARNING when using charts such as deck.gl Arc occurs when there is no API token to a map service for generating the map used with the geographic data. In this example, the geographic information provides flight information, but there is no token for the mapbox service to retrieve the map data.

No token warning map error

A token can be retrieved from mapbox after completing their registration process. After obtaining the token, place it in the file superset_config.py with the variable MAPBOX_API_KEY. For example:

MAPBOX_API_KEY = 'mytoken'

For Docker based installations, this is in the directory docker/pythonpath_dev. Once added, restart Superset:

No token warning error fixed