Apache Superset is a powerful and convenient way to create dashboards to display information for your organization. This presents information that allows for quick visual understand of trends, where events are happening, and keeps your team on top of the massive amounts of data coming into your organization.
When combined with ClickHouse and its capacity to analyze terabytes of information in milliseconds, the integration of Superset and ClickHouse can reduce the sea of seemingly disconnected data flowing through into an understandable whole.
Integrating Superset with ClickHouse is broken down into three major steps:
Install Superset.
Connect Superset to ClickHouse.
Connect Superset dashboard elements to ClickHouse queries.
1 - Install Superset
Superset installation instructions
Three methods of installing Superset are provided here:
Direct installation: Directly download the Superset Python modules into an operating system.
Docker based installation: Use Docker containers to provide Superset, whether as stand alone containers or part of the Kubernetes environment.
Managed installation: Use managed services to provide Superset, and connect it to your ClickHouse.
Direct Installation
The following instructions are based an Ubuntu 20.04 environment. For information on how to install Superset for other operating systems, see Installing Superset from Scratch.
Direct Installation Prerequisites
Before installing Superset on Ubuntu 20.04, the following packages must be installed, and will require administrative access on the machine.
Note that Superset requires Python, specifically Python 3.7.9 and above as of this document. These instructions are modified from the original to include installing python3-venv as part of the other prerequisites.
To install Superset directly into the Ubuntu 20.04 operating system:
A virtual environment is highly recommended to keep all of the Superset python requirements contained. Use the following commands to create your Superset virtual environment, activate it, then upgrade pip:
Install Superset with the following in your virtual environment. The command superset fab create-admin sets the admin user password, which will be used to login and update administrative level Superset settings:
exportFLASK_APP=superset
pip install apache-superset
superset db upgrade
superset fab create-admin
superset load_examples
superset init
Install the ClickHouse SQLAlchemy - this allows Superset to communicate with ClickHouse. There are two drivers currently in development. As of this time, we recommend the clickhouse-sqlalchemy driver. To install clickhouse-sqlalchemy, use the pip command:
pip install clickhouse-sqlalchemy
For those who want to enable TLS communications with services like Altinity.Cloud, verify the versions:
The clickhouse-driver version should be greater than 0.2.0.
The clickhouse-sqlalchemy driver should be 0.1.6 or greater.
Once Superset is installed, start it with the following command:
superset run -p 8088 --with-threads --reload --debugger
Access the Superset web interface with a browser at the host name, port 8088. For example: http://localhost:8088. The default administrative account will be admin, with the password set from the command superset fab create-admin as listed above.
For organizations that prefer Docker based installations, or want to add Superset to a Kubernetes environment along with ClickHouse on Kubernetes, Superset can be installed with a few Docker commands.
The following instructions use the existing Apache Superset docker-compose file from the Github repository.
Docker Prerequisites
docker 19 and above, with either docker-compose version 1.29 and above, or the built in docker compose command that’s part of Docker 20.10 and above.
Docker Based Installation Instructions
To install Superset with a Docker Container:
Download the Superset configuration files from the GitHub repository:
git clone https://github.com/apache/superset
Enter the superset directory and set the clickhouse-driver and the clickhouse-sqlalchemy version requirements:
cd superset
touch ./docker/requirements-local.txt
echo"clickhouse-driver>=0.2.0" >> ./docker/requirements-local.txt
echo"clickhouse-sqlalchemy>=0.1.6" >> ./docker/requirements-local.txt
Run either docker-compose or docker compose to download the required images and start the Superset Docker containers:
docker-compose -f docker-compose-non-dev.yml up
Starting Superset From Docker
Installing and starting Superset from Docker is the same docker-compose or docker compose installation command:
docker-compose -f docker-compose-non-dev.yml up
Access the Superset web interface with a browser at the server’s host name or IP address on port 8088. For example: http://localhost:8088. The default administrative account will be admin, with the password admin.
Preset Cloud
For organizations that prefer managed services, Preset Cloud offers Superset with with clickhouse-sqlalchemy. This can then be connected to an Altinity.Cloud account.
Production Installation Tips
The examples provided above are useful for development and testing environments. For full production environments installation and deployments, refer to the Superset Documentation site.
Some recommendations for organizations that want to install Superset in a production environment:
Replace the default SQLite database used to store the Superset settings with something more robust like PostgreSQL or MySQL. For an example of the process, see the article Migrating Superset to Postgres.
After Superset has been setup and the ClickHouse SQLAlchemy drivers installed, the two systems can be connected together. The following details how to connect Superset to an existing ClickHouse database.
The SQLAlchemy format uses the following elements:
SQLAlchemy Connection Type: Sets the driver to use to connect.
username: The database account to use for authentication.
password: The password for the database account being used.
url: The full hostname to the database. This does not include tags such as HTTP or HTTPS - just the host name.
options: Any additional options. For example, if the database connection is encrypted, then the option secure=true will be required. For more information on setting up ClickHouse network encryption, see Network Hardening.
Select Test Connection to verify the connection.
If the connection works, then click Add.
Once finished, the new database connection to ClickHouse will be available for use.
Connection Options
The clickhouse-driver supports different connection options. For a full list, see the clickhouse-driver Connection site. The following are recommended settings:
secure: Required if the connection uses encryption.
port: Used if the connection uses a non-standard port.
verify: Default to true. If true, then the certificate used for encryption must be validated. Set to false if using a self-signed certificate. For more information on ClickHouse certificates, see Network Hardening.
Connection FAQ
What ports are used for the Superset to ClickHouse Connection?
The Superset connection to ClickHouse uses the TCP connection over the following ports:
Encrypted: 9440
Unencrypted: 9000
If using the encrypted connection, then the option secure=yes is required.
3 - Create Charts from ClickHouse Data
Build stunning dashboards populated by ClickHouse data.
Once Superset has been installed and connected to ClickHouse, new charts can be created that draw their visualizations from the ClickHouse data.
The following procedures demonstrate two methods of adding charts connected to ClickHouse data:
From a physical dataset tied to a specific database table.
From a virtual dataset tied to a query saved in SQL Lab.
Create a Chart from a Physical Dataset
Physical Dataset Chart Prerequisites
Before adding a new chart in Superset from ClickHouse data, Superset must be connected to ClickHouse. See Connect Superset to ClickHouse for more information.
Add a Physical Dataset Steps
To add a new Physical Dataset:
From the top menu, select Data->Datasets.
Select + Dataset.
Set the following:
Datasource: The connected ClickHouse database server to use.
Schema: The database to use on the ClickHouse server.
Table: The table to be queried.
Once complete, the physical dataset will be available for building new charts.
Chart from Standard Dataset Steps
To create a chart from a physical dataset:
Select Charts from the top menu.
Select + Chart.
Set the following:
Dataset: Select the dataset to be used. In this case, a dataset tied to a specific table.
Visualization Type: Select the chart type, such as line chart, time series, etc.
When ready, select Create New Chart.
Depending on the chart created, the following options may be set to improve the chart’s display:
Group By: Select a column to group data by. This is highly useful when dealing with multiple rows of the same unique source.
Time Grain: Set the interval of time to measure when dealing with time series or similar charts.
Time Range: Select the period of time of interest for the chart to cover.
After each setting change, select Run to view the updated chart (A).
Set the name by selecting the section above the chart, titled “- untitled” by default (B).
When finished, select + Save to save the chart.
Provide the chart name.
Select a Dashboard to add the chart to, or enter a new Dashboard name and a title.
Select Save when finished, or Save and Go To Dashboard to view the new chart.
Create a Chart from a Virtual Dataset
A Virtual Dataset is created from a query that is saved in the Superset SQL Labs. This allows for charts that are build from specific queries that include joins, specific filters, and other criteria.
Virtual Dataset Chart Prerequisites
Before adding a new chart in Superset from ClickHouse data, Superset must be connected to ClickHouse. See Connect Superset to ClickHouse for more information.
Create a Virtual Dataset Steps
To create a chart from a Virtual Dataset:
From the top menu, select + -> SQL Query.
Enter the following:
Database: Select the ClickHouse database connected to Superset.
Schema: Typically this will be default.
Table: Select a table or table type. If building from just the query, this can be left blank.
In the Query field, enter the ClickHouse SQL query to use. Select Run to verify the results.
Select Save or Save As to save the query into SQL Lab.
To save the query as a Virtual Dataset, select Explore.
Set the name of the Virtual Dataset, then select Save & Explore.
To change the Virtual Dataset to use, select the … in the upper right under the Dataset heading, then select Change Dataset.
Select the Virtual Dataset to use, then click Proceed to acknowledge that changing the chart’s dataset may break the chart.
Set the following required options:
Visualization Type: How the data will be displayed in the chart (A).
Name: This is above the chart. Select it once, then set the name (B).
Depending on the Visualization Type selected, complete the other options.
When finished, select + Save to save the chart.
Provide the chart name.
Select a Dashboard to add the chart to, or enter a new Dashboard name and a title.
Select Save when finished, or Save and Go To Dashboard to view the new chart.
Adding Charts FAQ
When I Add a Deck.gl Chart, I Get the Message ‘No Token Warning’. How do I fix it?
The NO_TOKEN_WARNING when using charts such as deck.gl Arc occurs when there is no API token to a map service for generating the map used with the geographic data. In this example, the geographic information provides flight information, but there is no token for the mapbox service to retrieve the map data.
A token can be retrieved from mapbox after completing their registration process. After obtaining the token, place it in the file superset_config.py with the variable MAPBOX_API_KEY. For example:
MAPBOX_API_KEY='mytoken'
For Docker based installations, this is in the directory docker/pythonpath_dev. Once added, restart Superset: