Altinity.Cloud Demo Environments

Demo clusters for testing ClickHouse® connections and playing with publicly available data.

Altinity provides a number of sample databases and ClickHouse® clusters for experimenting with various ClickHouse features.

The Antalya Playground on Altinity.Cloud

The Antalya Playground lets you use hybrid tables that let you query ClickHouse MergeTree and shared Iceberg tables to deliver cheap, scalable real-time analytics.

When you log in to the system with the userid/password combination of demo/demo, you’ll see this panel:

The initial panel of the Antalya Playground

Figure 1 - The initial panel of the Antalya Playground

There are several things to notice in Figure 1 above:

  • If you click the Altinity Logo in the upper left corner, it will display the databases and tables in the system once you’re logged in.
  • The address bar lets you know the URL of the playground server and the version of Altinity Antalya it is running.
  • The user demo is logged in.
  • The main attraction, of course, is the text box where you can execute SQL queries.
  • The Run button beneath the text box executes your query. As noted on the screen, you can also type Ctrl/Cmd+Enter to run the command without clicking on the button.
  • The icon on the right side beneath the text box toggles the display between light mode and dark mode.

Here are the results of a sample query:

Query results in the Antalya Playground

Figure 2 - A query result in the Antalya Playground

This query looks at all taxi rides where the rider was picked up and dropped off in the same borough, sorted by the percentage of all rides that started in that region. The usual query performance statistics appear between the text box and the results area. For numeric results, the yellow shading in the background provides visual cues to the size of the value.

In Figure 2 we have also clicked the Altinity Logo in the upper left to display all the databases in the system. The query above was in the tripdata.taxi_trips table; hovering over a table’s name shows the size of the table, the number of rows, and its engine.

You can download the results in a variety of data formats by clicking on the download arrow button. Select a data format from the dropdown list and click Download:

Download results option

Figure 3 - Downloading query results

Sample queries on the different tables are in the Datasets section that follows.

Datasets

The Antalya Playground is preloaded with two common databases: Tripdata and OnTime.

In this section we’ll look at a number of interesting queries you can run against the data in the playground. The query results will be displayed in the playground as shown in Figure 2 above. For this page, however, we’ll download the results as a markdown table as shown in Figure 3 and use it to display the results.

Tripdata

The Tripdata dataset contains records of taxi and for-hire vehicle rides in New York City from 2009 to 2025. The original Parquet files are available through the Taxi and Livery Commission's website.

Tables in the Antalya playground include:

  • tripdata.fhv_trips - Trips from for-hire vehicles and rideshare services starting in 2015
  • tripdata.taxi_trips - Trips for yellow and green taxis starting in 2014
  • tripdata.taxi_zones - A table that maps a location_id to a zone name, borough, and subregion. As an example, the location 261 maps to the zone World Trade Center, which is in the manhattan_below_60 subregion of Manhattan. Useful for joins.

Here are some interesting queries against this data:

Trips that started and ended in the same borough

Here’s a query that looks at each of the five boroughs (Bronx, Brooklyn, Manhattan, Queens, and Staten Island) for analytics on trips that started and ended in the same borough.

Here’s the query:

SELECT
  pickup_borough,
  count()                                                    AS total_rides,
  countIf(dropoff_borough = pickup_borough)                  AS same_borough_dropoffs,
  round(
    countIf(dropoff_borough = pickup_borough) * 100.0 / count(),
    1
  )                                                          AS pct_staying_in_borough
FROM tripdata.taxi_trips
WHERE pickup_borough IN ('Manhattan','Queens','Bronx','Brooklyn','Staten Island')
  AND dropoff_borough IS NOT NULL
  AND dropoff_borough != ''
GROUP BY pickup_borough
ORDER BY pct_staying_in_borough DESC;

And the results:

pickup_borough total_rides same_borough_dropoffs pct_staying_in_borough
Manhattan 1596960780 1476064377 92.4
Staten Island 106417 77880 73.2
Bronx 7075552 5033605 71.1
Brooklyn 60771375 41098887 67.6
Queens 117563690 48077040 40.9

Most people in Manhattan stayed in Manhattan; on the other hand, more than half the people picked up in Queens went elsewhere (the LaGuardia and JFK airports are in Queens, so lots of people go from the airport to some other borough).

Changing the query to get rideshare data from the fhv_trips table (just change the line above to FROM tripdata.fhv_trips), we get different results:

pickup_borough total_rides same_borough_dropoffs pct_staying_in_borough
Staten Island 17813284 15660329 87.9
Bronx 154618369 119469960 77.3
Brooklyn 360239134 276326236 76.7
Manhattan 547917981 413570842 75.5
Queens 241775840 165807747 68.6

Far more Uber and Lyft trips stayed in the same borough, suggesting that people use Uber and Lyft for shorter trips. (Checking that is left as an exercise to the reader.)

Taxi trips in July

This query shows how many taxi trips happened in the month of July through the dataset:

SELECT
  toStartOfMonth(pickup_datetime) AS month,
  count()                         AS trips
FROM tripdata.taxi_trips
WHERE toMonth(pickup_datetime) = 7
GROUP BY month
ORDER BY month;

The results show a dramatic drop-off at the peak of the pandemic, as you’d expect:

month trips
2009-07-01 13626103
2010-07-01 14656519
2011-07-01 14742561
2012-07-01 14379307
2013-07-01 13823840
2014-07-01 14378246
2015-07-01 13101337
2016-07-01 11626590
2017-07-01 9503269
2018-07-01 8535341
2019-07-01 6781088
2020-07-01 872692
2021-07-01 2905788
2022-07-01 3238567
2023-07-01 2968439

Let’s try a query that combines the taxi data and the ridesharing data:

SELECT
  month,
  taxi_trips,
  fhv_trips,
  taxi_trips + fhv_trips AS total_trips
FROM (
  SELECT
    toStartOfMonth(pickup_datetime) AS month,
    count()                         AS taxi_trips
  FROM tripdata.taxi_trips
  WHERE toMonth(pickup_datetime) = 7
  GROUP BY month
) t
JOIN (
  SELECT
    toStartOfMonth(pickup_datetime) AS month,
    count()                         AS fhv_trips
  FROM tripdata.fhv_trips
  WHERE toMonth(pickup_datetime) = 7
  GROUP BY month
) f USING (month)
ORDER BY month;

The results are similar, a major dropoff during the pandemic:

month taxi_trips fhv_trips total_trips
2015-07-01 13101337 4384986 17486323
2016-07-01 11626590 10977358 22603948
2017-07-01 9503269 15672674 25175943
2018-07-01 8535341 21599714 30135055
2019-07-01 6781088 22251100 29032188
2020-07-01 872692 11085943 11958635
2021-07-01 2905788 16267188 19172976
2022-07-01 3238567 18624198 21862765
2023-07-01 2968439 20502974 23471413

Looking at the seven days of the week, which is the most popular hour for people to be picked up?

SELECT
  ['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday'][dow] AS day_name,
  hour,
  trips
FROM (
  SELECT
    toDayOfWeek(pickup_datetime) AS dow,
    toHour(pickup_datetime)      AS hour,
    count()                      AS trips,
    row_number() OVER (
      PARTITION BY toDayOfWeek(pickup_datetime)
      ORDER BY count() DESC
    )                            AS rn
  FROM tripdata.taxi_trips
  GROUP BY dow, hour
)
WHERE rn = 1
ORDER BY dow;

Here are the results:

day_name hour trips
Monday 18 15849481
Tuesday 18 16925811
Wednesday 19 17347673
Thursday 19 17617635
Friday 19 17890364
Saturday 19 16587343
Sunday 0 15245544

People are getting into taxis at 6:00 or 7:00 pm every day except Sunday, when the highest traffic seems to be Saturday night revelers hailing a cab after midnight.

For ridesharing services (use FROM taxidata.fhv_trips in the query above), the data is a little different:

day_name hour trips
Monday 8 13742138
Tuesday 8 14652655
Wednesday 18 14990988
Thursday 18 15685006
Friday 19 17442085
Saturday 19 17574739
Sunday 0 15792141

Uber and Lyft are more popular on Monday and Tuesday mornings, but as with regular taxis, the night hawks are using ridesharing services after midnight Saturday.

The impact of ridesharing services

Uber launched in NYC in 2011, and Lyft launched in 2014. Here’s a look at the percentage of rides on ridesharing services versus traditional taxis since 2015:

WITH taxi AS (
  SELECT
    toYear(pickup_datetime)        AS year,
    toStartOfMonth(pickup_datetime) AS month,
    count()                         AS trips
  FROM tripdata.taxi_trips
  WHERE pickup_datetime IS NOT NULL
  GROUP BY year, month
),
fhv AS (
  SELECT
    toYear(pickup_datetime)        AS year,
    toStartOfMonth(pickup_datetime) AS month,
    count()                         AS trips
  FROM tripdata.fhv_trips
  WHERE pickup_datetime IS NOT NULL
  GROUP BY year, month
),
taxi_annual AS (
  SELECT
    year,
    round(avg(trips), 0)  AS avg_monthly_taxi_trips,
    sum(trips)            AS total_taxi_trips
  FROM taxi
  GROUP BY year
),
fhv_annual AS (
  SELECT
    year,
    round(avg(trips), 0)  AS avg_monthly_fhv_trips,
    sum(trips)            AS total_fhv_trips
  FROM fhv
  GROUP BY year
)
SELECT
  t.year,
  t.avg_monthly_taxi_trips,
  f.avg_monthly_fhv_trips,
  t.total_taxi_trips + f.total_fhv_trips        AS total_trips,
  round(f.total_fhv_trips * 100.0 /
    (t.total_taxi_trips + f.total_fhv_trips), 2) AS pct_fhv
FROM taxi_annual t
JOIN fhv_annual f USING (year)
ORDER BY t.year;

The results are dramatic: In 2015, as ridesharing services were starting to take hold, just over a quarter of rides were with Uber and Lyft. That percentage grew steadily; since 2020, Uber and Lyft have had roughly 85% of the market.

year avg_monthly_taxi_trips avg_monthly_fhv_trips total_trips pct_fhv
2015 13772750 5282378 228661529 27.72
2016 12293112 11009507 279631429 47.25
2017 10436441 16025796 317546849 60.56
2018 9314153 21739563 372644592 70.01
2019 7574844 23157628 368789663 75.35
2020 2198619 13187945 184638768 85.71
2021 2664393 15783493 221374629 85.56
2022 3374668 18910646 267423763 84.86
2023 3258099 20695722 287445852 86.4
2024 1510582 20954046 23975210 87.4

OnTime

The OnTime dataset has records of airline on-time performance of domestic US flights from 1987 to 2025. It is provided by the Bureau of Transportation Statistics.

Tables in the Antalya playground include:

  • ontime.fact_ontime and ontime.fact_online_replicated - The main flight data, including the date and time, carrier, origin airport, destination airport, and how many minutes late (if any) a flight arrived. The engines of the two databases are MergeTree and ReplicatedMergeTree, as you would expect, but the data is the same.
  • ontime.dim_airports and ontime.dim_airports_bts_full - Complete airport metadata, such as airport name, country code, latitude, and longitude. bts_full is an unabridged dataset of all airports, including airports that are now closed.

Here are some interesting queries against this data.

Least punctual airlines

Here’s a query to find the 15 airlines with the worst on-time performance since 2018:

WITH carrier_names AS (
  SELECT
    Carrier,
    multiIf(
      Carrier = 'AA', 'American Airlines',
      Carrier = 'AS', 'Alaska Airlines',
      Carrier = 'B6', 'JetBlue Airways',
      Carrier = 'CO', 'Continental Airlines',
      Carrier = 'DL', 'Delta Air Lines',
      Carrier = 'EV', 'ExpressJet Airlines',
      Carrier = 'F9', 'Frontier Airlines',
      Carrier = 'FL', 'AirTran Airways',
      Carrier = 'G4', 'Allegiant Air',
      Carrier = 'HA', 'Hawaiian Airlines',
      Carrier = 'HP', 'America West Airlines',
      Carrier = 'MQ', 'Envoy Air (American Eagle)',
      Carrier = 'NK', 'Spirit Airlines',
      Carrier = 'NW', 'Northwest Airlines',
      Carrier = 'OH', 'PSA Airlines',
      Carrier = 'OO', 'SkyWest Airlines',
      Carrier = 'QX', 'Horizon Air',
      Carrier = 'TW', 'Trans World Airlines',
      Carrier = 'UA', 'United Airlines',
      Carrier = 'US', 'US Airways',
      Carrier = 'VX', 'Virgin America',
      Carrier = 'WN', 'Southwest Airlines',
      Carrier = 'YV', 'Mesa Airlines',
      Carrier = 'YX', 'Republic Airways',
      Carrier
    ) AS carrier_name
  FROM ontime.fact_ontime
  GROUP BY Carrier
)
SELECT
  cn.carrier_name,
  count()                          AS flights,
  round(avg(ArrDelayMinutes), 2)   AS avg_arr_delay_mins,
  concat(toDecimalString(
    round(countIf(ArrDel15 = 1) * 100.0 / count(), 2),
  2), '%')                         AS pct_late
FROM ontime.fact_ontime f
JOIN carrier_names cn USING (Carrier)
WHERE Cancelled = 0
  AND Year >= 2018
GROUP BY cn.carrier_name
ORDER BY avg_arr_delay_mins DESC
LIMIT 15;

Our multiIf clause replaces the airline carrier code (B6) with the name of the airline (JetBlue Airlines). It makes the query longer, but the results are much more legible.

Here are our results, sorted by how many minutes late an airline’s flights are on average:

carrier_name flights avg_arr_delay_mins pct_late
JetBlue Airways 1923885 21.46 26.68%
Frontier Airlines 1195804 21.4 27.03%
Allegiant Air 869378 19.92 24.65%
ExpressJet Airlines 373917 18.47 21.35%
American Airlines 6787795 17.61 21.54%
Mesa Airlines 824697 16.46 19.27%
Spirit Airlines 1626039 16.28 21.87%
PSA Airlines 1808876 15.8 19.60%
SkyWest Airlines 5845849 14.99 17.26%
United Airlines 4836788 14.82 19.41%
Envoy Air (American Eagle) 2087284 12.65 18.68%
Republic Airways 2383285 11.99 16.44%
Virgin America 17237 11.95 19.96%
9E 1579112 11.5 14.02%
Delta Air Lines 7101668 11.48 15.22%

As you can see, JetBlue, Frontier, and Allegiant are all late roughly a quarter of the time, with those flights averaging 20+ minutes late. (Carrier code 9E is Endeavor Air, btw.)

Late arriving aircraft cascade effect

This query takes a look at how late aircraft impact flights through the day. A given airplane will likely make several flights a day, particularly on regional routes. If any of those flights is late, it affects every subsequent flight that plane is scheduled to fly. Here’s the query:

SELECT
  intDiv(CRSDepTime, 100)                        AS scheduled_hour,
  round(avg(DepDelayMinutes), 2)                  AS avg_dep_delay,
  round(countIf(LateAircraftDelay > 0) * 100.0 /
    count(), 2)                                   AS pct_flights_with_late_aircraft,
  count()                                         AS flights
FROM ontime.fact_ontime
WHERE Cancelled = 0
  AND DepDelayMinutes > 0
  AND CRSDepTime IS NOT NULL
  AND intDiv(CRSDepTime, 100) BETWEEN 5 AND 23
GROUP BY scheduled_hour
ORDER BY scheduled_hour;

As you would expect, the average flight delay and the percentage of flights with late aircraft increase through the day, as one delay likely leads to another:

scheduled_hour avg_dep_delay pct_flights_with_late_aircraft flights
5 34.21 1.59 466587
6 24 1.3 3040805
7 22.94 2.95 3884887
8 22.22 5.88 4765492
9 22.7 9.4 4687160
10 23.9 13.08 4904421
11 24.47 14.22 5412825
12 25.51 15.06 5633853
13 26.5 15.87 5992125
14 28.22 17.76 5767062
15 28.79 17.84 6266391
16 30.34 19.54 6292767
17 32.27 20.06 7197384
18 33.68 21.63 6597901
19 35.29 23.19 5936959
20 34.65 21.49 4980095
21 33.33 21.11 3307529
22 33.24 20.04 1704358
23 28.74 14.31 644409

The late aircraft effect peaks around 7:00 pm, then declines. The number of flights per hour are at their highest from 3:00 pm to 6:00 pm; as the number of flights goes down, the impact of late aircraft starts to subside.

Types of delays

The fact_ontime table defines several different types of delays:

  • Carrier delays - Anything caused by the airline, including maintenance, cleaning, or a late-arriving crew.
  • Weather - Self-explanatory
  • NAS (National Air System) - Anything controlled by the FAA, including ground stops and runway changes
  • Security - Screening issues from the TSA, very rare
  • Late aircraft - The incoming aircraft arrived late

Here’s a query that finds the various kinds of delays by the 15 airlines with the highest average carrier delay, starting with our multiIf statement:

SELECT
  multiIf(
    Carrier = 'AA', 'American Airlines',
    Carrier = 'AS', 'Alaska Airlines',
    Carrier = 'B6', 'JetBlue Airways',
    Carrier = 'CO', 'Continental Airlines',
    Carrier = 'DH', 'Atlantic Coast Airlines / Independence Air',
    Carrier = 'DL', 'Delta Air Lines',
    Carrier = 'EV', 'ExpressJet Airlines',
    Carrier = 'F9', 'Frontier Airlines',
    Carrier = 'FL', 'AirTran Airways',
    Carrier = 'G4', 'Allegiant Air',
    Carrier = 'HA', 'Hawaiian Airlines',
    Carrier = 'HP', 'America West Airlines',
    Carrier = 'MQ', 'Envoy Air (American Eagle)',
    Carrier = 'NK', 'Spirit Airlines',
    Carrier = 'NW', 'Northwest Airlines',
    Carrier = 'OH', 'PSA Airlines',
    Carrier = 'OO', 'SkyWest Airlines',
    Carrier = 'QX', 'Horizon Air',
    Carrier = 'TW', 'Trans World Airlines',
    Carrier = 'UA', 'United Airlines',
    Carrier = 'US', 'US Airways',
    Carrier = 'VX', 'Virgin America',
    Carrier = 'WN', 'Southwest Airlines',
    Carrier = 'XE', 'ExpressJet Airlines',
    Carrier = 'YV', 'Mesa Airlines',
    Carrier = 'YX', 'Republic Airways',
    Carrier
  )                                  AS carrier_name,
  round(avg(CarrierDelay), 2)        AS avg_carrier_delay,
  round(avg(WeatherDelay), 2)        AS avg_weather_delay,
  round(avg(NASDelay), 2)            AS avg_nas_delay,
  round(avg(SecurityDelay), 2)       AS avg_security_delay,
  round(avg(LateAircraftDelay), 2)   AS avg_late_aircraft_delay
FROM ontime.fact_ontime
WHERE Cancelled = 0
  AND ArrDelayMinutes > 0
  AND Year >= 2018
GROUP BY carrier_name
ORDER BY avg_carrier_delay DESC
LIMIT 15;

Here are the results:

carrier_name avg_carrier_delay avg_weather_delay avg_nas_delay avg_security_delay avg_late_aircraft_delay
SkyWest Airlines 39.03 10.26 10.74 0.12 21.28
Delta Air Lines 31.09 3.92 14.59 0.08 19.77
Mesa Airlines 29.48 7.13 11.74 0.1 31.77
JetBlue Airways 29.08 2.07 15.68 0.16 29.67
Hawaiian Airlines 27.28 1.55 1.28 0.24 16.01
American Airlines 26.91 4.02 12.5 0.15 33.23
ExpressJet Airlines 26.63 3.25 27.56 0 24.64
Allegiant Air 24.83 6.45 13.79 0.26 30.88
9E 24.02 5.89 19 0.03 28.27
Frontier Airlines 22.01 1.7 15.05 0 36.65
PSA Airlines 21.7 5.43 11.35 0.14 36.69
United Airlines 20.26 3.85 18.99 0.01 28.16
Horizon Air 18.63 2.8 8.79 0.16 20.32
Spirit Airlines 18.34 2.55 28.81 0.45 19.73
Republic Airways 17.43 3.98 22.27 0.06 24.02

That data is somewhat misleading, however; we’re showing the average carrier delay. Delta Air Lines has an average carrier delay of more than half an hour, which doesn’t sound good. But how often does a carrier delay happen? If an airline’s carrier delay is long, but that airline hardly ever has a carrier delay, that’s good to know. On the other hand, if an airline has a short average carrier delay but carrier delays happen all the time, that’s also useful information.

Here’s a query that looks at carrier delays, calculating an expected carrier delay based on the average carrier delay and how often a carrier delay happens:

SELECT
  carrier_name,
  round(avg(CarrierDelay), 1)                                AS avg_carrier_delay,
  round(countIf(CarrierDelay > 0) * 100.0 / count(), 1)     AS pct_carrier_delay,
  round(
    (countIf(CarrierDelay > 0) * 1.0 / count())
    * avg(CarrierDelay)
  , 2)                                                       AS expected_carrier_delay_per_flight
FROM (
  SELECT
    multiIf(
      Carrier = 'AA', 'American Airlines',
      Carrier = 'AS', 'Alaska Airlines',
      Carrier = 'B6', 'JetBlue Airways',
      Carrier = 'CO', 'Continental Airlines',
      Carrier = 'DH', 'Atlantic Coast Airlines / Independence Air',
      Carrier = 'DL', 'Delta Air Lines',
      Carrier = 'EV', 'ExpressJet Airlines',
      Carrier = 'F9', 'Frontier Airlines',
      Carrier = 'FL', 'AirTran Airways',
      Carrier = 'G4', 'Allegiant Air',
      Carrier = 'HA', 'Hawaiian Airlines',
      Carrier = 'HP', 'America West Airlines',
      Carrier = 'MQ', 'Envoy Air (American Eagle)',
      Carrier = 'NK', 'Spirit Airlines',
      Carrier = 'NW', 'Northwest Airlines',
      Carrier = 'OH', 'PSA Airlines',
      Carrier = 'OO', 'SkyWest Airlines',
      Carrier = 'QX', 'Horizon Air',
      Carrier = 'TW', 'Trans World Airlines',
      Carrier = 'UA', 'United Airlines',
      Carrier = 'US', 'US Airways',
      Carrier = 'VX', 'Virgin America',
      Carrier = 'WN', 'Southwest Airlines',
      Carrier = 'XE', 'ExpressJet Airlines',
      Carrier = 'YV', 'Mesa Airlines',
      Carrier = 'YX', 'Republic Airways',
      Carrier
    ) AS carrier_name,
    CarrierDelay
  FROM ontime.fact_ontime
  WHERE Cancelled = 0
)
GROUP BY carrier_name
ORDER BY expected_carrier_delay_per_flight DESC
LIMIT 15;

Here are the results:

carrier_name avg_carrier_delay pct_carrier_delay expected_carrier_delay_per_flight
JetBlue Airways 22.2 15.6 3.47
Allegiant Air 24.8 11.1 2.76
Hawaiian Airlines 28.3 9.1 2.57
Mesa Airlines 27.4 9 2.45
Frontier Airlines 18.1 12.9 2.34
ExpressJet Airlines 20.6 10.5 2.16
PSA Airlines 21.5 10 2.15
SkyWest Airlines 26.6 7.9 2.1
Spirit Airlines 17.1 10.2 1.75
American Airlines 21.7 6.4 1.39
9E 21.9 6.2 1.36
Horizon Air 18.6 6.8 1.27
Southwest Airlines 13.8 8.5 1.18
Envoy Air (American Eagle) 15 7.7 1.15
Republic Airways 17.4 6.3 1.1

With this query, Delta Air Lines and United Airlines are no longer in the results, with Southwest Airlines and Envoy air now included.

The Altinity Stable® Playground on Altinity.Cloud

The Altinity Stable Playground lets you run SQL queries against several sample databases. Altinity Stable Builds are 100% compatible with ClickHouse with robust upgrade and 3 years of support.

When you log in to the system with the userid/password combination of demo/demo, you’ll see this panel:

The initial panel of the Altinity Stable Playground

Figure 4 - The initial panel of the Altinity Stable Playground

There are several things to notice in Figure 4 above:

  • If you click the Altinity Logo in the upper left corner, it will display the databases and tables in the system once you’re logged in.
  • The address bar lets you know the URL of the playground server and the version of Altinity Antalya it is running.
  • The user demo is logged in.
  • The main attraction, of course, is the text box where you can execute SQL queries.
  • The Run button beneath the text box executes your query. As noted on the screen, you can also type Ctrl/Cmd+Enter to run the command without clicking on the button.
  • The icon on the right side beneath the text box toggles the display between light mode and dark mode.

Here are the results of a sample query:

Query results in the Altinity Stable Playground

Figure 5 - A query result in the Altinity Stable Playground

This query looks at all taxi rides where the rider was picked up and dropped off in the same borough, sorted by the percentage of all rides that started in that region. Most people in Manhattan stayed in Manhattan; on the other hand, more than half the people picked up in Queens went elsewhere (the LaGuardia and JFK airports are in Queens). The usual query performance statistics appear between the text box and the results area. For numeric results, the yellow shading in the background provides visual cues to the size of the value.

In Figure 5 we have also clicked the Altinity Logo in the upper left to display all the databases in the system. The query above was in the tripdata.taxi_trips table; hovering over the table’s name shows the size of the table, the number of rows, and its engine.

If you’d like to run this query yourself, here it is:

SELECT
  pickup_borough,
  count()                                                    AS total_rides,
  countIf(dropoff_borough = pickup_borough)                  AS same_borough_dropoffs,
  round(
    countIf(dropoff_borough = pickup_borough) * 100.0 / count(),
    1
  )                                                          AS pct_staying_in_borough
FROM tripdata.taxi_trips
WHERE pickup_borough IN ('Manhattan','Queens','Bronx','Brooklyn','Staten Island')
  AND dropoff_borough IS NOT NULL
  AND dropoff_borough != ''
GROUP BY pickup_borough
ORDER BY pct_staying_in_borough DESC;

You can run queries such as DESCRIBE TABLE tripdata.taxi_trips for information about the structure of the databases.

Datasets

The Altinity Stable Playground is preloaded with two common databases.

OnTime

The OnTime dataset has records of airline on-time performance of domestic US flights from 1987 to 2025.

Tables in the public playground include:

  • ontime.fact_ontime (main flight data)
  • ontime.dim_airports (dimension table for active airports)
  • ontime.dim_airports_bts_full (dimension table for all historical airports)

Tripdata

The Tripdata dataset contains records of taxi and for-hire vehicle rides in New York City from 2009 to 2025. Tables in the public playground include:

  • tripdata.fhv_trips (trips from for-hire vehicles/rideshare starting in 2015)
  • tripdata.taxi_trips (trips for yellow and green cabs)
  • tripdata.taxi_zones (dimension table for taxi zone codes)

The Altinity.Cloud demo environment

Altinity has created several guides in the Integrations section of the Altinity Documentation site as well as various blog posts posted in the Altinity Resources Page. To accompany these guides, Altinity has created a public ClickHouse® database hosted through Altinity.Cloud. Our goal is to help new users experience ClickHouse and to help experienced ClickHouse administrators test the connections between applications and ClickHouse.

Database Details

The Altinity.Cloud Demo database has the following connection information:

  • URL: github.demo.altinity.cloud
  • HTTP Protocol: HTTPS
  • HTTPS Port: 8443
  • ClickHouse Client Port: 9440
  • Database: default
  • Username: demo
  • Password: demo

Altinity.Cloud Demo Database Connection Examples

The following examples provide a quick demonstration on how to connect specific interfaces and applications to the Altinity.Cloud Demo ClickHouse cluster.

ClickHouse Play

ClickHouse lets you run queries directly using a simple HTML interface:

https://github.demo.altinity.cloud:8443/play

ClickHouse Client

To connect to the Altinity.Cloud Demo database with the native ClickHouse client, use the following command:

clickhouse-client --host github.demo.altinity.cloud --user=demo --password=demo --port=9440 --secure

DBeaver and JDBC Connectors

For applications like DBeaver that use JDBC drivers to connect to database, use the following settings:

  1. From the main menu, select Database->New Database Connection.
  2. From the Select your database screen, select ClickHouse.
  3. On the Main tab, set the following options:
    • JDBC URL: jdbc:clickhouse://github.demo.altinity.cloud:8443/default (DBeaver populates this by default based on the other settings).
    • Host: github.demo.altinity.cloud
    • Port: 8443
    • Database/Schema: default
    • Username: demo
    • Password: demo
  4. On the Driver Properties tab, set ssl to true.
  5. Click Finish.

Grafana

For Grafana, see the Integrating Grafana with ClickHouse guide.

Available Test Data

The demo user is allowed to access the default database and table functions. You can run SHOW TABLES to list available tables. The main public datasets that are often used in examples are:

ontime

The ontime table is provided by the Bureau of Transportation Statistics. It is collected from Airline On-Time Performance Data. This information is based on flight data collected by US certified air carriers regarding departure and arrival times, flights cancelled, flights with delayed departure or arrival times, and other relevant data.

airports

The table airports is information on airports, including the name of the airport, city, latitude and longitude, and other relevant information. For the Altinity.Cloud Demo database, this is often joined with the ontime table. The information in this table is provided by OpenFlights.org, from their Airport database.

tripdata

The tripdata table is based on the New York City Taxi and Limousine Commission (TLC) Trip Record Data for Yellow Cab rides. This table contains information about rides starting and ending in New York City, including the date and time of pickups and dropoffs, the total amount of the fare, and other data. This was used as part of the article Tableau Tricks with ClickHouse.

github_events

The table github_events is a collection of all Github events between 2011 to 2020, comprising 3.1 billion records and 200 GB of data. This is provided from the GH Archive.