Replacing Edge-Proxy
We’ll need to deploy the Contour (Envoy-based) ingress controller to replace Altinity’s edge-proxy functionality. Edge-proxy dynamically discovers services by watching Service annotations. We’ll replicate this behavior with Contour.
Understanding Edge-Proxy Service Discovery
Edge-proxy watches for Services with these annotations:
- edge-proxy.altinity.com/port-mapping - Comma-separated externalPort:mode:backendPort tuples. Supported modes:
  - tls-to-tcp - Terminate TLS, forward plain TCP upstream (HTTP interface and native protocol)
  - tls-to-tls - Terminate TLS, open new TLS connection upstream (upstream cert verified)
  - tls-to-tls-insecure - Terminate TLS, open new TLS connection upstream (no cert verification)
  - tls-passthrough - Relay TLS as-is; upstream terminates TLS
  - tcp-passthrough - Plain TCP relay, no TLS (Prometheus, Grafana)
- edge-proxy.altinity.com/tls-server-name - SNI hostname(s) edge-proxy routes on
- edge-proxy.altinity.com/whitelist - Comma-separated source CIDRs to allow; all others denied
- edge-proxy.altinity.com/zone-routed-tls-server-name - Like tls-server-name but with AZ-local routing
- edge-proxy.altinity.com/zone - AZ tag on the Service; edge-proxy instances route to matching-zone services
Example ClickHouse Service (both HTTP and native protocol use tls-to-tcp):
annotations:
edge-proxy.altinity.com/port-mapping: "8443:tls-to-tcp:8123,9440:tls-to-tcp:9000"
edge-proxy.altinity.com/tls-server-name: "chi-0-0.internal.env.altinity.cloud"
Example Prometheus/Grafana Service:
annotations:
edge-proxy.altinity.com/port-mapping: "9090:tcp-passthrough:9090"
# No tls-server-name needed for tcp-passthrough
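To make the tuple format concrete, here is a minimal bash sketch that splits a port-mapping value into its externalPort, mode, and backendPort fields. The function name parse_port_mapping is hypothetical and not part of edge-proxy; it only illustrates the format described above.

```shell
#!/usr/bin/env bash
# parse_port_mapping (hypothetical helper): print one
# "externalPort mode backendPort" line per tuple in the annotation value.
parse_port_mapping() {
  local mapping="$1" tuple ext mode backend
  local IFS=','                      # tuples are comma-separated
  for tuple in $mapping; do
    # fields inside a tuple are colon-separated
    IFS=':' read -r ext mode backend <<< "$tuple"
    printf '%s %s %s\n' "$ext" "$mode" "$backend"
  done
}

# Example: the ClickHouse annotation shown above
parse_port_mapping "8443:tls-to-tcp:8123,9440:tls-to-tcp:9000"
```

Splitting on the field positions rather than on the mode name keeps the helper usable for every mode in the list.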
Prerequisites
Custom Domain and DNS Configuration
Your custom domain currently points to Altinity’s DNS, which routes to edge-proxy load balancers:
Current DNS Flow:
clickhouse.yourdomain.com (your Route53)
→ env.altinity.cloud (Altinity DNS)
→ edge-proxy-xyz.elb.amazonaws.com (Altinity LB)
→ edge-proxy (TLS termination / passthrough)
→ Cluster pods
After Disconnection:
clickhouse.yourdomain.com (your Route53)
→ contour-xyz.elb.amazonaws.com (Contour's LB)
→ Contour/Envoy (TLS termination / TCP passthrough)
→ Cluster pods
# Load environment variables
source ~/.clickhouse-disconnect-env
export CUSTOM_DOMAIN="clickhouse.yourdomain.com"
export ROOT_DOMAIN=$(echo $CUSTOM_DOMAIN | awk -F. '{print $(NF-1)"."$NF}')
# Get Route53 hosted zone ID for $CUSTOM_DOMAIN
export HOSTED_ZONE_ID=$(aws route53 list-hosted-zones \
  --query "HostedZones[?Name=='${ROOT_DOMAIN}.'].Id" \
  --output text | cut -d/ -f3)
# Verify current DNS chain
dig $CUSTOM_DOMAIN +short
# Should show: chi-name.env.altinity.cloud → then altinity ELB
echo "export CUSTOM_DOMAIN='$CUSTOM_DOMAIN'" >> ~/.clickhouse-disconnect-env
echo "export ROOT_DOMAIN='$ROOT_DOMAIN'" >> ~/.clickhouse-disconnect-env
echo "export HOSTED_ZONE_ID='$HOSTED_ZONE_ID'" >> ~/.clickhouse-disconnect-env
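The commands above append to ~/.clickhouse-disconnect-env with >>, so re-running a step leaves duplicate export lines in the file. A small sketch of an alternative, assuming a hypothetical helper named persist_var, replaces any earlier line for the same variable instead of appending another one:

```shell
#!/usr/bin/env bash
# persist_var NAME VALUE [FILE] (hypothetical helper): store exactly one
# "export NAME='VALUE'" line in the env file, replacing any earlier one.
persist_var() {
  local name="$1" value="$2" file="${3:-$HOME/.clickhouse-disconnect-env}"
  touch "$file"
  # Keep every line except a previous export of the same variable.
  # grep exits non-zero when nothing is kept, which is fine here.
  grep -v "^export ${name}=" "$file" > "${file}.tmp" || true
  printf "export %s='%s'\n" "$name" "$value" >> "${file}.tmp"
  mv "${file}.tmp" "$file"
}

# Usage (same variables as above):
# persist_var CUSTOM_DOMAIN "$CUSTOM_DOMAIN"
# persist_var HOSTED_ZONE_ID "$HOSTED_ZONE_ID"
```

The sketch assumes values contain no single quotes, which holds for the domain names and zone IDs used in this guide.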
Strategy
- Find all Services with edge-proxy annotations (ClickHouse, Prometheus, Grafana)
- Install Contour Gateway Provisioner and Gateway API CRDs; create GatewayClass
- Install cert-manager; pre-flight: switch custom domain CNAMEs to edge-proxy’s LB with TTL=60; issue new TLS certificate
- Create Gateway with HTTPS (port 8443) and TLS (port 9440) listeners; create HTTPRoute (HTTP interface) and TLSRoute (native protocol) — both using the same FQDN on different listeners
- Test Contour via its own LB before cutting over traffic
- LB cutover: update CNAMEs from edge-proxy’s LB to Contour’s LB (~60s propagation due to low TTL)
- Verify production endpoints
- Monitor for 48 hours
Step 1: Discover All Services with Edge-Proxy Annotations
# Full summary: all services with edge-proxy annotations, grouped by connection type
kubectl get svc -A -o json | jq -r '
.items[] | select(.metadata.annotations["edge-proxy.altinity.com/port-mapping"] != null) | {
namespace: .metadata.namespace,
name: .metadata.name,
port_mapping: .metadata.annotations["edge-proxy.altinity.com/port-mapping"],
whitelist: (.metadata.annotations["edge-proxy.altinity.com/whitelist"] // ""),
zone_routed: (.metadata.annotations["edge-proxy.altinity.com/zone-routed-tls-server-name"] // ""),
zone: (.metadata.annotations["edge-proxy.altinity.com/zone"] // "")
}'
# ClickHouse services (used in scripts below)
kubectl get svc -n $CH_NAMESPACE \
  -o json | jq -r '.items[] | select(.metadata.annotations["edge-proxy.altinity.com/port-mapping"] != null) | .metadata.name' \
  > /tmp/clickhouse-services.txt
echo "=== ClickHouse services ==="
cat /tmp/clickhouse-services.txt
# Flag features that need extra handling — read these output sections carefully
echo ""
echo "=== tls-passthrough services (need Passthrough listener in Gateway) ==="
kubectl get svc -A -o json | jq -r '
.items[] | select(.metadata.annotations["edge-proxy.altinity.com/port-mapping"] |
select(. != null) | test("tls-passthrough")) |
.metadata.namespace + "/" + .metadata.name + ": " +
.metadata.annotations["edge-proxy.altinity.com/port-mapping"]' \
| tee /tmp/tls-passthrough-services.txt
[ -s /tmp/tls-passthrough-services.txt ] && echo "ACTION REQUIRED: see 'Handle tls-passthrough' in Step 4" || echo "(none)"
echo ""
echo "=== tls-to-tls / tls-to-tls-insecure services (upstream TLS — extra config needed) ==="
kubectl get svc -A -o json | jq -r '
.items[] | select(.metadata.annotations["edge-proxy.altinity.com/port-mapping"] |
select(. != null) | test("tls-to-tls")) |
.metadata.namespace + "/" + .metadata.name + ": " +
.metadata.annotations["edge-proxy.altinity.com/port-mapping"]' \
| tee /tmp/tls-to-tls-services.txt
[ -s /tmp/tls-to-tls-services.txt ] && echo "ACTION REQUIRED: see 'Handle Upstream TLS' in Step 4" || echo "(none)"
echo ""
echo "=== Services with IP whitelists ==="
kubectl get svc -A -o json | jq -r '
.items[] | select(.metadata.annotations["edge-proxy.altinity.com/whitelist"] |
select(. != null) | length > 0) |
.metadata.namespace + "/" + .metadata.name + ": " +
.metadata.annotations["edge-proxy.altinity.com/whitelist"]' \
| tee /tmp/whitelist-services.txt
[ -s /tmp/whitelist-services.txt ] && echo "ACTION REQUIRED: see 'Handle IP Whitelisting' in Step 4" || echo "(none)"
echo ""
echo "=== Zone-routed services (cross-AZ traffic may increase — review topology hints) ==="
kubectl get svc -A -o json | jq -r '
.items[] | select(.metadata.annotations["edge-proxy.altinity.com/zone-routed-tls-server-name"] |
select(. != null) | length > 0) |
.metadata.namespace + "/" + .metadata.name +
" (zone: " + (.metadata.annotations["edge-proxy.altinity.com/zone"] // "unset") + ")"' \
| tee /tmp/zone-services.txt
[ -s /tmp/zone-services.txt ] && echo "ACTION REQUIRED: see 'Handle Zone-Aware Routing' in Step 4" || echo "(none)"
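As a quick reference for reading the discovery output, the sketch below maps each edge-proxy mode to the kind of Contour configuration this guide uses for it in Step 4. The function name plan_for_mode and the wording of its output are illustrative only.

```shell
#!/usr/bin/env bash
# plan_for_mode (hypothetical helper): summarize how this guide replaces
# each edge-proxy port-mapping mode. Returns non-zero for unknown modes.
plan_for_mode() {
  case "$1" in
    tls-to-tcp)
      echo "TLS-terminating listener + HTTPRoute or TLSRoute" ;;
    tls-passthrough)
      echo "TLS Passthrough listener + TLSRoute" ;;
    tcp-passthrough)
      echo "HTTPS subdomain + HTTPRoute" ;;
    tls-to-tls|tls-to-tls-insecure)
      echo "upstream TLS: extra config needed" ;;
    *)
      echo "unknown mode: $1" >&2
      return 1 ;;
  esac
}

# Example: summarize every mode found in one annotation value
for mode in $(echo "8443:tls-to-tcp:8123,9440:tls-passthrough:9440" | tr ',' '\n' | cut -d: -f2); do
  printf '%s -> %s\n' "$mode" "$(plan_for_mode "$mode")"
done
```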
Step 2: Install Contour
Contour’s Envoy is provisioned by the Gateway Provisioner, which creates its own AWS load balancer ($CONTOUR_LB). This LB is used only for pre-cutover testing in Step 5. Live traffic continues through edge-proxy’s LB until the explicit cutover in Step 6. The Gateway resource (which tells Contour which ports to expose) is created in Step 4, after the TLS certificate is ready.
# Install Gateway API CRDs — experimental channel is required for TLSRoute support
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.2.1/experimental-install.yaml
# Install Contour Gateway Provisioner
kubectl apply -f https://projectcontour.io/quickstart/contour-gateway-provisioner.yaml
# Wait for provisioner to be ready
kubectl wait --for=condition=ready pod \
  -l control-plane=contour-gateway-provisioner \
  -n projectcontour \
  --timeout=300s
# Create GatewayClass — tells Contour to manage Gateway resources
kubectl apply -f - <<'EOF'
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: contour
spec:
  controllerName: projectcontour.io/gateway-controller
EOF
Step 3: Install cert-manager to Replace crtd
Altinity’s crtd is proprietary and will be removed during the last step. We need to install the open-source cert-manager as a replacement.
Install cert-manager
# Install cert-manager using official manifests
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.1/cert-manager.yaml
# Wait for cert-manager pods to be ready
kubectl wait --for=condition=ready pod \
  -l app.kubernetes.io/instance=cert-manager \
  -n cert-manager \
  --timeout=300s
# Verify installation
kubectl get pods -n cert-manager
# Expected pods:
# cert-manager-<hash>
# cert-manager-cainjector-<hash>
# cert-manager-webhook-<hash>
Understand How crtd Manages Certificates
crtd does not store ACME credentials as Kubernetes Secrets. It authenticates to Altinity’s CA (ca.altinity.cloud) via a short-lived JWT token. Altinity’s CA internally uses ACME (ZeroSSL primary, Let’s Encrypt fallback) with Altinity’s own DNS credentials to solve DNS-01 challenges — so there is nothing for you to migrate.
How the CA issues certificates:
- crtd generates a CSR (containing all your DNS SANs) and POSTs it to ca.altinity.cloud/sign with a JWT Bearer token
- Altinity’s CA validates the JWT (audience = Altinity domain, subject = your environment name)
- The CA then runs an ACME DNS-01 challenge against your domain using Altinity’s Route53/GCP DNS access to prove ownership
- Once the ACME provider (ZeroSSL or Let’s Encrypt) issues the cert, it’s returned to crtd and written into Kubernetes Secrets
Result: Production certs are publicly trusted (chaining to ZeroSSL or Let’s Encrypt). Dev environments (dev.altinity.cloud) use a self-signed local CA instead.
crtd’s certificate storage layout:
| Resource | Kind | Namespace | Content |
|---|---|---|---|
| crtd | Secret | $SYS_NAMESPACE | ca.token: JWT for Altinity’s CA signing endpoint |
| crtd | ConfigMap | $SYS_NAMESPACE | crtd.conf: issuer/cert config |
| edge-proxy-default-tls-secret | Secret | $SYS_NAMESPACE | Live TLS cert+key for edge-proxy |
| clickhouse-default-tls-secret | Secret | ClickHouse namespace | Live TLS cert+key for ClickHouse |
# Verify crtd is running
kubectl get pods -n $SYS_NAMESPACE | grep crtd
# Inspect crtd configuration — shows issuer type and DNS names being managed
kubectl get configmap crtd -n $SYS_NAMESPACE -o jsonpath='{.data.crtd\.conf}'
# Check the issuer of the live cert (production: ZeroSSL/Let's Encrypt; dev: CN=ca self-signed)
kubectl get secret edge-proxy-default-tls-secret -n $SYS_NAMESPACE \
  -o jsonpath='{.data.tls\.crt}' | base64 -d | \
  openssl x509 -noout -issuer -dates
# Extract the DNS SANs to replicate in cert-manager
kubectl get secret edge-proxy-default-tls-secret -n $SYS_NAMESPACE \
  -o jsonpath='{.data.tls\.crt}' | base64 -d | \
  openssl x509 -noout -text | grep -A3 "Subject Alternative Name"
Important: Altinity’s ACME account keys and DNS credentials are proprietary — there is nothing to migrate. You will set up cert-manager with your own ACME account and DNS credentials from scratch. For custom domains, Altinity’s CA was solving the DNS-01 challenge via a delegated CNAME from _acme-challenge.yourcustom.domain to Altinity’s DNS — you will now own that challenge directly.
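Since the SANs from the live certificate have to be replicated in the cert-manager Certificate, it can help to turn the openssl SAN line into a ready-to-paste dnsNames list. The following is a minimal sketch; sans_to_dnsnames is a hypothetical helper, and it assumes the SAN line has the usual comma-separated "DNS:name" form that openssl prints.

```shell
#!/usr/bin/env bash
# sans_to_dnsnames (hypothetical helper): convert a SAN line such as
#   "DNS:*.a.example.com, DNS:*.internal.a.example.com"
# into YAML list items for a cert-manager Certificate's dnsNames field.
sans_to_dnsnames() {
  printf '%s\n' "$1" \
    | tr ',' '\n' \
    | sed -n 's/^ *DNS:\(.*\)$/- "\1"/p'   # keep DNS entries only
}

# Example with the placeholder domain used in this guide
sans_to_dnsnames 'DNS:*.clickhouse.yourdomain.com, DNS:*.internal.clickhouse.yourdomain.com'
```

Entries that are not DNS names (for example IP addresses) are dropped, because the Certificate in this guide only lists dnsNames.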
Configure DNS-01 Provider and Create ClusterIssuer
cert-manager supports many DNS providers for automated DNS-01 challenges. For the full provider list and configuration examples (Route53, Cloudflare, Google Cloud DNS, AzureDNS, and more), see the DNS01 section of the cert-manager documentation.
Namespace rules for cert-manager resources:
| Resource | Namespace | Notes |
|---|---|---|
| ClusterIssuer | cluster-scoped (none) | Applies across all namespaces |
| DNS credential Secrets | cert-manager | Required: a ClusterIssuer looks for referenced Secrets only in the cert-manager namespace |
| Certificate | projectcontour | The TLS Secret is created in the same namespace as the Certificate |
| TLS Secret (output) | projectcontour | The Gateway references it in this same namespace, so no copy is needed |
Create any DNS provider credential Secret in the cert-manager namespace before applying the ClusterIssuer:
# Example — replace with your provider's secret structure (see cert-manager docs)
kubectl create secret generic dns-provider-credentials \
  --from-literal=key=<value> \
  -n cert-manager  # <-- must be cert-manager namespace for ClusterIssuer
Then create the ClusterIssuer, filling in the dns01 solver block for your provider:
cat > /tmp/cert-manager-issuer.yaml <<EOFISSUER
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
  # ClusterIssuer is cluster-scoped, so there is no namespace field
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: your-email@example.com
    privateKeySecretRef:
      name: letsencrypt-prod-private-key  # created automatically in cert-manager namespace
    solvers:
      - dns01:
          # Paste your provider-specific solver block here from the cert-manager docs
          # e.g. route53: {}, cloudflare: {}, azureDNS: {}, etc.
          # Any secretRef here resolves to the cert-manager namespace
        selector:
          dnsZones:
            - "${CUSTOM_DOMAIN}"
EOFISSUER
# Edit /tmp/cert-manager-issuer.yaml to fill in the dns01 solver block before applying
kubectl apply -f /tmp/cert-manager-issuer.yaml
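As one illustration of a filled-in solver block, the sketch below prints a Route53 solver entry. It is an assumption-laden example, not the required configuration: it assumes Route53 is your DNS provider and that cert-manager obtains AWS credentials ambiently (for example through an IAM role attached to its service account), so no secretRef appears. The function name render_route53_solver and the sample region are hypothetical; check the cert-manager docs for the fields your setup needs.

```shell
#!/usr/bin/env bash
# render_route53_solver REGION ZONE_ID DNS_ZONE (hypothetical helper):
# print a solvers entry for the ClusterIssuer, assuming Route53 with
# ambient AWS credentials.
render_route53_solver() {
  local region="$1" zone_id="$2" dns_zone="$3"
  cat <<EOF
- dns01:
    route53:
      region: ${region}
      hostedZoneID: ${zone_id}
  selector:
    dnsZones:
      - "${dns_zone}"
EOF
}

# Usage (values come from the Prerequisites step; region is an assumption):
# render_route53_solver us-east-1 "$HOSTED_ZONE_ID" "$CUSTOM_DOMAIN"
```

The output is indented relative to the solvers key, so it needs to be shifted to match the nesting in /tmp/cert-manager-issuer.yaml before applying.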
Verify ClusterIssuer is Ready
# Check ClusterIssuer status
kubectl get clusterissuer letsencrypt-prod
# Should show READY=True
# NAME READY AGE
# letsencrypt-prod True 30s
# If not ready, check logs
kubectl logs -n cert-manager deployment/cert-manager | grep -i error
Switch Custom Domain CNAMEs to Edge-Proxy’s LB
Currently your custom domain CNAMEs route through Altinity’s DNS (env.altinity.cloud) before reaching edge-proxy’s load balancer. Before issuing the certificate you must:
- Point all custom domain CNAMEs directly at edge-proxy’s LB with TTL=60 (cutting out the Altinity DNS hop). In Step 6 you will switch these CNAMEs one final time to Contour’s LB; the low TTL means that cutover propagates in ~60 seconds.
- Remove any _acme-challenge CNAME delegations Altinity set up for your domain. cert-manager writes TXT records into your DNS zone directly; if a CNAME delegation to Altinity’s nameservers still exists, Let’s Encrypt will follow it and not find cert-manager’s TXT record.
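# Get edge-proxy's underlying LB hostname (resolves through Altinity's DNS).
# dig +short prints the CNAME chain followed by the A records, so keep the last
# hostname (not an IP address) and strip its trailing dot.
export EDGE_PROXY_LB=$(dig "$CUSTOM_DOMAIN" +short | grep -v '^[0-9.]*$' | tail -1 | sed 's/\.$//')
# Or look it up directly from the cluster:
# export EDGE_PROXY_LB=$(kubectl get svc -n $SYS_NAMESPACE -l app=edge-proxy \
#   -o jsonpath='{.items[0].status.loadBalancer.ingress[0].hostname}')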
echo "Edge-proxy LB: $EDGE_PROXY_LB"
echo "export EDGE_PROXY_LB='$EDGE_PROXY_LB'" >> ~/.clickhouse-disconnect-env
# Update CNAMEs to point directly at edge-proxy's LB (example using Route53)
for SUBDOMAIN in "${CUSTOM_DOMAIN}" "internal.${CUSTOM_DOMAIN}" "vpce.${CUSTOM_DOMAIN}"; do
  aws route53 change-resource-record-sets \
    --hosted-zone-id $HOSTED_ZONE_ID \
    --change-batch "{
      \"Changes\": [{
        \"Action\": \"UPSERT\",
        \"ResourceRecordSet\": {
          \"Name\": \"*.${SUBDOMAIN}\",
          \"Type\": \"CNAME\",
          \"TTL\": 60,
          \"ResourceRecords\": [{\"Value\": \"${EDGE_PROXY_LB}\"}]
        }
      }]
    }" && echo "Updated *.${SUBDOMAIN}" || echo "Skipped *.${SUBDOMAIN} (not in use)"
done
# Remove _acme-challenge CNAME delegations to Altinity's DNS
# Check which exist:
for SUBDOMAIN in "${CUSTOM_DOMAIN}" "internal.${CUSTOM_DOMAIN}" "vpce.${CUSTOM_DOMAIN}"; do
RESULT=$(dig "_acme-challenge.${SUBDOMAIN}" CNAME +short)
if [ -n "$RESULT" ]; then
echo "_acme-challenge.${SUBDOMAIN} → $RESULT (DELETE THIS)"
fi
done
# Delete any returned CNAMEs in your DNS provider before proceeding.
# Verify the custom domain now resolves directly to edge-proxy's LB
dig "*.${CUSTOM_DOMAIN}" +short
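The same CNAME change is needed again at cutover, only with a different target, so it may be convenient to build the Route53 change batch in one place. This is a sketch under the assumption that you use Route53 as in the example above; cname_change_batch is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# cname_change_batch NAME TARGET [TTL] (hypothetical helper): print the JSON
# change batch for an UPSERT of a single CNAME record. TTL defaults to 60.
cname_change_batch() {
  local name="$1" target="$2" ttl="${3:-60}"
  printf '{"Changes":[{"Action":"UPSERT","ResourceRecordSet":{"Name":"%s","Type":"CNAME","TTL":%s,"ResourceRecords":[{"Value":"%s"}]}}]}\n' \
    "$name" "$ttl" "$target"
}

# Usage:
# aws route53 change-resource-record-sets \
#   --hosted-zone-id "$HOSTED_ZONE_ID" \
#   --change-batch "$(cname_change_batch "*.${CUSTOM_DOMAIN}" "$EDGE_PROXY_LB")"
```

Generating the JSON with printf avoids the nested quote escaping of an inline string and makes the target the only thing that changes between the pre-flight switch and the final cutover.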
Request Certificate for Contour
The Certificate resource is created in the projectcontour namespace so cert-manager issues and renews the TLS Secret there directly — the Gateway references it in that same namespace. HTTPRoutes and TLSRoutes in $CH_NAMESPACE do not reference the secret; only the Gateway does.
# Create Certificate resource
# dnsNames mirrors what crtd managed: always *.CUSTOM_DOMAIN,
# plus *.internal and *.vpce if those sub-zones are in use.
cat > /tmp/contour-certificate.yaml <<EOFCERT
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: contour-tls
  namespace: projectcontour
spec:
  secretName: contour-tls
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - "*.${CUSTOM_DOMAIN}"
    # Uncomment if internal sub-zone is in use:
    # - "*.internal.${CUSTOM_DOMAIN}"
    # Uncomment if AWS PrivateLink (vpce) sub-zone is in use:
    # - "*.vpce.${CUSTOM_DOMAIN}"
EOFCERT
kubectl apply -f /tmp/contour-certificate.yaml
# Watch certificate issuance (takes 30-120 seconds for DNS-01, faster for HTTP-01)
kubectl get certificate contour-tls -n projectcontour --watch
# Wait for Ready=True
kubectl wait --for=condition=Ready certificate/contour-tls \
  -n projectcontour \
  --timeout=300s
# Check certificate details
kubectl describe certificate contour-tls -n projectcontour
# Verify TLS secret was created
kubectl get secret contour-tls -n projectcontour
# Save secret name
export TLS_SECRET_NAME="contour-tls"
echo "export TLS_SECRET_NAME='$TLS_SECRET_NAME'" >> ~/.clickhouse-disconnect-env
Verify Certificate
# Check certificate SANs (Subject Alternative Names)
kubectl get secret contour-tls -n projectcontour \
  -o jsonpath='{.data.tls\.crt}' | base64 -d | \
  openssl x509 -noout -text | grep -A3 "Subject Alternative Name"
# Should show:
# DNS:*.clickhouse.yourdomain.com
# Check issuer (should be Let's Encrypt)
kubectl get secret contour-tls -n projectcontour \
  -o jsonpath='{.data.tls\.crt}' | base64 -d | \
  openssl x509 -noout -issuer
# Check expiration (90 days for Let's Encrypt)
kubectl get secret contour-tls -n projectcontour \
  -o jsonpath='{.data.tls\.crt}' | base64 -d | \
  openssl x509 -noout -enddate
Important: cert-manager and crtd will now run in parallel. The old crtd certificates will continue to work with edge-proxy, while new cert-manager certificates work with Contour. In Phase 3, we’ll remove crtd.
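When checking the SANs, keep in mind that a wildcard name matches exactly one left-most label: *.clickhouse.yourdomain.com covers chi-0-0.clickhouse.yourdomain.com but neither the bare clickhouse.yourdomain.com nor names two labels deep. The sketch below encodes that rule so you can test the hostnames you plan to route; wildcard_covers is a hypothetical helper. If a hostname you intend to serve is not covered, it would need its own dnsNames entry.

```shell
#!/usr/bin/env bash
# wildcard_covers PATTERN HOST (hypothetical helper): succeed if a certificate
# name PATTERN (exact name or "*.suffix") covers HOST. A wildcard stands for
# exactly one left-most label.
wildcard_covers() {
  local pattern="$1" host="$2" suffix label
  case "$pattern" in
    \*.*)
      suffix="${pattern#\*}"        # e.g. ".clickhouse.yourdomain.com"
      label="${host%"$suffix"}"     # what the wildcard would have to match
      [ "$label" != "$host" ] &&    # host must end with the suffix
        [ -n "$label" ] &&          # wildcard must match something
        [ "${label#*.}" = "$label" ]  # and that something has no dot
      ;;
    *)
      [ "$pattern" = "$host" ]
      ;;
  esac
}

# Example
if wildcard_covers '*.clickhouse.yourdomain.com' 'chi-0-0.clickhouse.yourdomain.com'; then
  echo "covered"
fi
```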
Certificate Renewal
cert-manager automatically renews certificates ~30 days before expiration:
# Check certificate renewal status
kubectl describe certificate contour-tls -n projectcontour | grep -A10 "Status"
# Monitor cert-manager logs
kubectl logs -n cert-manager deployment/cert-manager --tail=50 -f
Step 4: Configure Gateway and Create Proxy Resources
Create Gateway
Create a Gateway with three listeners. The Gateway Provisioner provisions an Envoy deployment and AWS load balancer that expose exactly the ports defined here — no manual service patching required.
- Port 8080 (http): plain HTTP, redirected to HTTPS by Envoy
- Port 8443 (https, HTTPS protocol): TLS terminated by Envoy; HTTPRoute resources attach here for HTTP interface routing
- Port 9440 (clickhouse-native, TLS protocol + Terminate mode): TLS terminated by Envoy; TLSRoute resources attach here for native protocol routing
Both the HTTPS and TLS listeners share the same wildcard TLS certificate from Step 3. The same FQDN (e.g., test-byoc-0-0.clickhouse.yourdomain.com) can appear on both listeners because they are on different ports — this is what allows a single hostname to serve HTTP on port 8443 and native protocol on port 9440, exactly as edge-proxy does.
# Create the Gateway — references the TLS cert issued directly into projectcontour namespace in Step 3
cat > /tmp/contour-gateway.yaml <<EOFGW
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: contour
  namespace: projectcontour
spec:
  gatewayClassName: contour
  listeners:
    - name: http
      protocol: HTTP
      port: 8080
      allowedRoutes:
        namespaces:
          from: All
    - name: https
      protocol: HTTPS
      port: 8443
      tls:
        mode: Terminate
        certificateRefs:
          - name: ${TLS_SECRET_NAME}
      allowedRoutes:
        namespaces:
          from: All
    - name: clickhouse-native
      protocol: TLS
      port: 9440
      tls:
        mode: Terminate
        certificateRefs:
          - name: ${TLS_SECRET_NAME}
      allowedRoutes:
        namespaces:
          from: All
EOFGW
kubectl apply -f /tmp/contour-gateway.yaml
# Wait for Gateway to be accepted and programmed by the provisioner
kubectl wait --for=condition=Accepted gateway/contour -n projectcontour --timeout=300s
kubectl wait --for=condition=Programmed gateway/contour -n projectcontour --timeout=300s
# Wait for the provisioned Envoy LoadBalancer (takes 2-3 minutes)
echo "Waiting for LoadBalancer..."
sleep 120
export CONTOUR_LB=$(kubectl get gateway contour -n projectcontour \
  -o jsonpath='{.status.addresses[0].value}')
echo "Contour LoadBalancer: $CONTOUR_LB"
echo "export CONTOUR_LB='$CONTOUR_LB'" >> ~/.clickhouse-disconnect-env
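A fixed sleep can be too short or needlessly long, depending on how quickly the load balancer is provisioned. As a sketch of an alternative, the hypothetical helper below polls a command until it prints something or a timeout expires; it makes no assumption about Contour beyond the kubectl query already used above.

```shell
#!/usr/bin/env bash
# wait_until TIMEOUT INTERVAL CMD... (hypothetical helper): run CMD every
# INTERVAL seconds until it prints non-empty output, then print that output
# and return 0. Return 1 once TIMEOUT seconds have been spent waiting.
wait_until() {
  local timeout="$1" interval="$2" waited=0 out
  shift 2
  while [ "$waited" -le "$timeout" ]; do
    out=$("$@" 2>/dev/null)
    if [ -n "$out" ]; then
      printf '%s\n' "$out"
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 1
}

# Usage: poll the Gateway status for up to 5 minutes
# CONTOUR_LB=$(wait_until 300 10 kubectl get gateway contour -n projectcontour \
#   -o jsonpath='{.status.addresses[0].value}') || echo "LB address not assigned yet"
```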
Create HTTPRoute for ClickHouse HTTP Interface
For the HTTP interface (backend port 8123), edge-proxy uses tls-to-tcp — it terminates TLS and forwards plain HTTP. We replicate this with HTTPRoute resources attached to the HTTPS listener (port 8443). Envoy terminates TLS and routes by hostname to the correct ClickHouse service.
# Create HTTPRoute for each ClickHouse service
cat > /tmp/create-ch-httproutes.sh <<'EOFSCRIPT'
#!/bin/bash
# ClickHouse operator creates two kinds of services with edge-proxy annotations:
# chi-<name>-<cluster>-<shard>-<replica> — per-replica, has shard/replica labels
# clickhouse-<name> — cluster-level load balancer, no shard/replica labels
# Per-replica services get individual HTTPRoute resources.
# The cluster-level service backs the cluster-wide HTTPRoute at $CUSTOM_DOMAIN.
CLUSTER_SVC=""
CLUSTER_HTTP_PORT=""
while IFS= read -r service; do
SVC_JSON=$(kubectl get svc "$service" -n "$CH_NAMESPACE" -o json 2>/dev/null)
if [ -z "$SVC_JSON" ]; then
echo "Warning: service $service not found in $CH_NAMESPACE, skipping"
continue
fi
PORT_MAPPING=$(echo "$SVC_JSON" | jq -r '.metadata.annotations["edge-proxy.altinity.com/port-mapping"] // empty')
HTTP_PORT=$(echo "$PORT_MAPPING" | grep -oP '\d+:tls-to-tcp:\K\d+' | head -1)
CHI=$(echo "$SVC_JSON" | jq -r '.metadata.labels["clickhouse.altinity.com/chi"] // empty')
SHARD=$(echo "$SVC_JSON" | jq -r '.metadata.labels["clickhouse.altinity.com/shard"] // empty')
REPLICA=$(echo "$SVC_JSON" | jq -r '.metadata.labels["clickhouse.altinity.com/replica"] // empty')
# Services without shard/replica labels are cluster-level — defer to cluster-wide HTTPRoute
if [ -z "$SHARD" ] || [ -z "$REPLICA" ]; then
echo "Deferring $service (cluster-level service, used for cluster-wide HTTPRoute)"
CLUSTER_SVC="$service"
CLUSTER_HTTP_PORT="$HTTP_PORT"
continue
fi
if [ -z "$HTTP_PORT" ]; then
echo "Skipping $service (no tls-to-tcp port mapping)"
continue
fi
SUBDOMAIN="${CHI}-${SHARD}-${REPLICA}.${CUSTOM_DOMAIN}"
echo "Creating HTTPRoute: $service → $SUBDOMAIN:$HTTP_PORT"
cat <<EOFROUTE | kubectl apply -f -
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: ${service}-http
  namespace: ${CH_NAMESPACE}
spec:
  parentRefs:
    - name: contour
      namespace: projectcontour
      sectionName: https
  hostnames:
    - "${SUBDOMAIN}"
  rules:
    - timeouts:
        request: 3600s
        backendRequest: 3600s
      backendRefs:
        - name: ${service}
          port: ${HTTP_PORT}
EOFROUTE
echo " ✓ Created"
done < /tmp/clickhouse-services.txt
# Cluster-wide HTTPRoute at $CUSTOM_DOMAIN, backed by the cluster-level service
if [ -n "$CLUSTER_SVC" ] && [ -n "$CLUSTER_HTTP_PORT" ]; then
echo "Creating cluster-wide HTTPRoute: $CUSTOM_DOMAIN → $CLUSTER_SVC:$CLUSTER_HTTP_PORT"
cat <<EOFCLUSTER | kubectl apply -f -
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: clickhouse-cluster
  namespace: ${CH_NAMESPACE}
spec:
  parentRefs:
    - name: contour
      namespace: projectcontour
      sectionName: https
  hostnames:
    - "${CUSTOM_DOMAIN}"
  rules:
    - timeouts:
        request: 3600s
        backendRequest: 3600s
      backendRefs:
        - name: ${CLUSTER_SVC}
          port: ${CLUSTER_HTTP_PORT}
EOFCLUSTER
echo " ✓ Created cluster-wide HTTPRoute"
else
echo "No cluster-level service found; skipping cluster-wide HTTPRoute"
fi
echo "✓ All HTTPRoute resources created"
EOFSCRIPT
chmod +x /tmp/create-ch-httproutes.sh
bash /tmp/create-ch-httproutes.sh
# Verify
kubectl get httproute -n $CH_NAMESPACE
Create TLSRoute for ClickHouse Native Protocol
For the native protocol (backend port 9000), edge-proxy uses tls-to-tcp — it terminates TLS and forwards plain TCP. We replicate this with TLSRoute resources attached to the clickhouse-native TLS listener (port 9440). Envoy terminates TLS, reads the SNI hostname to select the correct replica, and forwards plain TCP to ClickHouse port 9000.
Each TLSRoute uses the same FQDN as the corresponding HTTPRoute (e.g., test-byoc-0-0.clickhouse.yourdomain.com) — this works because they attach to different listeners (port 8443 vs port 9440). Clients connecting to hostname:8443 get HTTP; clients connecting to hostname:9440 get native protocol. No FQDN change, no client reconfiguration needed.
cat > /tmp/create-ch-tlsroutes.sh <<'EOFSCRIPT'
#!/bin/bash
CLUSTER_SVC=""
CLUSTER_NATIVE_PORT=""
while IFS= read -r service; do
SVC_JSON=$(kubectl get svc "$service" -n "$CH_NAMESPACE" -o json 2>/dev/null)
if [ -z "$SVC_JSON" ]; then
echo "Warning: service $service not found in $CH_NAMESPACE, skipping"
continue
fi
PORT_MAPPING=$(echo "$SVC_JSON" | jq -r '.metadata.annotations["edge-proxy.altinity.com/port-mapping"] // empty')
# HTTP port: first tls-to-tcp entry (e.g. 8443:tls-to-tcp:8123 → 8123)
HTTP_PORT=$(echo "$PORT_MAPPING" | grep -oP '\d+:tls-to-tcp:\K\d+' | head -1)
# Native port: last tls-to-tcp entry (e.g. 9440:tls-to-tcp:9000 → 9000)
NATIVE_PORT=$(echo "$PORT_MAPPING" | grep -oP '\d+:tls-to-tcp:\K\d+' | tail -1)
CHI=$(echo "$SVC_JSON" | jq -r '.metadata.labels["clickhouse.altinity.com/chi"] // empty')
SHARD=$(echo "$SVC_JSON" | jq -r '.metadata.labels["clickhouse.altinity.com/shard"] // empty')
REPLICA=$(echo "$SVC_JSON" | jq -r '.metadata.labels["clickhouse.altinity.com/replica"] // empty')
if [ -z "$SHARD" ] || [ -z "$REPLICA" ]; then
echo "Deferring $service (cluster-level service, used for cluster-wide TLSRoute)"
CLUSTER_SVC="$service"
CLUSTER_NATIVE_PORT="$NATIVE_PORT"
continue
fi
# Only create TLSRoute if there are two distinct tls-to-tcp entries (HTTP and native)
if [ -z "$NATIVE_PORT" ] || [ "$HTTP_PORT" = "$NATIVE_PORT" ]; then
echo "Skipping $service (no separate native protocol port mapping)"
continue
fi
FQDN="${CHI}-${SHARD}-${REPLICA}.${CUSTOM_DOMAIN}"
echo "Creating TLSRoute: $service → $FQDN (native port $NATIVE_PORT via port 9440 listener)"
cat <<EOFTLS | kubectl apply -f -
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: TLSRoute
metadata:
  name: ${service}-native
  namespace: ${CH_NAMESPACE}
spec:
  parentRefs:
    - name: contour
      namespace: projectcontour
      sectionName: clickhouse-native
  hostnames:
    - "${FQDN}"
  rules:
    - backendRefs:
        - name: ${service}
          port: ${NATIVE_PORT}
EOFTLS
echo " ✓ Created"
done < /tmp/clickhouse-services.txt
# Cluster-wide TLSRoute at $CUSTOM_DOMAIN, backed by the cluster-level service
if [ -n "$CLUSTER_SVC" ] && [ -n "$CLUSTER_NATIVE_PORT" ]; then
CLUSTER_HTTP_PORT=$(kubectl get svc "$CLUSTER_SVC" -n "$CH_NAMESPACE" -o json | \
  jq -r '.metadata.annotations["edge-proxy.altinity.com/port-mapping"] // empty' | \
  grep -oP '\d+:tls-to-tcp:\K\d+' | head -1)
if [ "$CLUSTER_HTTP_PORT" != "$CLUSTER_NATIVE_PORT" ]; then
echo "Creating cluster-wide TLSRoute: $CUSTOM_DOMAIN → $CLUSTER_SVC:$CLUSTER_NATIVE_PORT"
cat <<EOFCLUSTER | kubectl apply -f -
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: TLSRoute
metadata:
  name: clickhouse-cluster-native
  namespace: ${CH_NAMESPACE}
spec:
  parentRefs:
    - name: contour
      namespace: projectcontour
      sectionName: clickhouse-native
  hostnames:
    - "${CUSTOM_DOMAIN}"
  rules:
    - backendRefs:
        - name: ${CLUSTER_SVC}
          port: ${CLUSTER_NATIVE_PORT}
EOFCLUSTER
echo " ✓ Created cluster-wide TLSRoute"
else
echo "No separate native protocol port for cluster-level service; skipping cluster-wide TLSRoute"
fi
fi
echo "✓ All TLSRoute resources created"
EOFSCRIPT
chmod +x /tmp/create-ch-tlsroutes.sh
bash /tmp/create-ch-tlsroutes.sh
# Verify
kubectl get httproute,tlsroute -n $CH_NAMESPACE
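Both scripts pick backend ports by position: the first tls-to-tcp entry is treated as HTTP and the last as native. That works for the annotation order shown in this guide, but it would swap the ports if a Service listed 9440 before 8443. A sketch of a lookup keyed on the external port instead, assuming the 8443 and 9440 external ports used throughout this guide (backend_port_for is a hypothetical helper):

```shell
#!/usr/bin/env bash
# backend_port_for MAPPING EXTERNAL_PORT (hypothetical helper): print the
# backend port of the tls-to-tcp tuple whose external port matches.
# Returns 1 if the mapping has no such tuple.
backend_port_for() {
  local mapping="$1" want_ext="$2" tuple ext mode backend
  local IFS=','
  for tuple in $mapping; do
    IFS=':' read -r ext mode backend <<< "$tuple"
    if [ "$ext" = "$want_ext" ] && [ "$mode" = "tls-to-tcp" ]; then
      echo "$backend"
      return 0
    fi
  done
  return 1
}

# Usage inside the scripts above:
# HTTP_PORT=$(backend_port_for "$PORT_MAPPING" 8443)
# NATIVE_PORT=$(backend_port_for "$PORT_MAPPING" 9440)
```

With this lookup the HTTP_PORT = NATIVE_PORT comparison is no longer needed: a Service without a 9440 tuple simply yields an empty NATIVE_PORT.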
Create HTTPRoute for Monitoring Services (Prometheus, Grafana)
This step is optional; it exposes monitoring services through Contour.
Edge-proxy routes monitoring traffic as tcp-passthrough on specific ports (e.g. 9090:tcp-passthrough:9090). Because monitoring services use plain TCP with no TLS, there is no SNI for Contour to route on. Instead, expose monitoring services as named HTTPS subdomains: Contour terminates TLS and forwards plain HTTP to the backend.
This changes the access URL from CUSTOM_DOMAIN:9090 to https://prometheus.CUSTOM_DOMAIN (or similar). Update any dashboards or alert configurations accordingly.
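# Step 1 does not write /tmp/monitoring-services.txt, so build it here.
# Assumption: monitoring Services live in $SYS_NAMESPACE and use tcp-passthrough.
kubectl get svc -n "$SYS_NAMESPACE" -o json | jq -r '
.items[] | select((.metadata.annotations["edge-proxy.altinity.com/port-mapping"] // "") | test("tcp-passthrough")) | .metadata.name' \
  > /tmp/monitoring-services.txt
if [ -s /tmp/monitoring-services.txt ]; then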
cat > /tmp/create-monitoring-routes.sh <<'EOFSCRIPT'
#!/bin/bash
while IFS= read -r service; do
SVC_JSON=$(kubectl get svc "$service" -n "$SYS_NAMESPACE" -o json 2>/dev/null)
if [ -z "$SVC_JSON" ]; then
echo "Warning: service $service not found in $SYS_NAMESPACE, skipping"
continue
fi
PORT_MAPPING=$(echo "$SVC_JSON" | jq -r '.metadata.annotations["edge-proxy.altinity.com/port-mapping"] // empty')
INTERNAL_PORT=$(echo "$PORT_MAPPING" | grep -oP ':\K\d+$')
APP=$(echo "$SVC_JSON" | jq -r '.metadata.labels.app // empty')
if [ -z "$INTERNAL_PORT" ] || [ -z "$APP" ]; then
echo "Skipping $service (could not determine port or app label)"
continue
fi
FQDN="${APP}.${CUSTOM_DOMAIN}"
echo "Creating HTTPRoute: $service → https://$FQDN (port $INTERNAL_PORT)"
cat <<EOFROUTE | kubectl apply -f -
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: ${service}-monitoring
  namespace: ${SYS_NAMESPACE}
spec:
  parentRefs:
    - name: contour
      namespace: projectcontour
      sectionName: https
  hostnames:
    - "${FQDN}"
  rules:
    - backendRefs:
        - name: ${service}
          port: ${INTERNAL_PORT}
EOFROUTE
echo " ✓ Created: access at https://$FQDN"
done < /tmp/monitoring-services.txt
echo "✓ Monitoring HTTPRoute resources created"
EOFSCRIPT
chmod +x /tmp/create-monitoring-routes.sh
bash /tmp/create-monitoring-routes.sh
else
echo "No monitoring services found, skipping..."
fi
Handle tls-passthrough Services
If Step 1 found services using tls-passthrough, those services handle TLS termination themselves (e.g., ClickHouse with TLS enabled on the native port). Contour needs a separate TLS/Passthrough listener for each external port used with tls-passthrough — this is a different listener mode from the TLS/Terminate listener used for tls-to-tcp services.
If tls-passthrough services use the same external port as tls-to-tcp services (both on port 9440), the Gateway cannot serve both modes on a single listener — move the passthrough service to a different external port or contact Altinity for guidance.
if [ -s /tmp/tls-passthrough-services.txt ]; then
# Find the unique external ports used by tls-passthrough services
PASSTHROUGH_PORTS=$(cat /tmp/tls-passthrough-services.txt | \
  grep -oP '\d+:tls-passthrough' | cut -d: -f1 | sort -u)
echo "Adding TLS/Passthrough listeners for ports: $PASSTHROUGH_PORTS"
for PORT in $PASSTHROUGH_PORTS; do
# Check for conflict with existing listeners (port 8443 and 9440 are already defined)
if kubectl get gateway contour -n projectcontour -o json | \
  jq -e ".spec.listeners[] | select(.port == ${PORT})" > /dev/null 2>&1; then
echo "ERROR: Port $PORT already has a listener — tls-passthrough and tls-to-tcp cannot share a port."
echo " Move one to a different external port before proceeding."
continue
fi
echo "Adding Passthrough listener on port $PORT"
kubectl patch gateway contour -n projectcontour --type='json' -p="[{
  \"op\": \"add\",
  \"path\": \"/spec/listeners/-\",
  \"value\": {
    \"name\": \"tls-passthrough-${PORT}\",
    \"protocol\": \"TLS\",
    \"port\": ${PORT},
    \"tls\": {\"mode\": \"Passthrough\"},
    \"allowedRoutes\": {\"namespaces\": {\"from\": \"All\"}}
  }
}]"
done
# Create TLSRoute for each tls-passthrough service
cat > /tmp/create-ch-passthrough-tlsroutes.sh <<'EOFSCRIPT'
#!/bin/bash
while IFS= read -r svc_line; do
# Parse: "namespace/name: mapping"
NS=$(echo "$svc_line" | cut -d/ -f1)
REST=$(echo "$svc_line" | cut -d/ -f2-)
SVC=$(echo "$REST" | cut -d: -f1)
MAPPING=$(echo "$REST" | cut -d' ' -f2-)
SVC_JSON=$(kubectl get svc "$SVC" -n "$NS" -o json 2>/dev/null)
if [ -z "$SVC_JSON" ]; then
echo "Warning: service $SVC not found in $NS, skipping"
continue
fi
CHI=$(echo "$SVC_JSON" | jq -r '.metadata.labels["clickhouse.altinity.com/chi"] // empty')
SHARD=$(echo "$SVC_JSON" | jq -r '.metadata.labels["clickhouse.altinity.com/shard"] // empty')
REPLICA=$(echo "$SVC_JSON" | jq -r '.metadata.labels["clickhouse.altinity.com/replica"] // empty')
if [ -z "$SHARD" ] || [ -z "$REPLICA" ]; then
echo "Skipping $SVC (cluster-level service; passthrough routing is per-replica only)"
continue
fi
FQDN="${CHI}-${SHARD}-${REPLICA}.${CUSTOM_DOMAIN}"
# Each passthrough mapping: "extport:tls-passthrough:backendport"
echo "$MAPPING" | tr ',' '\n' | grep 'tls-passthrough' | while IFS=: read -r EXT_PORT MODE BACKEND_PORT; do
echo "Creating TLSRoute (passthrough): $SVC → $FQDN ext=$EXT_PORT backend=$BACKEND_PORT"
cat <<EOFTLS | kubectl apply -f -
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: TLSRoute
metadata:
name: ${SVC}-passthrough-${EXT_PORT}
namespace: ${NS}
spec:
parentRefs:
- name: contour
namespace: projectcontour
sectionName: tls-passthrough-${EXT_PORT}
hostnames:
- "${FQDN}"
rules:
- backendRefs:
- name: ${SVC}
port: ${BACKEND_PORT}
EOFTLS
echo " ✓ Created"
done
done < /tmp/tls-passthrough-services.txt
echo "✓ Passthrough TLSRoute resources created"
EOFSCRIPT
chmod +x /tmp/create-ch-passthrough-tlsroutes.sh
bash /tmp/create-ch-passthrough-tlsroutes.sh
kubectl get tlsroute -A | grep passthrough
else
echo "No tls-passthrough services found, skipping."
fi
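The port-conflict rule described above can also be checked against the annotations themselves, before the Gateway is patched. The following is a minimal sketch rather than part of the original procedure: `find_mode_conflicts` is a hypothetical helper, and it assumes input lines in the same `namespace/name: mapping` format (with a space after the first colon) that the scripts above parse. The service names in the example are made up.

```shell
#!/bin/bash
# Hypothetical helper (not part of edge-proxy or Contour).
# Reads lines of the form
#   "namespace/name: extPort:mode:backendPort[,extPort:mode:backendPort...]"
# on stdin and prints every external port that one mapping uses with
# tls-passthrough while another mapping uses it with a different mode.
find_mode_conflicts() {
  awk '
    {
      # Drop the "namespace/name: " prefix, leaving only the mapping list.
      sub(/^[^ ]+: */, "")
      n = split($0, tuples, ",")
      for (i = 1; i <= n; i++) {
        split(tuples[i], f, ":")
        port = f[1]; mode = f[2]
        if (mode == "tls-passthrough") pass[port] = 1
        else other[port] = 1
      }
    }
    END {
      for (p in pass) if (p in other) print p
    }
  ' | sort -n
}

# Example with inline sample data: port 9440 is used in both modes.
printf '%s\n' \
  'ch/chi-a-0-0: 8443:tls-to-tcp:8123,9440:tls-to-tcp:9000' \
  'ch/chi-b-0-0: 9440:tls-passthrough:9440,9441:tls-passthrough:9440' \
  | find_mode_conflicts
# prints: 9440
```

Feeding the discovery output from Step 1 through a check like this would surface shared ports up front, instead of relying on the per-port error emitted during patching.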
Handle IP Whitelisting
The edge-proxy.altinity.com/whitelist annotation restricts which source IPs can connect. In Contour Gateway API mode, per-route IP filtering is not available as a standard feature. The recommended approach is to enforce IP restrictions at the AWS NLB security group — this blocks disallowed traffic before it reaches Envoy and is the most reliable option.
if [ -s /tmp/whitelist-services.txt ]; then
echo "=== Whitelist CIDRs found ==="
# Collect all unique CIDRs across all whitelisted services
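# Match only IPv4 CIDRs (a.b.c.d/len) so digits in service names or ports are not picked up
ALL_CIDRS=$(grep -oP '(\d{1,3}\.){3}\d{1,3}/\d{1,2}' /tmp/whitelist-services.txt | \
sort -u | tr '\n' ',')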
echo "CIDRs to allow: $ALL_CIDRS"
# Get the security group attached to Contour's Envoy NLB
CONTOUR_SG=$(aws ec2 describe-security-groups \
--filters "Name=tag:kubernetes.io/service-name,Values=projectcontour/envoy" \
--query 'SecurityGroups[0].GroupId' --output text 2>/dev/null)
if [ -n "$CONTOUR_SG" ] && [ "$CONTOUR_SG" != "None" ]; then
echo "Contour NLB security group: $CONTOUR_SG"
echo ""
echo "Add inbound rules to $CONTOUR_SG for the following CIDRs on ports 8443 and 9440:"
echo "$ALL_CIDRS" | tr ',' '\n' | grep -v '^$' | while read -r CIDR; do
echo " aws ec2 authorize-security-group-ingress --group-id $CONTOUR_SG \\"
echo " --protocol tcp --port 8443 --cidr $CIDR"
echo " aws ec2 authorize-security-group-ingress --group-id $CONTOUR_SG \\"
echo " --protocol tcp --port 9440 --cidr $CIDR"
done
echo ""
echo "Review per-service whitelists in /tmp/whitelist-services.txt."
echo "If different services have different CIDRs, use separate NLB listeners or"
echo "implement a Kubernetes NetworkPolicy on the ClickHouse pods for finer-grained control."
else
echo "Could not auto-detect Contour NLB security group. Find it manually:"
echo " AWS Console → EC2 → Load Balancers → find the Contour NLB → Security tab"
echo "Then add inbound rules for the CIDRs listed above."
fi
fi
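Because the whitelist values end up as `--cidr` arguments, it may be worth validating each entry before generating security group rules. The sketch below is an illustration only: `is_ipv4_cidr` is a hypothetical helper, it handles IPv4 only, and it treats a bare IP without a prefix length as invalid. The sample whitelist string is invented.

```shell
#!/bin/bash
# Hypothetical helper: returns 0 if $1 is a syntactically valid IPv4 CIDR
# (four octets in 0-255 and a prefix length in 0-32), 1 otherwise.
is_ipv4_cidr() {
  local cidr=$1 ip prefix octet
  local re='^([0-9]{1,3}\.){3}[0-9]{1,3}/([0-9]{1,2})$'
  [[ $cidr =~ $re ]] || return 1
  prefix=${BASH_REMATCH[2]}
  # 10# forces base 10 so values such as "08" are not parsed as octal.
  (( 10#$prefix <= 32 )) || return 1
  ip=${cidr%/*}
  local IFS=.
  for octet in $ip; do
    (( 10#$octet <= 255 )) || return 1
  done
  return 0
}

# Example: split a comma-separated whitelist value and classify each entry.
whitelist="10.0.0.0/8,192.168.1.0/24,300.1.1.1/32,10.0.0.1"
IFS=',' read -ra cidrs <<< "$whitelist"
for c in "${cidrs[@]}"; do
  if is_ipv4_cidr "$c"; then echo "ok   $c"; else echo "skip $c"; fi
done
```

In this example the first two entries are accepted, while `300.1.1.1/32` (octet out of range) and `10.0.0.1` (no prefix length) are skipped.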
Handle Upstream TLS (tls-to-tls / tls-to-tls-insecure)
If Step 1 found services using tls-to-tls or tls-to-tls-insecure, edge-proxy was terminating the client TLS and then opening a new TLS connection to the upstream (ClickHouse with TLS on the pod). In Contour Gateway API, upstream TLS is configured via the experimental BackendTLSPolicy resource.
if [ -s /tmp/tls-to-tls-services.txt ]; then
echo "Services requiring upstream TLS:"
cat /tmp/tls-to-tls-services.txt
echo ""
echo "For each service, create a BackendTLSPolicy. Example for service chi-NAME-0-0:"
cat <<'EOFEXAMPLE'
apiVersion: gateway.networking.k8s.io/v1alpha3
kind: BackendTLSPolicy
metadata:
name: chi-NAME-0-0-upstream-tls
namespace: $CH_NAMESPACE
spec:
targetRefs:
- group: ""
kind: Service
name: chi-NAME-0-0 # must match the service name in the HTTPRoute backendRef
validation:
# For tls-to-tls: specify the CA cert that signed the upstream cert
# Create a ConfigMap with the CA cert first, then reference it here:
caCertificateRefs:
- group: ""
kind: ConfigMap
name: clickhouse-upstream-ca
hostname: chi-NAME-0-0.internal.clickhouse.svc # upstream TLS server name
# For tls-to-tls-insecure: BackendTLSPolicy always verifies the upstream cert.
# If the upstream cert chains to a public CA, use wellKnownCACertificates instead of
# caCertificateRefs and set hostname to match the upstream cert's CN/SAN.
# Skipping verification entirely is not yet supported in Gateway API; use a Contour
# HTTPProxy with spec.routes[].services[].protocol: tls instead.
EOFEXAMPLE
echo ""
echo "See: https://gateway-api.sigs.k8s.io/api-types/backendtlspolicy/"
echo "Note: BackendTLSPolicy requires Gateway API experimental channel (already installed in Step 2)."
fi
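Since one BackendTLSPolicy is needed per affected service, the example manifest above can be turned into a small template. This is a sketch under assumptions: `render_backend_tls_policy` is a hypothetical helper, the field layout simply mirrors the v1alpha3 example printed by the script, and the namespace and hostname in the usage line are placeholders. It only prints YAML, so the output can be reviewed before anything is applied.

```shell
#!/bin/bash
# Hypothetical helper: prints a BackendTLSPolicy manifest for one service.
# Arguments: service name, namespace, upstream TLS hostname,
#            CA ConfigMap name (optional, defaults to clickhouse-upstream-ca).
render_backend_tls_policy() {
  local svc=$1 ns=$2 hostname=$3 ca_configmap=${4:-clickhouse-upstream-ca}
  cat <<EOF
apiVersion: gateway.networking.k8s.io/v1alpha3
kind: BackendTLSPolicy
metadata:
  name: ${svc}-upstream-tls
  namespace: ${ns}
spec:
  targetRefs:
  - group: ""
    kind: Service
    name: ${svc}
  validation:
    caCertificateRefs:
    - group: ""
      kind: ConfigMap
      name: ${ca_configmap}
    hostname: ${hostname}
EOF
}

# Print the manifest for review (placeholder values shown).
render_backend_tls_policy chi-NAME-0-0 clickhouse chi-NAME-0-0.internal.clickhouse.svc
```

Once the output looks right, it could be piped to `kubectl apply -f -` for each service listed in /tmp/tls-to-tls-services.txt.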
Handle Zone-Aware Routing
Edge-proxy supports AZ-local routing: each edge-proxy instance routes requests to services tagged with a matching edge-proxy.altinity.com/zone annotation. This prevents cross-AZ traffic between the proxy and ClickHouse pods.
With Contour, this per-AZ routing is not replicated. A single Contour LB accepts all traffic and routes to pods in any AZ based on the FQDN. Cross-AZ traffic between Envoy and ClickHouse pods may increase, which can raise AWS data transfer costs.
if [ -s /tmp/zone-services.txt ]; then
echo "Zone-annotated services:"
cat /tmp/zone-services.txt
echo ""
echo "Mitigations:"
echo ""
echo "1. Enable topology-aware routing on ClickHouse services (Kubernetes 1.27+)."
echo " This tells kube-proxy to prefer endpoints in the same AZ as the Envoy pod:"
while IFS= read -r line; do
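# Take the first field (namespace/name); strip a trailing "(zone:..." suffix if it is attached.
# (tr -d would delete the individual characters z, o, n, e from the service name itself.)
SVC=$(echo "$line" | awk '{print $1}' | sed 's/(zone:.*//')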
NS=$(echo "$SVC" | cut -d/ -f1)
NAME=$(echo "$SVC" | cut -d/ -f2)
[ -n "$NAME" ] && echo " kubectl annotate svc $NAME -n $NS service.kubernetes.io/topology-mode=Auto"
done < /tmp/zone-services.txt
echo ""
echo "2. Use per-replica FQDNs (e.g. test-byoc-0-0.custom.domain) instead of the"
echo " cluster-level FQDN to ensure clients connect to a specific replica."
echo " Per-replica routing is already configured in the HTTPRoute/TLSRoute scripts above."
fi
Step 5: Test Contour (Before DNS Change)
HTTP interface tests use --connect-to to direct traffic to the Contour LB while using the real service FQDN in the URL. This sets both the TLS SNI (required for the Gateway to match the cert) and the HTTP Host header (required for HTTPRoute matching). Because curl validates the certificate against the URL hostname rather than the LB hostname, a trusted *.${CUSTOM_DOMAIN} certificate would pass verification; --insecure is kept so the tests also work when the certificate is not yet trusted by the client. (--resolve looks similar but requires an IP; --connect-to accepts ELB DNS names directly.)
Native protocol (port 9440) cannot be tested pre-DNS: clickhouse-client uses --host as both the TCP destination and the TLS SNI with no way to override them independently. Native protocol is verified in Step 7 immediately after the 60-second DNS cutover.
# Pick the first per-replica ClickHouse service (cluster-level services have no shard/replica labels)
SUBDOMAIN=""
while IFS= read -r service; do
SVC_JSON=$(kubectl get svc "$service" -n "$CH_NAMESPACE" -o json 2>/dev/null)
CHI=$(echo "$SVC_JSON" | jq -r '.metadata.labels["clickhouse.altinity.com/chi"] // empty')
SHARD=$(echo "$SVC_JSON" | jq -r '.metadata.labels["clickhouse.altinity.com/shard"] // empty')
REPLICA=$(echo "$SVC_JSON" | jq -r '.metadata.labels["clickhouse.altinity.com/replica"] // empty')
if [ -n "$SHARD" ] && [ -n "$REPLICA" ]; then
SUBDOMAIN="${CHI}-${SHARD}-${REPLICA}.${CUSTOM_DOMAIN}"
break
fi
done < /tmp/clickhouse-services.txt
if [ -z "$SUBDOMAIN" ]; then echo "ERROR: no per-replica service found in /tmp/clickhouse-services.txt"; exit 1; fi
# --connect-to HOST:PORT:CONNECT-HOST:CONNECT-PORT routes TCP to $CONTOUR_LB while
# keeping $SUBDOMAIN as the URL hostname → curl sets SNI=$SUBDOMAIN and Host: $SUBDOMAIN.
echo "Test 1: ClickHouse HTTP via Contour (unauthenticated)"
curl --insecure \
--connect-to "${SUBDOMAIN}:8443:${CONTOUR_LB}:8443" \
"https://${SUBDOMAIN}:8443/?query=SELECT%20version()"
echo "Test 2: HTTP with authentication"
curl --insecure \
--connect-to "${SUBDOMAIN}:8443:${CONTOUR_LB}:8443" \
"https://${SUBDOMAIN}:8443/?query=SELECT%20hostName()" \
--user "$CH_USER:$CH_PASSWORD"
# Note: native protocol (port 9440) cannot be reliably tested before DNS cutover.
# clickhouse-client uses --host as both the TCP destination and the TLS SNI — there is
# no separate SNI override. Connecting by IP sends no SNI, and /etc/hosts is not
# consistently honoured by the client. Since TTL is already 60 s, verify native
# protocol in Step 7, immediately after the Step 6 DNS cutover, instead.
echo "Skipping native protocol pre-DNS test — will verify in Step 7 after DNS cutover."
# Test 4: Monitoring services (if exist)
if [ -s /tmp/monitoring-services.txt ]; then
echo "Test 4: Prometheus"
curl --insecure \
--connect-to "prometheus.${CUSTOM_DOMAIN}:8443:${CONTOUR_LB}:8443" \
"https://prometheus.${CUSTOM_DOMAIN}:8443/-/healthy" || true
fi
echo "✓ Pre-DNS tests complete"
Step 6: Switch Traffic to Contour (LB Cutover)
DNS currently points to edge-proxy’s LB ($EDGE_PROXY_LB) with TTL=60, set during the pre-flight step. Update the CNAMEs to $CONTOUR_LB to cut live traffic over to Contour. Because TTL is already 60 seconds, propagation takes approximately one minute with no extended downtime window.
# Confirm current DNS is pointing at edge-proxy's LB and TTL is low
dig "*.${CUSTOM_DOMAIN}" CNAME +noall +answer
# Switch all custom domain CNAMEs from edge-proxy's LB to Contour's LB
for SUBDOMAIN in "${CUSTOM_DOMAIN}" "internal.${CUSTOM_DOMAIN}" "vpce.${CUSTOM_DOMAIN}"; do
aws route53 change-resource-record-sets \
--hosted-zone-id "$HOSTED_ZONE_ID" \
--change-batch "{
\"Changes\": [{
\"Action\": \"UPSERT\",
\"ResourceRecordSet\": {
\"Name\": \"*.${SUBDOMAIN}\",
\"Type\": \"CNAME\",
\"TTL\": 300,
\"ResourceRecords\": [{\"Value\": \"${CONTOUR_LB}\"}]
}
}]
}" && echo "Switched *.${SUBDOMAIN} → $CONTOUR_LB" || echo "Skipped *.${SUBDOMAIN} (not in use)"
done
echo "Waiting 60s for DNS propagation..."
sleep 60
# Verify all zones now resolve to Contour's LB
dig "*.${CUSTOM_DOMAIN}" +short
dig "*.internal.${CUSTOM_DOMAIN}" +short
dig "*.vpce.${CUSTOM_DOMAIN}" +short
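A fixed wait followed by a manual dig can also be expressed as a polling loop that stops as soon as the record has changed. The following is a minimal sketch, not part of the original procedure: `wait_for_cname` and `resolve_cname` are hypothetical helpers, the timeout and interval defaults are arbitrary, and the result still depends on what the local resolver has cached. The interval should be at least 1 second.

```shell
#!/bin/bash
# Hypothetical helper: returns the CNAME target(s) for a name.
# Kept as a separate function so it can be swapped out (e.g. for testing).
resolve_cname() { dig +short "$1" CNAME; }

# Hypothetical helper: polls until NAME's CNAME equals TARGET, or gives up.
# Arguments: name, expected target, timeout in seconds (default 180),
#            poll interval in seconds (default 10).
wait_for_cname() {
  local name=$1 timeout=${3:-180} interval=${4:-10}
  local target waited=0 answer=""
  # Compare case-insensitively and ignore the trailing dot that dig prints.
  target=$(printf '%s' "${2%.}" | tr '[:upper:]' '[:lower:]')
  while (( waited <= timeout )); do
    answer=$(resolve_cname "$name" | head -n 1 | tr '[:upper:]' '[:lower:]')
    answer=${answer%.}
    if [[ "$answer" == "$target" ]]; then
      echo "OK: $name -> $answer (after ${waited}s)"
      return 0
    fi
    sleep "$interval"
    waited=$(( waited + interval ))
  done
  echo "TIMEOUT: $name still resolves to '$answer'" >&2
  return 1
}

# Usage (not executed here):
#   wait_for_cname "*.${CUSTOM_DOMAIN}" "$CONTOUR_LB" || echo "DNS has not switched yet"
```

A non-zero exit status gives a clear signal to hold off on the Step 7 tests until the new record is visible.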
Step 7: Test Production Endpoints
# Test each ClickHouse service by its individual FQDN (routes exist per-service, not on $CUSTOM_DOMAIN)
while IFS= read -r service; do
SVC_JSON=$(kubectl get svc "$service" -n "$CH_NAMESPACE" -o json 2>/dev/null)
if [ -z "$SVC_JSON" ]; then continue; fi
CHI=$(echo "$SVC_JSON" | jq -r '.metadata.labels["clickhouse.altinity.com/chi"] // empty')
SHARD=$(echo "$SVC_JSON" | jq -r '.metadata.labels["clickhouse.altinity.com/shard"] // empty')
REPLICA=$(echo "$SVC_JSON" | jq -r '.metadata.labels["clickhouse.altinity.com/replica"] // empty')
# Skip cluster-level services (no shard/replica labels) — they have no individual route
if [ -z "$SHARD" ] || [ -z "$REPLICA" ]; then
echo "Skipping $service (cluster-level, no per-replica route)"
continue
fi
SUBDOMAIN="${CHI}-${SHARD}-${REPLICA}.${CUSTOM_DOMAIN}"
echo "Test HTTP: $SUBDOMAIN"
curl "https://${SUBDOMAIN}:8443/?query=SELECT%20hostName()" \
--user "$CH_USER:$CH_PASSWORD"
echo "Test native: $SUBDOMAIN port 9440"
clickhouse-client --host="$SUBDOMAIN" --port=9440 --secure \
--user="$CH_USER" --password="$CH_PASSWORD" \
--query "SELECT version(), hostName()"
done < /tmp/clickhouse-services.txt
# Test monitoring (now served as HTTPS subdomains)
if [ -s /tmp/monitoring-services.txt ]; then
echo "Test: Prometheus"
curl "https://prometheus.${CUSTOM_DOMAIN}/-/healthy"
echo "Test: Grafana"
curl "https://grafana.${CUSTOM_DOMAIN}/api/health"
fi
echo "✓ All production endpoints working"
Step 8: Monitor for 48 Hours
# Application health
kubectl logs -f deployment/your-app -n your-app-namespace | grep -i clickhouse
# Contour health
kubectl get pods -n projectcontour
kubectl get gateway -n projectcontour
kubectl get httproute,tlsroute -n $CH_NAMESPACE
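# ClickHouse health (connect via a per-replica FQDN such as the $SUBDOMAIN set in Step 7;
# routes exist per service, not on the bare $CUSTOM_DOMAIN)
clickhouse-client --host "$SUBDOMAIN" --port 9440 --secure \
--user "$CH_USER" --password "$CH_PASSWORD" \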
--query "SELECT event_time, user, exception
FROM system.query_log
WHERE type = 'ExceptionWhileProcessing'
AND event_time > now() - INTERVAL 1 HOUR
ORDER BY event_time DESC LIMIT 20"
# Check query performance (again via a per-replica FQDN, not the bare $CUSTOM_DOMAIN)
clickhouse-client --host "$SUBDOMAIN" --port 9440 --secure \
--user "$CH_USER" --password "$CH_PASSWORD" \
--query "SELECT
quantile(0.5)(query_duration_ms) as p50_ms,
quantile(0.95)(query_duration_ms) as p95_ms,
quantile(0.99)(query_duration_ms) as p99_ms
FROM system.query_log
WHERE event_time > now() - INTERVAL 1 HOUR
AND type = 'QueryFinish'"
Rollback Procedure
Roll back by pointing the CNAMEs back to edge-proxy’s LB. Edge-proxy remains running until the final removal step, so traffic can be switched back at any point.
for SUBDOMAIN in "${CUSTOM_DOMAIN}" "internal.${CUSTOM_DOMAIN}" "vpce.${CUSTOM_DOMAIN}"; do
aws route53 change-resource-record-sets \
--hosted-zone-id "$HOSTED_ZONE_ID" \
--change-batch "{
\"Changes\": [{
\"Action\": \"UPSERT\",
\"ResourceRecordSet\": {
\"Name\": \"*.${SUBDOMAIN}\",
\"Type\": \"CNAME\",
\"TTL\": 60,
\"ResourceRecords\": [{\"Value\": \"${EDGE_PROXY_LB}\"}]
}
}]
}" && echo "Rolled back *.${SUBDOMAIN} → $EDGE_PROXY_LB"
done
echo "✓ DNS rolled back to edge-proxy"
Troubleshooting
Native Protocol Not Working
# Check TLSRoute status
kubectl describe tlsroute -n $CH_NAMESPACE
# Verify Gateway has the clickhouse-native listener programmed
kubectl describe gateway contour -n projectcontour | grep -A5 "clickhouse-native"
# Verify Envoy service exposes port 9440
kubectl get svc -n projectcontour -o yaml | grep "port: 9440"
# Check Envoy logs
kubectl logs -n projectcontour daemonset/envoy | grep -i "9440\|tlsroute\|error"
HTTPRoute or TLSRoute Shows Invalid
kubectl describe httproute -n $CH_NAMESPACE
kubectl describe tlsroute -n $CH_NAMESPACE
# Common issues:
# - Gateway sectionName mismatch (must match listener name: 'https' or 'clickhouse-native')
# - Service not found in the same namespace as the route
# - Port mismatch between route backendRef and service port
Monitoring Services Not Accessible
# Check HTTPRoute status for monitoring
kubectl describe httproute -n $SYS_NAMESPACE
# Test from within cluster
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- \
curl "http://prometheus-service.$SYS_NAMESPACE.svc.cluster.local:9090/-/healthy"
Verification Checklist
- □ All services with edge-proxy annotations discovered
- □ Gateway API CRDs and Contour Gateway Provisioner installed; GatewayClass created
- □ cert-manager installed; ClusterIssuer ready
- □ Custom domain CNAMEs pointed at $EDGE_PROXY_LB with TTL=60
- □ _acme-challenge CNAME delegations to Altinity’s DNS removed
- □ TLS certificate issued by Let’s Encrypt in projectcontour namespace; SANs correct
- □ Gateway created with https (port 8443) and clickhouse-native (port 9440) listeners; $CONTOUR_LB saved
- □ HTTPRoute created for each ClickHouse service (HTTP interface, port 8443)
- □ TLSRoute created for each ClickHouse service (native protocol, port 9440, same FQDN as HTTPRoute)
- □ HTTPRoute configured for monitoring services (if applicable)
- □ (if applicable) tls-passthrough services: Passthrough listener(s) added to Gateway and TLSRoutes created
- □ (if applicable) IP whitelisting: AWS NLB security group rules applied for all whitelist CIDRs
- □ (if applicable) Upstream TLS (tls-to-tls): BackendTLSPolicy configured for each affected service
- □ (if applicable) Zone-aware routing: topology-mode=Auto annotation applied to per-replica services (or limitation documented)
- □ HTTP tests passing via $CONTOUR_LB (pre-cutover); native protocol is verified after cutover in Step 7
- □ CNAMEs switched to $CONTOUR_LB (Step 6 LB cutover complete)
- □ Production endpoints working via custom domain
- □ Individual service subdomains working (HTTP on port 8443, native on port 9440)
- □ Monitoring services accessible
- □ No application errors
- □ Query performance acceptable
- □ Stable for 48+ hours
Contour Deployment Complete? Monitor your deployment for at least 48 hours, then feel free to Remove Altinity Components from your cluster.
Issues? Use rollback procedure or contact Altinity support.