Replacing Edge-Proxy

We’ll need to deploy the Contour (Envoy-based) ingress controller to replace Altinity’s edge-proxy functionality. Edge-proxy dynamically discovers services by watching Service annotations. We’ll replicate this behavior with Contour.

Understanding Edge-Proxy Service Discovery

Edge-proxy watches for Services with these annotations:

  • edge-proxy.altinity.com/port-mapping - Comma-separated externalPort:mode:backendPort tuples:
    • tls-to-tcp - Terminate TLS, forward plain TCP upstream (HTTP interface and native protocol)
    • tls-to-tls - Terminate TLS, open new TLS connection upstream (upstream cert verified)
    • tls-to-tls-insecure - Terminate TLS, open new TLS connection upstream (no cert verification)
    • tls-passthrough - Relay TLS as-is; upstream terminates TLS
    • tcp-passthrough - Plain TCP relay, no TLS (Prometheus, Grafana)
  • edge-proxy.altinity.com/tls-server-name - SNI hostname(s) edge-proxy routes on
  • edge-proxy.altinity.com/whitelist - Comma-separated source CIDRs to allow; all others denied
  • edge-proxy.altinity.com/zone-routed-tls-server-name - Like tls-server-name but with AZ-local routing
  • edge-proxy.altinity.com/zone - AZ tag on the Service; edge-proxy instances route to matching-zone services
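To make the port-mapping format concrete, here is a minimal bash sketch of how one annotation value breaks down. parse_port_mapping is a hypothetical helper written for this guide, not part of edge-proxy or Contour. It prints one externalPort/mode/backendPort line per tuple and rejects any mode outside the five listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of edge-proxy): split a port-mapping annotation value
# such as "8443:tls-to-tcp:8123,9440:tls-to-tcp:9000" into one
# "externalPort mode backendPort" line per tuple. Returns 1 on a malformed tuple
# or on a mode outside the five documented ones.
parse_port_mapping() {
    local mapping="$1" tuple ext mode backend extra
    local -a tuples
    # Split on commas; spaces after commas are stripped
    IFS=',' read -ra tuples <<< "${mapping// /}"
    for tuple in "${tuples[@]}"; do
        IFS=':' read -r ext mode backend extra <<< "$tuple"
        # Exactly three fields, with numeric ports on both ends
        if [[ -n "$extra" || ! "$ext" =~ ^[0-9]+$ || ! "$backend" =~ ^[0-9]+$ ]]; then
            echo "malformed tuple: $tuple" >&2
            return 1
        fi
        case "$mode" in
            tls-to-tcp|tls-to-tls|tls-to-tls-insecure|tls-passthrough|tcp-passthrough) ;;
            *) echo "unknown mode: $mode" >&2; return 1 ;;
        esac
        echo "$ext $mode $backend"
    done
}

# Example: the ClickHouse mapping used throughout this guide
parse_port_mapping "8443:tls-to-tcp:8123,9440:tls-to-tcp:9000"
```

The example prints two lines, both with mode tls-to-tcp, which is why both the HTTP and native ports are handled as TLS-terminating routes later on.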

Example ClickHouse Service (both HTTP and native protocol use tls-to-tcp):

annotations:
  edge-proxy.altinity.com/port-mapping: "8443:tls-to-tcp:8123,9440:tls-to-tcp:9000"
  edge-proxy.altinity.com/tls-server-name: "chi-0-0.internal.env.altinity.cloud"

Example Prometheus/Grafana Service:

annotations:
  edge-proxy.altinity.com/port-mapping: "9090:tcp-passthrough:9090"
  # No tls-server-name needed for tcp-passthrough

Prerequisites

Custom Domain and DNS Configuration

Your custom domain currently points to Altinity’s DNS, which routes to edge-proxy load balancers:

Current DNS Flow:

clickhouse.yourdomain.com (your Route53)
  → env.altinity.cloud (Altinity DNS)
    → edge-proxy-xyz.elb.amazonaws.com (Altinity LB)
      → edge-proxy (TLS termination / passthrough)
        → Cluster pods

After Disconnection:

clickhouse.yourdomain.com (your Route53)
  → contour-xyz.elb.amazonaws.com (Contour's LB)
    → Contour/Envoy (TLS termination / TCP passthrough)
      → Cluster pods

Record the custom domain and its Route53 hosted zone, and persist them to the environment file:

# Load environment variables
source ~/.clickhouse-disconnect-env

export CUSTOM_DOMAIN="clickhouse.yourdomain.com"
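# ROOT_DOMAIN assumes the hosted zone is the last two labels (e.g. yourdomain.com);
# adjust it for names such as yourdomain.co.uk or a delegated sub-zone
export ROOT_DOMAIN=$(echo "$CUSTOM_DOMAIN" | awk -F. '{print $(NF-1)"."$NF}')

# Get the Route53 hosted zone ID for $ROOT_DOMAIN
export HOSTED_ZONE_ID=$(aws route53 list-hosted-zones \
    --query "HostedZones[?Name=='${ROOT_DOMAIN}.'].Id" \
    --output text | cut -d/ -f3)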

# Verify current DNS chain
dig $CUSTOM_DOMAIN +short
# Expect Altinity's CNAME (e.g. chi-name.env.altinity.cloud), then Altinity's ELB hostname, then the ELB's IP addresses

echo "export CUSTOM_DOMAIN='$CUSTOM_DOMAIN'" >> ~/.clickhouse-disconnect-env
echo "export ROOT_DOMAIN='$ROOT_DOMAIN'" >> ~/.clickhouse-disconnect-env
echo "export HOSTED_ZONE_ID='$HOSTED_ZONE_ID'" >> ~/.clickhouse-disconnect-env
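The awk one-liner above assumes the hosted zone is the last two labels of CUSTOM_DOMAIN. That picks the wrong zone for names like clickhouse.yourdomain.co.uk, or when clickhouse.yourdomain.com is itself a delegated hosted zone. The following sketch instead tries every parent domain from most to least specific. candidate_zones and find_hosted_zone_id are hypothetical helper names; the lookup uses the AWS CLI's list-hosted-zones-by-name command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each parent domain of $1, most specific first,
# stopping before the bare TLD. Every printed name is a candidate hosted zone.
candidate_zones() {
    local name="${1%.}"          # drop a trailing dot, if any
    while [[ "$name" == *.* ]]; do
        echo "$name"
        name="${name#*.}"        # strip the leftmost label
    done
}

# Hypothetical helper: print the ID of the most specific Route53 hosted zone that
# exists for $1. Requires the AWS CLI. If a public and a private zone share a name,
# verify the result by hand.
find_hosted_zone_id() {
    local zone id
    for zone in $(candidate_zones "$1"); do
        id=$(aws route53 list-hosted-zones-by-name --dns-name "${zone}." --max-items 1 \
            --query "HostedZones[?Name=='${zone}.'].Id" --output text)
        if [[ -n "$id" && "$id" != "None" ]]; then
            echo "${id##*/}"     # "/hostedzone/Z123..." -> "Z123..."
            return 0
        fi
    done
    return 1
}
```

If it finds a zone, you can use it in place of the value above: export HOSTED_ZONE_ID=$(find_hosted_zone_id "$CUSTOM_DOMAIN").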

Strategy

  1. Find all Services with edge-proxy annotations (ClickHouse, Prometheus, Grafana)
  2. Install Contour Gateway Provisioner and Gateway API CRDs; create GatewayClass
  3. Install cert-manager; pre-flight: switch custom domain CNAMEs to edge-proxy’s LB with TTL=60; issue new TLS certificate
  4. Create Gateway with HTTPS (port 8443) and TLS (port 9440) listeners; create HTTPRoute (HTTP interface) and TLSRoute (native protocol) — both using the same FQDN on different listeners
  5. Test Contour via its own LB before cutting over traffic
  6. LB cutover: update CNAMEs from edge-proxy’s LB to Contour’s LB (~60s propagation due to low TTL)
  7. Verify production endpoints
  8. Monitor for 48 hours

Step 1: Discover All Services with Edge-Proxy Annotations

# Full summary: one object per service with edge-proxy annotations
kubectl get svc -A -o json | jq -r '
  .items[] | select(.metadata.annotations["edge-proxy.altinity.com/port-mapping"] != null) | {
    namespace: .metadata.namespace,
    name: .metadata.name,
    port_mapping: .metadata.annotations["edge-proxy.altinity.com/port-mapping"],
    tls_server_name: (.metadata.annotations["edge-proxy.altinity.com/tls-server-name"] // ""),
    whitelist: (.metadata.annotations["edge-proxy.altinity.com/whitelist"] // ""),
    zone_routed: (.metadata.annotations["edge-proxy.altinity.com/zone-routed-tls-server-name"] // ""),
    zone: (.metadata.annotations["edge-proxy.altinity.com/zone"] // "")
  }'

# ClickHouse services (used in scripts below)
kubectl get svc -n $CH_NAMESPACE \
    -o json | jq -r '.items[] | select(.metadata.annotations["edge-proxy.altinity.com/port-mapping"] != null) | .metadata.name' \
    > /tmp/clickhouse-services.txt

echo "=== ClickHouse services ==="
cat /tmp/clickhouse-services.txt

# Flag features that need extra handling — read these output sections carefully
echo ""
echo "=== tls-passthrough services (need Passthrough listener in Gateway) ==="
kubectl get svc -A -o json | jq -r '
  .items[] | select(.metadata.annotations["edge-proxy.altinity.com/port-mapping"] |
    select(. != null) | test("tls-passthrough")) |
  .metadata.namespace + "/" + .metadata.name + ": " +
  .metadata.annotations["edge-proxy.altinity.com/port-mapping"]' \
  | tee /tmp/tls-passthrough-services.txt
[ -s /tmp/tls-passthrough-services.txt ] && echo "ACTION REQUIRED: see 'Handle tls-passthrough' in Step 4" || echo "(none)"

echo ""
echo "=== tls-to-tls / tls-to-tls-insecure services (upstream TLS — extra config needed) ==="
kubectl get svc -A -o json | jq -r '
  .items[] | select(.metadata.annotations["edge-proxy.altinity.com/port-mapping"] |
    select(. != null) | test("tls-to-tls")) |
  .metadata.namespace + "/" + .metadata.name + ": " +
  .metadata.annotations["edge-proxy.altinity.com/port-mapping"]' \
  | tee /tmp/tls-to-tls-services.txt
[ -s /tmp/tls-to-tls-services.txt ] && echo "ACTION REQUIRED: see 'Handle Upstream TLS' in Step 4" || echo "(none)"

echo ""
echo "=== Services with IP whitelists ==="
kubectl get svc -A -o json | jq -r '
  .items[] | select(.metadata.annotations["edge-proxy.altinity.com/whitelist"] |
    select(. != null) | length > 0) |
  .metadata.namespace + "/" + .metadata.name + ": " +
  .metadata.annotations["edge-proxy.altinity.com/whitelist"]' \
  | tee /tmp/whitelist-services.txt
[ -s /tmp/whitelist-services.txt ] && echo "ACTION REQUIRED: see 'Handle IP Whitelisting' in Step 4" || echo "(none)"

echo ""
echo "=== Zone-routed services (cross-AZ traffic may increase — review topology hints) ==="
kubectl get svc -A -o json | jq -r '
  .items[] | select(.metadata.annotations["edge-proxy.altinity.com/zone-routed-tls-server-name"] |
    select(. != null) | length > 0) |
  .metadata.namespace + "/" + .metadata.name +
  " (zone: " + (.metadata.annotations["edge-proxy.altinity.com/zone"] // "unset") + ")"' \
  | tee /tmp/zone-services.txt
[ -s /tmp/zone-services.txt ] && echo "ACTION REQUIRED: see 'Handle Zone-Aware Routing' in Step 4" || echo "(none)"
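The route scripts in Step 4 build each per-replica hostname as <chi>-<shard>-<replica>.CUSTOM_DOMAIN from the ClickHouse operator labels, while edge-proxy routed on the names in the tls-server-name annotation. Before creating routes it is worth confirming the two agree. The sketch below assumes the first DNS label of the annotation is the per-replica name clients use today; prefix_matches is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if any name in an edge-proxy tls-server-name
# annotation value (comma-separated) has "<chi>-<shard>-<replica>" as its first
# DNS label, i.e. the prefix the Step 4 route scripts put in front of CUSTOM_DOMAIN.
prefix_matches() {
    local names="$1" expected="$2-$3-$4" name
    local -a list
    IFS=',' read -ra list <<< "${names// /}"
    for name in "${list[@]}"; do
        if [[ "${name%%.*}" == "$expected" ]]; then
            return 0
        fi
    done
    return 1
}

# Example: prints "match"
prefix_matches "test-byoc-0-0.internal.env.altinity.cloud" test-byoc 0 0 && echo "match"
```

Feed it each per-replica Service's tls-server-name annotation plus its clickhouse.altinity.com/chi, shard, and replica labels. A Service that fails the check would get a hostname in Step 4 whose first label differs from the one clients use now.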

Step 2: Install Contour

Contour’s Envoy is provisioned by the Gateway Provisioner, which creates its own AWS load balancer ($CONTOUR_LB). This LB is used only for pre-cutover testing in Step 5. Live traffic continues through edge-proxy’s LB until the explicit cutover in Step 6. The Gateway resource (which tells Contour which ports to expose) is created in Step 4, after the TLS certificate is ready.

# Install Gateway API CRDs — experimental channel is required for TLSRoute support
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.2.1/experimental-install.yaml

# Install Contour Gateway Provisioner
kubectl apply -f https://projectcontour.io/quickstart/contour-gateway-provisioner.yaml

# Wait for provisioner to be ready
kubectl wait --for=condition=ready pod \
    -l control-plane=contour-gateway-provisioner \
    -n projectcontour \\
    --timeout=300s

# Create GatewayClass — tells Contour to manage Gateway resources
kubectl apply -f - <<'EOF'
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: contour
spec:
  controllerName: projectcontour.io/gateway-controller
EOF

Step 3: Install cert-manager to Replace crtd

Altinity’s crtd is proprietary and will be removed in a later step of the disconnection. Install the open-source cert-manager as its replacement.

Install cert-manager

# Install cert-manager using official manifests
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.1/cert-manager.yaml

# Wait for cert-manager pods to be ready
kubectl wait --for=condition=ready pod \
    -l app.kubernetes.io/instance=cert-manager \
    -n cert-manager \\
    --timeout=300s

# Verify installation
kubectl get pods -n cert-manager

# Expected pods:
# cert-manager-<hash>
# cert-manager-cainjector-<hash>
# cert-manager-webhook-<hash>

Understand How crtd Manages Certificates

crtd does not store ACME credentials as Kubernetes Secrets. It authenticates to Altinity’s CA (ca.altinity.cloud) via a short-lived JWT token. Altinity’s CA internally uses ACME (ZeroSSL primary, Let’s Encrypt fallback) with Altinity’s own DNS credentials to solve DNS-01 challenges — so there is nothing for you to migrate.

How the CA issues certificates:

  1. crtd generates a CSR (containing all your DNS SANs) and POSTs it to ca.altinity.cloud/sign with a JWT Bearer token
  2. Altinity’s CA validates the JWT (audience = Altinity domain, subject = your environment name)
  3. The CA then runs an ACME DNS-01 challenge against your domain using Altinity’s Route53/GCP DNS access to prove ownership
  4. Once the ACME provider (ZeroSSL or Let’s Encrypt) issues the cert, it’s returned to crtd and written into Kubernetes Secrets

Result: Production certs are publicly trusted (chaining to ZeroSSL or Let’s Encrypt). Dev environments (dev.altinity.cloud) use a self-signed local CA instead.

crtd’s certificate storage layout:

| Resource                      | Kind      | Namespace            | Content                                          |
|-------------------------------|-----------|----------------------|--------------------------------------------------|
| crtd                          | Secret    | $SYS_NAMESPACE       | ca.token: JWT for Altinity’s CA signing endpoint |
| crtd                          | ConfigMap | $SYS_NAMESPACE       | crtd.conf: issuer/cert config                    |
| edge-proxy-default-tls-secret | Secret    | $SYS_NAMESPACE       | Live TLS cert+key for edge-proxy                 |
| clickhouse-default-tls-secret | Secret    | ClickHouse namespace | Live TLS cert+key for ClickHouse                 |

# Verify crtd is running
kubectl get pods -n $SYS_NAMESPACE | grep crtd

# Inspect crtd configuration — shows issuer type and DNS names being managed
kubectl get configmap crtd -n $SYS_NAMESPACE -o jsonpath='{.data.crtd\.conf}'

# Check the issuer of the live cert (production: ZeroSSL/Let's Encrypt; dev: CN=ca self-signed)
kubectl get secret edge-proxy-default-tls-secret -n $SYS_NAMESPACE \
    -o jsonpath='{.data.tls\.crt}' | base64 -d | \
    openssl x509 -noout -issuer -dates

# Extract the DNS SANs to replicate in cert-manager
kubectl get secret edge-proxy-default-tls-secret -n $SYS_NAMESPACE \
    -o jsonpath='{.data.tls\.crt}' | base64 -d | \
    openssl x509 -noout -text | grep -A3 "Subject Alternative Name"
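To turn the SAN listing into entries for the dnsNames field of the Certificate created later in this step, a small filter helps. san_to_dnsnames is a hypothetical helper: it reads openssl x509 -text output on stdin and prints each DNS SAN as a YAML list item, ignoring IP SANs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `openssl x509 -noout -text` output on stdin and print
# every DNS SAN as a YAML list item suitable for spec.dnsNames.
# Only "DNS:" entries are kept; "IP Address:" SANs are ignored.
san_to_dnsnames() {
    grep -o 'DNS:[^,[:space:]]*' | sed -e 's/^DNS://' -e 's/.*/  - "&"/'
}
```

Pipe the output of the SAN command above into san_to_dnsnames and compare the result with the dnsNames you plan to request.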

Important: Altinity’s ACME account keys and DNS credentials are proprietary — there is nothing to migrate. You will set up cert-manager with your own ACME account and DNS credentials from scratch. For custom domains, Altinity’s CA was solving the DNS-01 challenge via a delegated CNAME from _acme-challenge.yourcustom.domain to Altinity’s DNS — you will now own that challenge directly.

Configure DNS-01 Provider and Create ClusterIssuer

cert-manager supports many DNS providers for automated DNS-01 challenges. For the full provider list and configuration examples (Route53, Cloudflare, Google Cloud DNS, AzureDNS, and more), see the DNS01 section of the cert-manager documentation.

Namespace rules for cert-manager resources:

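| Resource               | Namespace             | Notes                                                                                  |
|------------------------|-----------------------|----------------------------------------------------------------------------------------|
| ClusterIssuer          | cluster-scoped (none) | Applies across all namespaces                                                          |
| DNS credential Secrets | cert-manager          | Required: the ClusterIssuer looks up referenced Secrets only in the cert-manager namespace |
| Certificate            | projectcontour        | The TLS Secret is created in the same namespace as the Certificate                     |
| TLS Secret (output)    | projectcontour        | Referenced by the Gateway in projectcontour; routes in $CH_NAMESPACE do not reference it |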

Create any DNS provider credential Secret in the cert-manager namespace before applying the ClusterIssuer:

# Example — replace with your provider's secret structure (see cert-manager docs)
kubectl create secret generic dns-provider-credentials \
    --from-literal=key='<value>' \
    -n cert-manager   # <-- must be cert-manager namespace for ClusterIssuer

Then create the ClusterIssuer, filling in the dns01 solver block for your provider:

cat > /tmp/cert-manager-issuer.yaml <<EOFISSUER
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
  # ClusterIssuer is cluster-scoped — no namespace field
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: your-email@example.com
    privateKeySecretRef:
      name: letsencrypt-prod-private-key  # created automatically in cert-manager namespace
    solvers:
    - dns01:
        # Paste your provider-specific solver block here from the cert-manager docs
        # e.g. route53: {}, cloudflare: {}, azureDNS: {}, etc.
        # Any secretRef here resolves to the cert-manager namespace
      selector:
        dnsZones:
        - "${CUSTOM_DOMAIN}"
EOFISSUER

# Edit /tmp/cert-manager-issuer.yaml to fill in the dns01 solver block before applying
kubectl apply -f /tmp/cert-manager-issuer.yaml
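If your DNS is in Route53, as in the examples in this guide, the solver block can be filled in as below. This is a sketch under assumptions: the cert-manager pod already has AWS credentials (for example via IRSA) allowed to change records in the hosted zone, and write_route53_issuer is a hypothetical helper. region and hostedZoneID are fields of cert-manager's route53 solver; check the cert-manager Route53 documentation for the IAM policy it needs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a ClusterIssuer using cert-manager's Route53 DNS-01
# solver. Assumes ambient AWS credentials for the cert-manager pod (e.g. IRSA);
# otherwise add accessKeyIDSecretRef/secretAccessKeySecretRef to the route53 block.
# Arguments: output file, ACME email, AWS region, hosted zone ID, DNS zone.
write_route53_issuer() {
    local out="$1" email="$2" region="$3" zone_id="$4" zone="$5"
    cat > "$out" <<EOF
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ${email}
    privateKeySecretRef:
      name: letsencrypt-prod-private-key
    solvers:
    - dns01:
        route53:
          region: ${region}
          hostedZoneID: ${zone_id}
      selector:
        dnsZones:
        - "${zone}"
EOF
}
```

Usage: write_route53_issuer /tmp/cert-manager-issuer.yaml you@example.com us-east-1 "$HOSTED_ZONE_ID" "$CUSTOM_DOMAIN", then apply the file as above.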

Verify ClusterIssuer is Ready

# Check ClusterIssuer status
kubectl get clusterissuer letsencrypt-prod

# Should show READY=True
# NAME               READY   AGE
# letsencrypt-prod   True    30s

# If not ready, check logs
kubectl logs -n cert-manager deployment/cert-manager | grep -i error

Switch Custom Domain CNAMEs to Edge-Proxy’s LB

Currently your custom domain CNAMEs route through Altinity’s DNS (env.altinity.cloud) before reaching edge-proxy’s load balancer. Before issuing the certificate you must:

  1. Point all custom domain CNAMEs directly at edge-proxy’s LB with TTL=60 (cutting out the Altinity DNS hop). In Step 6 you will switch these CNAMEs one final time to Contour’s LB; the low TTL means that cutover propagates in about 60 seconds.
  2. Remove any _acme-challenge CNAME delegations Altinity set up for your domain. cert-manager writes TXT records into your DNS zone directly; if a CNAME delegation to Altinity’s nameservers still exists, Let’s Encrypt will follow it and not find cert-manager’s TXT record.

# Get edge-proxy's underlying LB hostname by following the CNAME chain through Altinity's DNS.
# dig +short prints CNAME targets before A records, so keep only the ELB hostname (minus its trailing dot).
export EDGE_PROXY_LB=$(dig "$CUSTOM_DOMAIN" +short | grep -m1 '\.elb\.' | sed 's/\.$//')
# Or look it up directly from the cluster:
# export EDGE_PROXY_LB=$(kubectl get svc -n $SYS_NAMESPACE -l app=edge-proxy \
#     -o jsonpath='{.items[0].status.loadBalancer.ingress[0].hostname}')
echo "Edge-proxy LB: $EDGE_PROXY_LB"
echo "export EDGE_PROXY_LB='$EDGE_PROXY_LB'" >> ~/.clickhouse-disconnect-env

# Update CNAMEs to point directly at edge-proxy's LB (example using Route53).
# UPSERT creates a record that does not exist yet, so drop internal/vpce from
# this list if those sub-zones are not in use.
for SUBDOMAIN in "${CUSTOM_DOMAIN}" "internal.${CUSTOM_DOMAIN}" "vpce.${CUSTOM_DOMAIN}"; do
    aws route53 change-resource-record-sets \
        --hosted-zone-id $HOSTED_ZONE_ID \
        --change-batch "{
          \"Changes\": [{
            \"Action\": \"UPSERT\",
            \"ResourceRecordSet\": {
              \"Name\": \"*.${SUBDOMAIN}\",
              \"Type\": \"CNAME\",
              \"TTL\": 60,
              \"ResourceRecords\": [{\"Value\": \"${EDGE_PROXY_LB}\"}]
            }
          }]
        }" && echo "Updated *.${SUBDOMAIN}" || echo "FAILED to update *.${SUBDOMAIN}"
done

# Remove _acme-challenge CNAME delegations to Altinity's DNS
# Check which exist:
for SUBDOMAIN in "${CUSTOM_DOMAIN}" "internal.${CUSTOM_DOMAIN}" "vpce.${CUSTOM_DOMAIN}"; do
    RESULT=$(dig "_acme-challenge.${SUBDOMAIN}" CNAME +short)
    if [ -n "$RESULT" ]; then
        echo "_acme-challenge.${SUBDOMAIN} → $RESULT  (DELETE THIS)"
    fi
done
# Delete any returned CNAMEs in your DNS provider before proceeding.

# Verify the custom domain now resolves directly to edge-proxy's LB
dig "*.${CUSTOM_DOMAIN}" +short
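The nested escaped quotes in the Route53 loop above are easy to break when editing. As an alternative, a hypothetical cname_change_batch helper can build the same single-record UPSERT change batch with printf, leaving only one level of quoting at the call site.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a Route53 change batch that UPSERTs one CNAME record.
# Arguments: record name, target hostname, TTL in seconds (default 60).
cname_change_batch() {
    local name="$1" target="$2" ttl="${3:-60}"
    printf '{"Changes":[{"Action":"UPSERT","ResourceRecordSet":{"Name":"%s","Type":"CNAME","TTL":%d,"ResourceRecords":[{"Value":"%s"}]}}]}\n' \
        "$name" "$ttl" "$target"
}
```

The same helper works for the Step 6 cutover by passing Contour's LB as the target: aws route53 change-resource-record-sets --hosted-zone-id "$HOSTED_ZONE_ID" --change-batch "$(cname_change_batch "*.${CUSTOM_DOMAIN}" "$CONTOUR_LB")".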

Request Certificate for Contour

The Certificate resource is created in the projectcontour namespace so cert-manager issues and renews the TLS Secret there directly — the Gateway references it in that same namespace. HTTPRoutes and TLSRoutes in $CH_NAMESPACE do not reference the secret; only the Gateway does.

# Create Certificate resource
# dnsNames mirrors what crtd managed: always *.CUSTOM_DOMAIN,
# plus *.internal and *.vpce if those sub-zones are in use.
cat > /tmp/contour-certificate.yaml <<EOFCERT
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: contour-tls
  namespace: projectcontour
spec:
  secretName: contour-tls
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
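  - "*.${CUSTOM_DOMAIN}"
  # Uncomment if clients connect to the bare domain (the cluster-wide routes in Step 4);
  # a wildcard does not cover it:
  # - "${CUSTOM_DOMAIN}"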
  # Uncomment if internal sub-zone is in use:
  # - "*.internal.${CUSTOM_DOMAIN}"
  # Uncomment if AWS PrivateLink (vpce) sub-zone is in use:
  # - "*.vpce.${CUSTOM_DOMAIN}"
EOFCERT

kubectl apply -f /tmp/contour-certificate.yaml

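# Watch certificate issuance (DNS-01 typically takes 30-120 seconds; wildcard names can only be validated via DNS-01).
# Press Ctrl-C once READY shows True, or skip straight to the kubectl wait below.
kubectl get certificate contour-tls -n projectcontour --watch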

# Wait for Ready=True
kubectl wait --for=condition=Ready certificate/contour-tls \
    -n projectcontour \
    --timeout=300s

# Check certificate details
kubectl describe certificate contour-tls -n projectcontour

# Verify TLS secret was created
kubectl get secret contour-tls -n projectcontour

# Save secret name
export TLS_SECRET_NAME="contour-tls"
echo "export TLS_SECRET_NAME='$TLS_SECRET_NAME'" >> ~/.clickhouse-disconnect-env

Verify Certificate

# Check certificate SANs (Subject Alternative Names)
kubectl get secret contour-tls -n projectcontour \
    -o jsonpath='{.data.tls\.crt}' | base64 -d | \
    openssl x509 -noout -text | grep -A3 "Subject Alternative Name"

# Should show:
# DNS:*.clickhouse.yourdomain.com

# Check issuer (should be Let's Encrypt)
kubectl get secret contour-tls -n projectcontour \
    -o jsonpath='{.data.tls\.crt}' | base64 -d | \
    openssl x509 -noout -issuer

# Check expiration (90 days for Let's Encrypt)
kubectl get secret contour-tls -n projectcontour \
    -o jsonpath='{.data.tls\.crt}' | base64 -d | \
    openssl x509 -noout -enddate

Important: cert-manager and crtd will now run in parallel. The old crtd certificates will continue to work with edge-proxy, while new cert-manager certificates work with Contour. In Phase 3, we’ll remove crtd.

Certificate Renewal

cert-manager automatically renews certificates ~30 days before expiration:

# Check certificate renewal status
kubectl describe certificate contour-tls -n projectcontour | grep -A10 "Status"

# Monitor cert-manager logs
kubectl logs -n cert-manager deployment/cert-manager --tail=50 -f
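A stalled renewal only becomes visible as an expiring certificate, so a periodic check is a cheap safeguard. Below is a minimal sketch built on openssl x509 -checkend, which exits non-zero when a certificate expires within the given number of seconds. expires_within_days is a hypothetical helper name, and the 30-day default matches the renewal point described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if the PEM certificate in file $1 expires
# within $2 days (default 30). A certificate that stays inside this window
# suggests cert-manager renewal has stalled. Note: a malformed PEM file also
# reports as "expiring", because openssl fails and the result is inverted.
expires_within_days() {
    local cert="$1" days="${2:-30}"
    [[ -r "$cert" ]] || { echo "cannot read $cert" >&2; return 2; }
    # -checkend exits 0 if the cert is still valid N seconds from now, 1 otherwise
    ! openssl x509 -noout -in "$cert" -checkend $(( days * 86400 )) > /dev/null
}
```

Usage: decode the secret with kubectl get secret contour-tls -n projectcontour -o jsonpath='{.data.tls\.crt}' | base64 -d > /tmp/contour-tls.crt, then run expires_within_days /tmp/contour-tls.crt && echo "WARNING: contour-tls is inside its renewal window".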

Step 4: Configure Gateway and Create Proxy Resources

Create Gateway

Create a Gateway with three listeners. The Gateway Provisioner provisions an Envoy deployment and AWS load balancer that expose exactly the ports defined here — no manual service patching required.

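  • Port 8080 (http): plain HTTP. No routes in this guide attach to it, so it does not redirect to HTTPS on its own; add an HTTPRoute with a RequestRedirect filter if you want that behavior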
  • Port 8443 (https, HTTPS protocol): TLS terminated by Envoy; HTTPRoute resources attach here for HTTP interface routing
  • Port 9440 (clickhouse-native, TLS protocol + Terminate mode): TLS terminated by Envoy; TLSRoute resources attach here for native protocol routing

Both the HTTPS and TLS listeners share the same wildcard TLS certificate from Step 3. The same FQDN (e.g., test-byoc-0-0.clickhouse.yourdomain.com) can appear on both listeners because they are on different ports — this is what allows a single hostname to serve HTTP on port 8443 and native protocol on port 9440, exactly as edge-proxy does.

# Create the Gateway — references the TLS cert issued directly into projectcontour namespace in Step 3
cat > /tmp/contour-gateway.yaml <<EOFGW
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: contour
  namespace: projectcontour
spec:
  gatewayClassName: contour
  listeners:
  - name: http
    protocol: HTTP
    port: 8080
    allowedRoutes:
      namespaces:
        from: All
  - name: https
    protocol: HTTPS
    port: 8443
    tls:
      mode: Terminate
      certificateRefs:
      - name: ${TLS_SECRET_NAME}
    allowedRoutes:
      namespaces:
        from: All
  - name: clickhouse-native
    protocol: TLS
    port: 9440
    tls:
      mode: Terminate
      certificateRefs:
      - name: ${TLS_SECRET_NAME}
    allowedRoutes:
      namespaces:
        from: All
EOFGW

kubectl apply -f /tmp/contour-gateway.yaml

# Wait for Gateway to be accepted and programmed by the provisioner
kubectl wait --for=condition=Accepted gateway/contour -n projectcontour --timeout=300s
kubectl wait --for=condition=Programmed gateway/contour -n projectcontour --timeout=300s

# Wait for the provisioned Envoy LoadBalancer address (typically 2-3 minutes)
echo "Waiting for LoadBalancer..."
until CONTOUR_LB=$(kubectl get gateway contour -n projectcontour \
        -o jsonpath='{.status.addresses[0].value}') && [ -n "$CONTOUR_LB" ]; do
    sleep 10
done
export CONTOUR_LB

echo "Contour LoadBalancer: $CONTOUR_LB"
echo "export CONTOUR_LB='$CONTOUR_LB'" >> ~/.clickhouse-disconnect-env
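The http listener on port 8080 has no routes attached in this guide. If you want plain-HTTP clients redirected to HTTPS, a catch-all HTTPRoute using the Gateway API RequestRedirect filter is one option. This is a sketch: http_redirect_route and the route name http-to-https-redirect are hypothetical, and you should confirm that your Contour version supports the filter's port field before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an HTTPRoute that attaches to the Gateway's plain
# "http" listener and answers every request with a 301 redirect to HTTPS on 8443.
# No hostnames are listed, so the route matches any host on that listener.
http_redirect_route() {
    local namespace="$1"
    cat <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: http-to-https-redirect
  namespace: ${namespace}
spec:
  parentRefs:
  - name: contour
    namespace: projectcontour
    sectionName: http
  rules:
  - filters:
    - type: RequestRedirect
      requestRedirect:
        scheme: https
        port: 8443
        statusCode: 301
EOF
}
```

Usage: http_redirect_route projectcontour | kubectl apply -f -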

Create HTTPRoute for ClickHouse HTTP Interface

For the HTTP interface (backend port 8123), edge-proxy uses tls-to-tcp — it terminates TLS and forwards plain HTTP. We replicate this with HTTPRoute resources attached to the HTTPS listener (port 8443). Envoy terminates TLS and routes by hostname to the correct ClickHouse service.

# Create HTTPRoute for each ClickHouse service
cat > /tmp/create-ch-httproutes.sh <<'EOFSCRIPT'
#!/bin/bash

# ClickHouse operator creates two kinds of services with edge-proxy annotations:
#   chi-<name>-<cluster>-<shard>-<replica>  — per-replica, has shard/replica labels
#   clickhouse-<name>                        — cluster-level load balancer, no shard/replica labels
# Per-replica services get individual HTTPRoute resources.
# The cluster-level service backs the cluster-wide HTTPRoute at $CUSTOM_DOMAIN.

CLUSTER_SVC=""
CLUSTER_HTTP_PORT=""

while IFS= read -r service; do
    SVC_JSON=$(kubectl get svc "$service" -n "$CH_NAMESPACE" -o json 2>/dev/null)
    if [ -z "$SVC_JSON" ]; then
        echo "Warning: service $service not found in $CH_NAMESPACE, skipping"
        continue
    fi

    PORT_MAPPING=$(echo "$SVC_JSON" | jq -r '.metadata.annotations["edge-proxy.altinity.com/port-mapping"] // empty')
    HTTP_PORT=$(echo "$PORT_MAPPING" | grep -oP '\d+:tls-to-tcp:\K\d+' | head -1)

    CHI=$(echo "$SVC_JSON" | jq -r '.metadata.labels["clickhouse.altinity.com/chi"] // empty')
    SHARD=$(echo "$SVC_JSON" | jq -r '.metadata.labels["clickhouse.altinity.com/shard"] // empty')
    REPLICA=$(echo "$SVC_JSON" | jq -r '.metadata.labels["clickhouse.altinity.com/replica"] // empty')

    # Services without shard/replica labels are cluster-level — defer to cluster-wide HTTPRoute
    if [ -z "$SHARD" ] || [ -z "$REPLICA" ]; then
        echo "Deferring $service (cluster-level service, used for cluster-wide HTTPRoute)"
        CLUSTER_SVC="$service"
        CLUSTER_HTTP_PORT="$HTTP_PORT"
        continue
    fi

    if [ -z "$HTTP_PORT" ]; then
        echo "Skipping $service (no tls-to-tcp port mapping)"
        continue
    fi

    SUBDOMAIN="${CHI}-${SHARD}-${REPLICA}.${CUSTOM_DOMAIN}"
    echo "Creating HTTPRoute: $service → $SUBDOMAIN:$HTTP_PORT"

    cat <<EOFROUTE | kubectl apply -f -
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: ${service}-http
  namespace: ${CH_NAMESPACE}
spec:
  parentRefs:
  - name: contour
    namespace: projectcontour
    sectionName: https
  hostnames:
  - "${SUBDOMAIN}"
  rules:
  - timeouts:
      request: 3600s
      backendRequest: 3600s
    backendRefs:
    - name: ${service}
      port: ${HTTP_PORT}
EOFROUTE
    echo "  ✓ Created"

done < /tmp/clickhouse-services.txt

# Cluster-wide HTTPRoute at $CUSTOM_DOMAIN, backed by the cluster-level service
if [ -n "$CLUSTER_SVC" ] && [ -n "$CLUSTER_HTTP_PORT" ]; then
    echo "Creating cluster-wide HTTPRoute: $CUSTOM_DOMAIN → $CLUSTER_SVC:$CLUSTER_HTTP_PORT"
    cat <<EOFCLUSTER | kubectl apply -f -
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: clickhouse-cluster
  namespace: ${CH_NAMESPACE}
spec:
  parentRefs:
  - name: contour
    namespace: projectcontour
    sectionName: https
  hostnames:
  - "${CUSTOM_DOMAIN}"
  rules:
  - timeouts:
      request: 3600s
      backendRequest: 3600s
    backendRefs:
    - name: ${CLUSTER_SVC}
      port: ${CLUSTER_HTTP_PORT}
EOFCLUSTER
    echo "  ✓ Created cluster-wide HTTPRoute"
else
    echo "No cluster-level service found; skipping cluster-wide HTTPRoute"
fi

echo "✓ All HTTPRoute resources created"
EOFSCRIPT

chmod +x /tmp/create-ch-httproutes.sh
bash /tmp/create-ch-httproutes.sh

# Verify
kubectl get httproute -n $CH_NAMESPACE
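The cluster-wide HTTPRoute above uses CUSTOM_DOMAIN itself as its hostname, but the Step 3 certificate lists *.CUSTOM_DOMAIN, and a wildcard covers exactly one extra label. It matches test-byoc-0-0.clickhouse.yourdomain.com but not clickhouse.yourdomain.com, and not names under internal. The wildcard CNAME records from Step 3 likewise do not cover the bare name. Below is a minimal bash sketch for checking route hostnames against a certificate name. wildcard_covers is a hypothetical helper implementing the standard single-label wildcard rule.

```shell
#!/usr/bin/env bash
# Hypothetical helper: does certificate name $1 (e.g. "*.clickhouse.yourdomain.com")
# cover hostname $2? A leading "*." matches exactly one extra label, so it covers
# a.clickhouse.yourdomain.com but neither clickhouse.yourdomain.com nor
# a.internal.clickhouse.yourdomain.com. The comparison is case-insensitive.
wildcard_covers() {
    local pattern host
    pattern=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    host=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    if [[ "${pattern:0:2}" == "*." ]]; then
        local first="${host%%.*}"
        # Host must have a non-empty first label and the rest must equal the suffix
        [[ "$host" == *.* && -n "$first" && "${host#*.}" == "${pattern:2}" ]]
    else
        [[ "$host" == "$pattern" ]]
    fi
}
```

For example, loop over the route hostnames with for h in $(kubectl get httproute -n "$CH_NAMESPACE" -o jsonpath='{.items[*].spec.hostnames[*]}'); do wildcard_covers "*.${CUSTOM_DOMAIN}" "$h" || echo "not covered: $h"; done.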

Create TLSRoute for ClickHouse Native Protocol

For the native protocol (backend port 9000), edge-proxy uses tls-to-tcp — it terminates TLS and forwards plain TCP. We replicate this with TLSRoute resources attached to the clickhouse-native TLS listener (port 9440). Envoy terminates TLS, reads the SNI hostname to select the correct replica, and forwards plain TCP to ClickHouse port 9000.

Each TLSRoute uses the same FQDN as the corresponding HTTPRoute (e.g., test-byoc-0-0.clickhouse.yourdomain.com) — this works because they attach to different listeners (port 8443 vs port 9440). Clients connecting to hostname:8443 get HTTP; clients connecting to hostname:9440 get native protocol. No FQDN change, no client reconfiguration needed.

cat > /tmp/create-ch-tlsroutes.sh <<'EOFSCRIPT'
#!/bin/bash

CLUSTER_SVC=""
CLUSTER_NATIVE_PORT=""

while IFS= read -r service; do
    SVC_JSON=$(kubectl get svc "$service" -n "$CH_NAMESPACE" -o json 2>/dev/null)
    if [ -z "$SVC_JSON" ]; then
        echo "Warning: service $service not found in $CH_NAMESPACE, skipping"
        continue
    fi

    PORT_MAPPING=$(echo "$SVC_JSON" | jq -r '.metadata.annotations["edge-proxy.altinity.com/port-mapping"] // empty')
    # HTTP port: first tls-to-tcp entry (e.g. 8443:tls-to-tcp:8123 → 8123)
    HTTP_PORT=$(echo "$PORT_MAPPING" | grep -oP '\d+:tls-to-tcp:\K\d+' | head -1)
    # Native port: last tls-to-tcp entry (e.g. 9440:tls-to-tcp:9000 → 9000)
    NATIVE_PORT=$(echo "$PORT_MAPPING" | grep -oP '\d+:tls-to-tcp:\K\d+' | tail -1)

    CHI=$(echo "$SVC_JSON" | jq -r '.metadata.labels["clickhouse.altinity.com/chi"] // empty')
    SHARD=$(echo "$SVC_JSON" | jq -r '.metadata.labels["clickhouse.altinity.com/shard"] // empty')
    REPLICA=$(echo "$SVC_JSON" | jq -r '.metadata.labels["clickhouse.altinity.com/replica"] // empty')

    if [ -z "$SHARD" ] || [ -z "$REPLICA" ]; then
        echo "Deferring $service (cluster-level service, used for cluster-wide TLSRoute)"
        CLUSTER_SVC="$service"
        CLUSTER_NATIVE_PORT="$NATIVE_PORT"
        continue
    fi

    # Only create TLSRoute if there are two distinct tls-to-tcp entries (HTTP and native)
    if [ -z "$NATIVE_PORT" ] || [ "$HTTP_PORT" = "$NATIVE_PORT" ]; then
        echo "Skipping $service (no separate native protocol port mapping)"
        continue
    fi

    FQDN="${CHI}-${SHARD}-${REPLICA}.${CUSTOM_DOMAIN}"
    echo "Creating TLSRoute: $service → $FQDN (native port $NATIVE_PORT via port 9440 listener)"

    cat <<EOFTLS | kubectl apply -f -
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: TLSRoute
metadata:
  name: ${service}-native
  namespace: ${CH_NAMESPACE}
spec:
  parentRefs:
  - name: contour
    namespace: projectcontour
    sectionName: clickhouse-native
  hostnames:
  - "${FQDN}"
  rules:
  - backendRefs:
    - name: ${service}
      port: ${NATIVE_PORT}
EOFTLS
    echo "  ✓ Created"

done < /tmp/clickhouse-services.txt

# Cluster-wide TLSRoute at $CUSTOM_DOMAIN, backed by the cluster-level service
if [ -n "$CLUSTER_SVC" ] && [ -n "$CLUSTER_NATIVE_PORT" ]; then
    CLUSTER_HTTP_PORT=$(kubectl get svc "$CLUSTER_SVC" -n "$CH_NAMESPACE" -o json | \
        jq -r '.metadata.annotations["edge-proxy.altinity.com/port-mapping"] // empty' | \
        grep -oP '\d+:tls-to-tcp:\K\d+' | head -1)
    if [ "$CLUSTER_HTTP_PORT" != "$CLUSTER_NATIVE_PORT" ]; then
        echo "Creating cluster-wide TLSRoute: $CUSTOM_DOMAIN → $CLUSTER_SVC:$CLUSTER_NATIVE_PORT"
        cat <<EOFCLUSTER | kubectl apply -f -
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: TLSRoute
metadata:
  name: clickhouse-cluster-native
  namespace: ${CH_NAMESPACE}
spec:
  parentRefs:
  - name: contour
    namespace: projectcontour
    sectionName: clickhouse-native
  hostnames:
  - "${CUSTOM_DOMAIN}"
  rules:
  - backendRefs:
    - name: ${CLUSTER_SVC}
      port: ${CLUSTER_NATIVE_PORT}
EOFCLUSTER
        echo "  ✓ Created cluster-wide TLSRoute"
    else
        echo "No separate native protocol port for cluster-level service; skipping cluster-wide TLSRoute"
    fi
fi

echo "✓ All TLSRoute resources created"
EOFSCRIPT

chmod +x /tmp/create-ch-tlsroutes.sh
bash /tmp/create-ch-tlsroutes.sh

# Verify
kubectl get httproute,tlsroute -n $CH_NAMESPACE
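Both scripts pick the HTTP port as the first tls-to-tcp entry and the native port as the last one. They also rely on grep -P, which some grep builds lack (for example, the BSD grep that ships with macOS). Below is a plain-bash sketch of a lookup keyed on the external port instead of tuple position. backend_port_for is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the backend port that an edge-proxy port-mapping
# value assigns to a given external port and mode; return 1 if none matches.
# Example: backend_port_for "8443:tls-to-tcp:8123,9440:tls-to-tcp:9000" 9440 tls-to-tcp -> 9000
backend_port_for() {
    local mapping="$1" want_ext="$2" want_mode="$3" tuple ext mode backend
    local -a tuples
    IFS=',' read -ra tuples <<< "${mapping// /}"
    for tuple in "${tuples[@]}"; do
        IFS=':' read -r ext mode backend <<< "$tuple"
        if [[ "$ext" == "$want_ext" && "$mode" == "$want_mode" ]]; then
            echo "$backend"
            return 0
        fi
    done
    return 1
}
```

Inside the scripts, HTTP_PORT=$(backend_port_for "$PORT_MAPPING" 8443 tls-to-tcp) and NATIVE_PORT=$(backend_port_for "$PORT_MAPPING" 9440 tls-to-tcp) could replace the grep -oP lines, assuming the external ports 8443 and 9440 from the ClickHouse annotation example.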

Create HTTPRoute for Monitoring Services (Prometheus, Grafana)

This step is optional; skip it if you do not need to expose Prometheus or Grafana.

Edge-proxy routes monitoring traffic as tcp-passthrough on specific ports (e.g. 9090:tcp-passthrough:9090). Because monitoring services use plain TCP with no TLS, there is no SNI for Contour to route on. Instead, expose monitoring services as named HTTPS subdomains: Contour terminates TLS and forwards plain HTTP to the backend.

This changes the access URL from CUSTOM_DOMAIN:9090 to https://<app>.CUSTOM_DOMAIN, where <app> is the Service’s app label (for example, https://prometheus.CUSTOM_DOMAIN). Update any dashboards or alert configurations accordingly.

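# Build the monitoring service list: Services in $SYS_NAMESPACE with a tcp-passthrough mapping
kubectl get svc -n "$SYS_NAMESPACE" -o json | jq -r '
  .items[] | select((.metadata.annotations["edge-proxy.altinity.com/port-mapping"] // "") | test("tcp-passthrough")) |
  .metadata.name' > /tmp/monitoring-services.txt

if [ -s /tmp/monitoring-services.txt ]; then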
    cat > /tmp/create-monitoring-routes.sh <<'EOFSCRIPT'
#!/bin/bash

while IFS= read -r service; do
    SVC_JSON=$(kubectl get svc "$service" -n "$SYS_NAMESPACE" -o json 2>/dev/null)
    if [ -z "$SVC_JSON" ]; then
        echo "Warning: service $service not found in $SYS_NAMESPACE, skipping"
        continue
    fi

    PORT_MAPPING=$(echo "$SVC_JSON" | jq -r '.metadata.annotations["edge-proxy.altinity.com/port-mapping"] // empty')
    INTERNAL_PORT=$(echo "$PORT_MAPPING" | grep -oP ':\K\d+$')
    APP=$(echo "$SVC_JSON" | jq -r '.metadata.labels.app // empty')

    if [ -z "$INTERNAL_PORT" ] || [ -z "$APP" ]; then
        echo "Skipping $service (could not determine port or app label)"
        continue
    fi

    FQDN="${APP}.${CUSTOM_DOMAIN}"
    echo "Creating HTTPRoute: $service → https://$FQDN (port $INTERNAL_PORT)"

    cat <<EOFROUTE | kubectl apply -f -
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: ${service}-monitoring
  namespace: ${SYS_NAMESPACE}
spec:
  parentRefs:
  - name: contour
    namespace: projectcontour
    sectionName: https
  hostnames:
  - "${FQDN}"
  rules:
  - backendRefs:
    - name: ${service}
      port: ${INTERNAL_PORT}
EOFROUTE
    echo "  ✓ Created, access at https://$FQDN"

done < /tmp/monitoring-services.txt

echo "✓ Monitoring HTTPRoute resources created"
EOFSCRIPT

    chmod +x /tmp/create-monitoring-routes.sh
    bash /tmp/create-monitoring-routes.sh
else
    echo "No monitoring services found, skipping..."
fi

Handle tls-passthrough Services

If Step 1 found services using tls-passthrough, those services handle TLS termination themselves (e.g., ClickHouse with TLS enabled on the native port). Contour needs a separate TLS/Passthrough listener for each external port used with tls-passthrough — this is a different listener mode from the TLS/Terminate listener used for tls-to-tcp services.

If tls-passthrough services use the same external port as tls-to-tcp services (both on port 9440), the Gateway cannot serve both modes on a single listener — move the passthrough service to a different external port or contact Altinity for guidance.
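The loop below extracts the passthrough ports with grep -oP. For environments without GNU grep, the following is a bash-only sketch. It reads the "namespace/name: mapping" lines that Step 1 wrote to /tmp/tls-passthrough-services.txt and prints each tls-passthrough external port once, ignoring any tls-to-tcp tuples on the same Service. passthrough_ports is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "namespace/name: mapping" lines on stdin and print
# every external port whose mode is tls-passthrough, once, in numeric order.
passthrough_ports() {
    local line mapping tuple ext mode backend
    local -a tuples
    while IFS= read -r line; do
        mapping="${line#*: }"                 # drop the "namespace/name: " prefix
        IFS=',' read -ra tuples <<< "${mapping// /}"
        for tuple in "${tuples[@]}"; do
            IFS=':' read -r ext mode backend <<< "$tuple"
            if [[ "$mode" == "tls-passthrough" ]]; then
                echo "$ext"
            fi
        done
    done | sort -nu
}
```

Usage: PASSTHROUGH_PORTS=$(passthrough_ports < /tmp/tls-passthrough-services.txt)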

if [ -s /tmp/tls-passthrough-services.txt ]; then
    # Find the unique external ports used by tls-passthrough services
    PASSTHROUGH_PORTS=$(grep -oE '[0-9]+:tls-passthrough' /tmp/tls-passthrough-services.txt | \
        cut -d: -f1 | sort -u)

    echo "Adding TLS/Passthrough listeners for ports: $PASSTHROUGH_PORTS"

    for PORT in $PASSTHROUGH_PORTS; do
        # Check for conflict with existing listeners (port 8443 and 9440 are already defined)
        if kubectl get gateway contour -n projectcontour -o json | \
                jq -e ".spec.listeners[] | select(.port == ${PORT})" > /dev/null 2>&1; then
            echo "ERROR: Port $PORT already has a listener — tls-passthrough and tls-to-tcp cannot share a port."
            echo "       Move one to a different external port before proceeding."
            continue
        fi

        echo "Adding Passthrough listener on port $PORT"
        kubectl patch gateway contour -n projectcontour --type='json' -p="[{
          \"op\": \"add\",
          \"path\": \"/spec/listeners/-\",
          \"value\": {
            \"name\": \"tls-passthrough-${PORT}\",
            \"protocol\": \"TLS\",
            \"port\": ${PORT},
            \"tls\": {\"mode\": \"Passthrough\"},
            \"allowedRoutes\": {\"namespaces\": {\"from\": \"All\"}}
          }
        }]"
    done

    # Create TLSRoute for each tls-passthrough service
    cat > /tmp/create-ch-passthrough-tlsroutes.sh <<'EOFSCRIPT'
#!/bin/bash

while IFS= read -r svc_line; do
    # Parse: "namespace/name: mapping"
    NS=$(echo "$svc_line" | cut -d/ -f1)
    REST=$(echo "$svc_line" | cut -d/ -f2-)
    SVC=$(echo "$REST" | cut -d: -f1)
    MAPPING=$(echo "$REST" | cut -d' ' -f2-)

    SVC_JSON=$(kubectl get svc "$SVC" -n "$NS" -o json 2>/dev/null)
    if [ -z "$SVC_JSON" ]; then
        echo "Warning: service $SVC not found in $NS, skipping"
        continue
    fi

    CHI=$(echo "$SVC_JSON" | jq -r '.metadata.labels["clickhouse.altinity.com/chi"] // empty')
    SHARD=$(echo "$SVC_JSON" | jq -r '.metadata.labels["clickhouse.altinity.com/shard"] // empty')
    REPLICA=$(echo "$SVC_JSON" | jq -r '.metadata.labels["clickhouse.altinity.com/replica"] // empty')

    if [ -z "$SHARD" ] || [ -z "$REPLICA" ]; then
        echo "Skipping $SVC (cluster-level service; passthrough routing is per-replica only)"
        continue
    fi

    FQDN="${CHI}-${SHARD}-${REPLICA}.${CUSTOM_DOMAIN}"

    # Each passthrough mapping: "extport:tls-passthrough:backendport"
    echo "$MAPPING" | tr ',' '\n' | grep 'tls-passthrough' | while IFS=: read -r EXT_PORT MODE BACKEND_PORT; do
        echo "Creating TLSRoute (passthrough): $SVC → $FQDN ext=$EXT_PORT backend=$BACKEND_PORT"

        cat <<EOFTLS | kubectl apply -f -
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: TLSRoute
metadata:
  name: ${SVC}-passthrough-${EXT_PORT}
  namespace: ${NS}
spec:
  parentRefs:
  - name: contour
    namespace: projectcontour
    sectionName: tls-passthrough-${EXT_PORT}
  hostnames:
  - "${FQDN}"
  rules:
  - backendRefs:
    - name: ${SVC}
      port: ${BACKEND_PORT}
EOFTLS
        echo "  ✓ Created"
    done

done < /tmp/tls-passthrough-services.txt

echo "✓ Passthrough TLSRoute resources created"
EOFSCRIPT

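    chmod +x /tmp/create-ch-passthrough-tlsroutes.sh
    # The quoted heredoc defers $CUSTOM_DOMAIN to the child shell, so pass it explicitly
    CUSTOM_DOMAIN="$CUSTOM_DOMAIN" bash /tmp/create-ch-passthrough-tlsroutes.sh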

    kubectl get tlsroute -A | grep passthrough
else
    echo "No tls-passthrough services found, skipping."
fi
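The TLSRoute script splits each discovery line with cut, which assumes exactly one space after the colon; when there is no space, cut -d' ' -f2- returns the whole line as the mapping. A more tolerant sketch using bash parameter expansion, assuming lines of the form namespace/name: mapping (parse_svc_line is a hypothetical helper):

```shell
# Sketch: split "namespace/name: mapping" into namespace, name and mapping.
# Prints them tab-separated; returns 1 if the line does not have that shape.
parse_svc_line() {
    local line="$1" ns rest svc mapping
    [[ "$line" == */*:* ]] || return 1
    ns="${line%%/*}"                                   # text before the first "/"
    rest="${line#*/}"                                  # "name: mapping"
    svc="${rest%%:*}"                                  # text before the first ":"
    mapping="${rest#*:}"                               # text after the first ":"
    mapping="${mapping#"${mapping%%[![:space:]]*}"}"   # trim leading whitespace
    [[ -n "$ns" && -n "$svc" && -n "$mapping" ]] || return 1
    printf '%s\t%s\t%s\n' "$ns" "$svc" "$mapping"
}
```

Inside the loop it would replace the four cut calls: IFS=$'\t' read -r NS SVC MAPPING < <(parse_svc_line "$svc_line") || continue.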

Handle IP Whitelisting

The edge-proxy.altinity.com/whitelist annotation restricts which source IPs can connect. In Contour Gateway API mode, per-route IP filtering is not available as a standard feature. The recommended approach is to enforce IP restrictions at the AWS NLB security group — this blocks disallowed traffic before it reaches Envoy and is the most reliable option.

if [ -s /tmp/whitelist-services.txt ]; then
    echo "=== Whitelist CIDRs found ==="
    # Collect all unique CIDRs across all whitelisted services
    ALL_CIDRS=$(grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}/[0-9]{1,2}' /tmp/whitelist-services.txt | \
        sort -u | tr '\n' ',')
    echo "CIDRs to allow: $ALL_CIDRS"

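    # Get the security group attached to Contour's Envoy NLB.
    # The tag key/value depend on which controller provisioned the NLB and on the Envoy
    # Service name (envoy-contour when created by the Gateway provisioner); adjust if None.
    CONTOUR_SG=$(aws ec2 describe-security-groups \
        --filters "Name=tag:kubernetes.io/service-name,Values=projectcontour/envoy-contour" \
        --query 'SecurityGroups[0].GroupId' --output text 2>/dev/null)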

    if [ -n "$CONTOUR_SG" ] && [ "$CONTOUR_SG" != "None" ]; then
        echo "Contour NLB security group: $CONTOUR_SG"
        echo ""
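        echo "Add inbound rules to $CONTOUR_SG for the following CIDRs on ports 8443 and 9440"
        echo "(and on any tls-passthrough ports added above):"
        echo "$ALL_CIDRS" | tr ',' '\n' | grep -v '^$' | while read -r CIDR; do
            echo "  aws ec2 authorize-security-group-ingress --group-id $CONTOUR_SG \\"
            echo "      --protocol tcp --port 8443 --cidr $CIDR"
            echo "  aws ec2 authorize-security-group-ingress --group-id $CONTOUR_SG \\"
            echo "      --protocol tcp --port 9440 --cidr $CIDR"
        done
        echo ""
        echo "Security group rules only ever allow traffic, so also revoke any existing"
        echo "0.0.0.0/0 rule on these ports; otherwise the whitelist has no effect."
        echo ""
        echo "Review per-service whitelists in /tmp/whitelist-services.txt."
        echo "If different services have different CIDRs, the security group can only tell"
        echo "them apart by port, so give those services separate external ports."
        echo "A NetworkPolicy on the ClickHouse pods cannot enforce client CIDRs: those pods"
        echo "see connections from Envoy, not from the original client."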
    else
        echo "Could not auto-detect Contour NLB security group. Find it manually:"
        echo "  AWS Console → EC2 → Load Balancers → find the Contour NLB → Security tab"
        echo "Then add inbound rules for the CIDRs listed above."
    fi
fi
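A loose pattern such as [\d./]+ also matches the digits in service names like chi-demo-0-0, so non-CIDR tokens can end up in the allow list. A stricter sketch, assuming whitelist entries are IPv4 CIDRs in dotted notation; it checks shape only, not octet ranges, and extract_cidrs is a hypothetical name:

```shell
# Sketch: read discovery lines on stdin and print each unique IPv4 CIDR once.
# Digits inside service names (e.g. "chi-demo-0-0") do not match this pattern.
extract_cidrs() {
    grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}/[0-9]{1,2}' | LC_ALL=C sort -u
}
```

extract_cidrs < /tmp/whitelist-services.txt | paste -sd, - produces the same comma-separated list the script prints.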

Handle Upstream TLS (tls-to-tls / tls-to-tls-insecure)

If Step 1 found services using tls-to-tls or tls-to-tls-insecure, edge-proxy was terminating the client TLS and then opening a new TLS connection to the upstream (ClickHouse with TLS on the pod). In Contour Gateway API, upstream TLS is configured via the experimental BackendTLSPolicy resource.

if [ -s /tmp/tls-to-tls-services.txt ]; then
    echo "Services requiring upstream TLS:"
    cat /tmp/tls-to-tls-services.txt

    echo ""
    echo "For each service, create a BackendTLSPolicy. Example for service chi-NAME-0-0:"
    cat <<'EOFEXAMPLE'
apiVersion: gateway.networking.k8s.io/v1alpha3
kind: BackendTLSPolicy
metadata:
  name: chi-NAME-0-0-upstream-tls
  namespace: $CH_NAMESPACE
spec:
  targetRefs:
  - group: ""
    kind: Service
    name: chi-NAME-0-0       # must match the service name in the HTTPRoute backendRef
  validation:
    # For tls-to-tls: specify the CA cert that signed the upstream cert
    # Create a ConfigMap with the CA cert first, then reference it here:
    caCertificateRefs:
    - group: ""
      kind: ConfigMap
      name: clickhouse-upstream-ca
    hostname: chi-NAME-0-0.internal.clickhouse.svc   # upstream TLS server name

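    # For tls-to-tls-insecure (no upstream certificate verification) there is no
    # BackendTLSPolicy equivalent: the policy always validates the upstream cert, either
    # against caCertificateRefs or against the system trust store
    # (wellKnownCACertificates: System, used in place of caCertificateRefs).
    # To keep skipping verification, use a Contour HTTPProxy route with
    # spec.routes[].services[].protocol: tls and no validation block.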
EOFEXAMPLE

    echo ""
    echo "See: https://gateway-api.sigs.k8s.io/api-types/backendtlspolicy/"
    echo "Note: BackendTLSPolicy requires Gateway API experimental channel (already installed in Step 2)."
fi
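When several services need upstream TLS, rendering the policy from a template avoids hand-editing the example for each one. A sketch that mirrors the BackendTLSPolicy above; render_backend_tls_policy is hypothetical, all four arguments are values you supply, and the CA ConfigMap is assumed to store the certificate under the ca.crt key, as the Gateway API specification expects:

```shell
# Sketch: render a BackendTLSPolicy (gateway.networking.k8s.io/v1alpha3).
#   $1 service name   $2 namespace   $3 CA ConfigMap name (key ca.crt)
#   $4 hostname the upstream certificate is issued for
render_backend_tls_policy() {
    local svc="$1" ns="$2" ca_cm="$3" host="$4"
    cat <<EOF
apiVersion: gateway.networking.k8s.io/v1alpha3
kind: BackendTLSPolicy
metadata:
  name: ${svc}-upstream-tls
  namespace: ${ns}
spec:
  targetRefs:
  - group: ""
    kind: Service
    name: ${svc}
  validation:
    caCertificateRefs:
    - group: ""
      kind: ConfigMap
      name: ${ca_cm}
    hostname: ${host}
EOF
}
```

Each rendered policy can be piped to kubectl apply -f - after reviewing it.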

Handle Zone-Aware Routing

Edge-proxy supports AZ-local routing: each edge-proxy instance routes requests to services tagged with a matching edge-proxy.altinity.com/zone annotation. This prevents cross-AZ traffic between the proxy and ClickHouse pods.

With Contour, this per-AZ routing is not replicated. A single Contour LB accepts all traffic and routes to pods in any AZ based on the FQDN. Cross-AZ traffic between Envoy and ClickHouse pods may increase, which can raise AWS data transfer costs.

if [ -s /tmp/zone-services.txt ]; then
    echo "Zone-annotated services:"
    cat /tmp/zone-services.txt
    echo ""
    echo "Mitigations:"
    echo ""
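    echo "1. Enable topology-aware routing on ClickHouse services (Kubernetes 1.27+)."
    echo "   This adds EndpointSlice hints that prefer endpoints in the caller's AZ. Envoy"
    echo "   connects to pod endpoints directly rather than through kube-proxy, so confirm"
    echo "   your Contour version uses these hints before relying on them:"
    while IFS= read -r line; do
        SVC=$(echo "$line" | awk '{print $1}' | sed 's/[(:].*$//')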
        NS=$(echo "$SVC" | cut -d/ -f1)
        NAME=$(echo "$SVC" | cut -d/ -f2)
        [ -n "$NAME" ] && echo "   kubectl annotate svc $NAME -n $NS service.kubernetes.io/topology-mode=Auto"
    done < /tmp/zone-services.txt
    echo ""
    echo "2. Use per-replica FQDNs (e.g. test-byoc-0-0.custom.domain) instead of the"
    echo "   cluster-level FQDN to ensure clients connect to a specific replica."
    echo "   Per-replica routing is already configured in the HTTPRoute/TLSRoute scripts above."
fi
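tr -d deletes individual characters rather than a substring, so tr -d '(zone:' also strips every z, o, n and e from the names (chi-one-0-0 becomes chi--0-0). A sketch that cuts at the first ( or : instead, assuming the namespace/name pair is the first field of each line in /tmp/zone-services.txt (zone_line_to_ns_name is hypothetical):

```shell
# Sketch: turn a zone discovery line into "namespace name".
# Accepts "ns/name (zone: X)" or "ns/name: X"; returns 1 if no "ns/name" is found.
zone_line_to_ns_name() {
    local line="${1#"${1%%[![:space:]]*}"}"   # trim leading whitespace
    local first="${line%%[[:space:]]*}"       # first whitespace-separated field
    first="${first%%[(:]*}"                   # cut at the first "(" or ":"
    [[ "$first" == */* ]] || return 1
    printf '%s %s\n' "${first%%/*}" "${first#*/}"
}
```

In the loop: read -r NS NAME < <(zone_line_to_ns_name "$line") && echo "kubectl annotate svc $NAME -n $NS service.kubernetes.io/topology-mode=Auto".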

Step 5: Test Contour (Before DNS Change)

HTTP interface tests use --connect-to to direct traffic to the Contour LB while using the real service FQDN in the URL. This sets both the TLS SNI (required for the Gateway to match the cert) and the HTTP Host header (required for HTTPRoute matching). Because the URL keeps the real FQDN, curl also verifies the certificate against that FQDN, which the *.${CUSTOM_DOMAIN} certificate covers; --insecure is therefore not needed for hostname matching and only skips chain validation, so run once without it to confirm the certificate is trusted. (--resolve looks similar but requires an IP; --connect-to accepts ELB DNS names directly.)

Native protocol (port 9440) cannot be tested pre-DNS: clickhouse-client uses --host as both the TCP destination and the TLS SNI with no way to override them independently. Native protocol is verified in Step 7 immediately after the 60-second DNS cutover.

# Pick the first per-replica ClickHouse service (cluster-level services have no shard/replica labels)
SUBDOMAIN=""
while IFS= read -r service; do
    SVC_JSON=$(kubectl get svc "$service" -n "$CH_NAMESPACE" -o json 2>/dev/null)
    CHI=$(echo "$SVC_JSON" | jq -r '.metadata.labels["clickhouse.altinity.com/chi"] // empty')
    SHARD=$(echo "$SVC_JSON" | jq -r '.metadata.labels["clickhouse.altinity.com/shard"] // empty')
    REPLICA=$(echo "$SVC_JSON" | jq -r '.metadata.labels["clickhouse.altinity.com/replica"] // empty')
    if [ -n "$SHARD" ] && [ -n "$REPLICA" ]; then
        SUBDOMAIN="${CHI}-${SHARD}-${REPLICA}.${CUSTOM_DOMAIN}"
        break
    fi
done < /tmp/clickhouse-services.txt
if [ -z "$SUBDOMAIN" ]; then echo "ERROR: no per-replica service found in /tmp/clickhouse-services.txt"; exit 1; fi

# --connect-to HOST:PORT:CONNECT-HOST:CONNECT-PORT routes TCP to $CONTOUR_LB while
# keeping $SUBDOMAIN as the URL hostname → curl sets SNI=$SUBDOMAIN and Host: $SUBDOMAIN.
echo "Test 1: ClickHouse HTTP via Contour (unauthenticated)"
curl --insecure \
    --connect-to "${SUBDOMAIN}:8443:${CONTOUR_LB}:8443" \
    "https://${SUBDOMAIN}:8443/?query=SELECT%20version()"

echo "Test 2: HTTP with authentication"
curl --insecure \
    --connect-to "${SUBDOMAIN}:8443:${CONTOUR_LB}:8443" \
    "https://${SUBDOMAIN}:8443/?query=SELECT%20hostName()" \
    --user "$CH_USER:$CH_PASSWORD"

# Note: native protocol (port 9440) cannot be reliably tested before DNS cutover.
# clickhouse-client uses --host as both the TCP destination and the TLS SNI — there is
# no separate SNI override. Connecting by IP sends no SNI, and /etc/hosts is not
# consistently honoured by the client. Since TTL is already 60 s, verify native
# protocol in Step 7, immediately after the Step 6 DNS cutover.
echo "Skipping native protocol pre-DNS test — will verify in Step 7 after DNS cutover."

# Test 4: Monitoring services (if they exist). The URL must include :8443 so it
# matches the --connect-to rule and reaches the Gateway's https listener.
if [ -s /tmp/monitoring-services.txt ]; then
    echo "Test 4: Prometheus"
    curl --insecure \
        --connect-to "prometheus.${CUSTOM_DOMAIN}:8443:${CONTOUR_LB}:8443" \
        "https://prometheus.${CUSTOM_DOMAIN}:8443/-/healthy" || true
fi

echo "✓ Pre-DNS tests complete"
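The pre-DNS tests hinge on two strings: the per-replica FQDN built from the chi/shard/replica labels and the --connect-to mapping. Factored into small functions (hypothetical names, same naming scheme as the loop above), they can be checked without a cluster:

```shell
# Sketch: per-replica FQDN from the chi/shard/replica labels.
# Fails for cluster-level services, which have no shard/replica labels and no route.
replica_fqdn() {
    local chi="$1" shard="$2" replica="$3" domain="$4"
    [[ -n "$chi" && -n "$shard" && -n "$replica" && -n "$domain" ]] || return 1
    printf '%s-%s-%s.%s\n' "$chi" "$shard" "$replica" "$domain"
}

# Sketch: curl --connect-to value that keeps HOST in the URL (so SNI and the Host
# header stay HOST) while the TCP connection goes to LB on the same port.
connect_to_arg() {
    local host="$1" port="$2" lb="$3"
    printf '%s:%s:%s:%s\n' "$host" "$port" "$lb" "$port"
}
```

For example: curl --insecure --connect-to "$(connect_to_arg "$SUBDOMAIN" 8443 "$CONTOUR_LB")" "https://${SUBDOMAIN}:8443/?query=SELECT%201".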

Step 6: Switch Traffic to Contour (LB Cutover)

DNS currently points to edge-proxy’s LB ($EDGE_PROXY_LB) with TTL=60, set during the pre-flight step. Update the CNAMEs to $CONTOUR_LB to cut live traffic over to Contour. Because TTL is already 60 seconds, propagation takes approximately one minute with no extended downtime window.

# Confirm current DNS is pointing at edge-proxy's LB and TTL is low
dig "*.${CUSTOM_DOMAIN}"

# Switch all custom domain CNAMEs from edge-proxy's LB to Contour's LB.
# UPSERT creates the record if it does not exist, so remove any zone you do not use.
# TTL stays at 60 for the 48-hour monitoring window so a rollback propagates quickly.
for ZONE in "${CUSTOM_DOMAIN}" "internal.${CUSTOM_DOMAIN}" "vpce.${CUSTOM_DOMAIN}"; do
    aws route53 change-resource-record-sets \
        --hosted-zone-id "$HOSTED_ZONE_ID" \
        --change-batch "{
          \"Changes\": [{
            \"Action\": \"UPSERT\",
            \"ResourceRecordSet\": {
              \"Name\": \"*.${ZONE}\",
              \"Type\": \"CNAME\",
              \"TTL\": 60,
              \"ResourceRecords\": [{\"Value\": \"${CONTOUR_LB}\"}]
            }
          }]
        }" && echo "Switched *.${ZONE} → $CONTOUR_LB" || echo "WARNING: failed to update *.${ZONE}"
done

echo "Waiting 60s for DNS propagation..."
sleep 60

# Verify all zones now resolve to Contour's LB
dig "*.${CUSTOM_DOMAIN}" +short
dig "*.internal.${CUSTOM_DOMAIN}" +short
dig "*.vpce.${CUSTOM_DOMAIN}" +short
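The cutover and rollback loops build the Route53 change batch from hand-escaped JSON, where one missing backslash breaks the request. A sketch that builds the same UPSERT batch with printf; cname_change_batch is hypothetical and assumes plain DNS names that need no JSON escaping:

```shell
# Sketch: Route53 change batch that UPSERTs one CNAME record.
#   $1 record name (e.g. "*.example.com")   $2 CNAME target   $3 TTL in seconds
cname_change_batch() {
    local name="$1" target="$2" ttl="$3"
    printf '{"Changes":[{"Action":"UPSERT","ResourceRecordSet":{"Name":"%s","Type":"CNAME","TTL":%d,"ResourceRecords":[{"Value":"%s"}]}}]}\n' \
        "$name" "$ttl" "$target"
}
```

It slots into the loop as aws route53 change-resource-record-sets --hosted-zone-id "$HOSTED_ZONE_ID" --change-batch "$(cname_change_batch "*.${ZONE}" "$CONTOUR_LB" 60)", and the rollback only changes the target.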

Step 7: Test Production Endpoints

# Test each ClickHouse service by its individual FQDN (routes exist per-service, not on $CUSTOM_DOMAIN)
while IFS= read -r service; do
    SVC_JSON=$(kubectl get svc "$service" -n "$CH_NAMESPACE" -o json 2>/dev/null)
    if [ -z "$SVC_JSON" ]; then continue; fi

    CHI=$(echo "$SVC_JSON" | jq -r '.metadata.labels["clickhouse.altinity.com/chi"] // empty')
    SHARD=$(echo "$SVC_JSON" | jq -r '.metadata.labels["clickhouse.altinity.com/shard"] // empty')
    REPLICA=$(echo "$SVC_JSON" | jq -r '.metadata.labels["clickhouse.altinity.com/replica"] // empty')

    # Skip cluster-level services (no shard/replica labels) — they have no individual route
    if [ -z "$SHARD" ] || [ -z "$REPLICA" ]; then
        echo "Skipping $service (cluster-level, no per-replica route)"
        continue
    fi

    SUBDOMAIN="${CHI}-${SHARD}-${REPLICA}.${CUSTOM_DOMAIN}"

    echo "Test HTTP: $SUBDOMAIN"
    curl "https://${SUBDOMAIN}:8443/?query=SELECT%20hostName()" \
        --user "$CH_USER:$CH_PASSWORD"

    echo "Test native: $SUBDOMAIN port 9440"
    clickhouse-client --host="$SUBDOMAIN" --port=9440 --secure \\
        --user="$CH_USER" --password="$CH_PASSWORD" \\
        --query "SELECT version(), hostName()"
done < /tmp/clickhouse-services.txt

# Test monitoring (HTTPS subdomains on the Gateway's https listener, port 8443)
if [ -s /tmp/monitoring-services.txt ]; then
    echo "Test: Prometheus"
    curl "https://prometheus.${CUSTOM_DOMAIN}:8443/-/healthy"

    echo "Test: Grafana"
    curl "https://grafana.${CUSTOM_DOMAIN}:8443/api/health"
fi

echo "✓ All production endpoints working"
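As written, the final message prints even when a check fails, and plain curl exits 0 on HTTP errors. A sketch of a failure counter (check and FAILED are hypothetical names); because the Step 7 loop reads its file through a redirect rather than a pipe, it runs in the current shell and the counter survives the loop:

```shell
# Sketch: run a command, report it, and count failures in FAILED.
FAILED=0
check() {
    local label="$1"; shift
    if "$@"; then
        echo "  ✓ $label"
    else
        echo "  ✗ $label" >&2
        FAILED=$((FAILED + 1))
    fi
}
```

Wrap each call, for example check "HTTP $SUBDOMAIN" curl --fail "https://${SUBDOMAIN}:8443/?query=SELECT%20hostName()" --user "$CH_USER:$CH_PASSWORD", and print the success message only when FAILED is 0.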

Step 8: Monitor for 48 Hours

# Application health
kubectl logs -f deployment/your-app -n your-app-namespace | grep -i clickhouse

# Contour health
kubectl get pods -n projectcontour
kubectl get gateway -n projectcontour
kubectl get httproute,tlsroute -n $CH_NAMESPACE

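# ClickHouse health. Routes exist only on per-replica FQDNs, not on the bare $CUSTOM_DOMAIN,
# and each query sees system.query_log only on the replica behind CH_HOST.
CH_HOST="NAME-0-0.${CUSTOM_DOMAIN}"   # placeholder: one of the per-replica FQDNs from Step 7
clickhouse-client --host "$CH_HOST" --port 9440 --secure \
    --user "$CH_USER" --password "$CH_PASSWORD" \
    --query "SELECT event_time, user, exception
             FROM system.query_log
             WHERE type IN ('ExceptionBeforeStart', 'ExceptionWhileProcessing')
               AND event_time > now() - INTERVAL 1 HOUR
             ORDER BY event_time DESC LIMIT 20"

# Check query performance
clickhouse-client --host "$CH_HOST" --port 9440 --secure \
    --user "$CH_USER" --password "$CH_PASSWORD" \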
    --query "SELECT
                quantile(0.5)(query_duration_ms) as p50_ms,
                quantile(0.95)(query_duration_ms) as p95_ms,
                quantile(0.99)(query_duration_ms) as p99_ms
             FROM system.query_log
             WHERE event_time > now() - INTERVAL 1 HOUR
               AND type = 'QueryFinish'"
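quantile() returns a Float64, which bash's integer arithmetic cannot compare. A sketch that delegates the comparison to awk; within_budget is hypothetical and the budget is a number you choose, since this guide does not set one:

```shell
# Sketch: succeed if a latency value (e.g. "12.5") is at or below a budget in ms.
within_budget() {
    awk -v value="$1" -v budget="$2" 'BEGIN { exit !(value + 0 <= budget + 0) }'
}
```

read -r P50 P95 P99 < <(clickhouse-client ... --query "...") reads the TabSeparated result (the client's default format for --query), after which within_budget "$P99" 500 || echo "p99 above budget" flags a regression; 500 ms is only an example value.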

Rollback Procedure

Roll back by pointing the CNAMEs back to edge-proxy’s LB. Edge-proxy keeps running until Altinity components are removed at the end of this guide, so traffic can be switched back at any point before then.

for ZONE in "${CUSTOM_DOMAIN}" "internal.${CUSTOM_DOMAIN}" "vpce.${CUSTOM_DOMAIN}"; do
    aws route53 change-resource-record-sets \
        --hosted-zone-id "$HOSTED_ZONE_ID" \
        --change-batch "{
          \"Changes\": [{
            \"Action\": \"UPSERT\",
            \"ResourceRecordSet\": {
              \"Name\": \"*.${ZONE}\",
              \"Type\": \"CNAME\",
              \"TTL\": 60,
              \"ResourceRecords\": [{\"Value\": \"${EDGE_PROXY_LB}\"}]
            }
          }]
        }" && echo "Rolled back *.${ZONE} → $EDGE_PROXY_LB"
done

echo "✓ DNS rolled back to edge-proxy"
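dig +short prints the CNAME target first, with a trailing dot, followed by the addresses it resolves to. A sketch that checks the first line against the expected load balancer; cname_points_to is hypothetical:

```shell
# Sketch: succeed if the first line of "dig +short" output names the expected target.
#   $1 dig +short output   $2 expected CNAME target (trailing dot optional)
cname_points_to() {
    local first="${1%%$'\n'*}"   # first line only
    local expected="${2%.}"
    first="${first%.}"           # strip the trailing root dot
    [[ "${first,,}" == "${expected,,}" ]]
}
```

For example, cname_points_to "$(dig "*.${CUSTOM_DOMAIN}" +short)" "$EDGE_PROXY_LB" || echo "still resolving elsewhere" confirms the rollback took effect; a local resolver cache can lag by up to the 60-second TTL.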

Troubleshooting

Native Protocol Not Working

# Check TLSRoute status
kubectl describe tlsroute -n $CH_NAMESPACE

# Verify Gateway has the clickhouse-native listener programmed
kubectl describe gateway contour -n projectcontour | grep -A5 "clickhouse-native"

# Verify Envoy service exposes port 9440
kubectl get svc -n projectcontour -o yaml | grep "port: 9440"

# Check Envoy logs (the Gateway provisioner names the DaemonSet envoy-<gateway name>)
kubectl logs -n projectcontour daemonset/envoy-contour -c envoy | grep -i "9440\|tlsroute\|error"

HTTPRoute or TLSRoute Shows Invalid

kubectl describe httproute -n $CH_NAMESPACE
kubectl describe tlsroute -n $CH_NAMESPACE

# Common issues:
# - Gateway sectionName mismatch (must match a listener name: 'https', 'clickhouse-native', or 'tls-passthrough-<port>')
# - Service not found in the same namespace as the route
# - Port mismatch between route backendRef and service port

Monitoring Services Not Accessible

# Check HTTPRoute status for monitoring
kubectl describe httproute -n $SYS_NAMESPACE

# Test from within cluster
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- \
    curl http://prometheus-service.$SYS_NAMESPACE.svc.cluster.local:9090/-/healthy

Verification Checklist

  • □ All services with edge-proxy annotations discovered
  • □ Gateway API CRDs and Contour Gateway Provisioner installed; GatewayClass created
  • □ cert-manager installed; ClusterIssuer ready
  • □ Custom domain CNAMEs pointed at $EDGE_PROXY_LB with TTL=60
  • □ _acme-challenge CNAME delegations to Altinity’s DNS removed
  • □ TLS certificate issued by Let’s Encrypt in projectcontour namespace; SANs correct
  • □ Gateway created with https (port 8443) and clickhouse-native (port 9440) listeners; $CONTOUR_LB saved
  • □ HTTPRoute created for each ClickHouse service (HTTP interface, port 8443)
  • □ TLSRoute created for each ClickHouse service (native protocol, port 9440, same FQDN as HTTPRoute)
  • □ HTTPRoute configured for monitoring services (if applicable)
  • □ (if applicable) tls-passthrough services: Passthrough listener(s) added to Gateway and TLSRoutes created
  • □ (if applicable) IP whitelisting: AWS NLB security group rules applied for all whitelist CIDRs
  • □ (if applicable) Upstream TLS (tls-to-tls): BackendTLSPolicy configured for each affected service
  • □ (if applicable) Zone-aware routing: topology-mode=Auto annotation applied to per-replica services (or limitation documented)
  • □ HTTP tests passing via $CONTOUR_LB (pre-cutover); native protocol verified after cutover in Step 7
  • □ CNAMEs switched to $CONTOUR_LB (Step 6 LB cutover complete)
  • □ Production endpoints working via custom domain
  • □ Individual service subdomains working (HTTP on port 8443, native on port 9440)
  • □ Monitoring services accessible
  • □ No application errors
  • □ Query performance acceptable
  • □ Stable for 48+ hours

Contour Deployment Complete? Monitor your deployment for at least 48 hours, then feel free to Remove Altinity Components from your cluster.

Issues? Use rollback procedure or contact Altinity support.