Overview

  • Run Kubernetes clusters on-prem on dedicated hardware provided and maintained by Google
    • Google engineers require physical access
  • Deploy workloads just as in GKE (see the sketch after this list)
  • Provision new clusters in Cloud Console, CLI and API—similar to GKE
  • Clusters can access VPC via Cloud VPN
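
Because the cluster exposes a standard Kubernetes API, workloads are deployed the same way they would be on GKE, e.g. with kubectl or a client library. A minimal sketch using the official Python kubernetes client; the names and image are placeholders, and it assumes kubectl is already configured with the cluster's credentials:

```python
# Sketch: deploy a small nginx Deployment to a GDC Edge cluster exactly as
# you would to GKE. Names and image are placeholders.
from kubernetes import client, config

config.load_kube_config()  # uses the current kubectl context
apps = client.AppsV1Api()

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="edge-web"),
    spec=client.V1DeploymentSpec(
        replicas=2,
        selector=client.V1LabelSelector(match_labels={"app": "edge-web"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "edge-web"}),
            spec=client.V1PodSpec(
                containers=[client.V1Container(name="web", image="nginx:1.25")]
            ),
        ),
    ),
)
apps.create_namespaced_deployment(namespace="default", body=deployment)
```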

Use Cases

  • Stable network connection required between Kubernetes workloads and on-prem
  • Low latency to on-prem required
  • Large data volumes that would be cost- or performance-prohibitive to move to Google Cloud
  • Regulatory/data sovereignty reasons

Limitations

  • Limited processing capacity
  • Workload restrictions
  • Anthos Service Mesh and Anthos Config Management not supported

Architecture

  • Rack of hardware—Distributed Cloud Edge Zone (GDC Edge Zone)
  • Kubernetes control plane runs in Google Cloud region
    • GDC Edge Zone requires a constant Internet connection to Google Cloud
    • Remotely managed—e.g. software updates, resolve config issues
  • GDC Edge Zone contains Nodes, grouped into Node Pools
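
Since nodes and node pools are ordinary Kubernetes objects, they can be inspected through the cluster API. A small sketch with the Python kubernetes client; the node-pool label key below is an assumption (check the labels on your nodes, e.g. with kubectl get nodes --show-labels):

```python
# Sketch: group the zone's nodes by node pool using node labels.
# The label key is an assumption; verify it on your own nodes.
from collections import defaultdict
from kubernetes import client, config

POOL_LABEL = "cloud.google.com/gke-nodepool"  # assumed label key

config.load_kube_config()
v1 = client.CoreV1Api()

pools = defaultdict(list)
for node in v1.list_node().items:
    pool = (node.metadata.labels or {}).get(POOL_LABEL, "<unlabelled>")
    pools[pool].append(node.metadata.name)

for pool, nodes in pools.items():
    print(f"{pool}: {len(nodes)} node(s) -> {nodes}")
```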

Hardware

(Diagram: gdce_hardware)

  • Customer/colo manages local network and edge routers
  • Storage: 4 TiB per physical machine
    • Ephemeral data only
    • Presented to cluster as PersistentVolumes
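
Because the local disk is surfaced through the normal PersistentVolume machinery, workloads claim it in the standard way. A minimal sketch with the Python kubernetes client; the storage class name is a placeholder and should be read from the cluster (kubectl get storageclass):

```python
# Sketch: claim local GDC Edge storage through a standard PersistentVolumeClaim.
# The storage class name is a placeholder; list the real ones with
# `kubectl get storageclass`.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "edge-scratch"},
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "storageClassName": "local-shared",  # placeholder
        "resources": {"requests": {"storage": "100Gi"}},
    },
}
core.create_namespaced_persistent_volume_claim(namespace="default", body=pvc)
```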

GPU Optimized Configuration

  • Optional configuration
  • 12 x Nvidia T4s—300 camera feeds simultaneously
  • AI/ML workloads
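
GPU capacity is consumed through the usual Kubernetes device-plugin resource, so a pod simply requests nvidia.com/gpu. A hedged sketch with the Python kubernetes client; the image and names are placeholders:

```python
# Sketch: request one of the zone's T4 GPUs via the standard
# nvidia.com/gpu extended resource. Image and names are placeholders.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "t4-inference"},
    "spec": {
        "containers": [{
            "name": "inference",
            "image": "us-docker.pkg.dev/my-project/my-repo/inference:latest",  # placeholder
            "resources": {"limits": {"nvidia.com/gpu": "1"}},
        }],
        "restartPolicy": "Never",
    },
}
core.create_namespaced_pod(namespace="default", body=pod)
```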

Kubernetes

(Diagram: gdce_k8s)

  • Kubernetes control plane resides in Google Cloud
    • Doesn’t use compute capacity of GDC Edge Zone
  • KubeVirt VMs
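
VM workloads go through the KubeVirt VirtualMachine custom resource, which the generic CustomObjectsApi can create. A minimal sketch based on the common KubeVirt cirros demo; treat the exact schema as an assumption and check it against the KubeVirt version shipped with your zone:

```python
# Sketch: define a small KubeVirt VirtualMachine and create it through the
# dynamic custom-objects API. Schema follows the common KubeVirt example;
# verify it against the KubeVirt version running in your zone.
from kubernetes import client, config

config.load_kube_config()
custom = client.CustomObjectsApi()

vm = {
    "apiVersion": "kubevirt.io/v1",
    "kind": "VirtualMachine",
    "metadata": {"name": "demo-vm"},
    "spec": {
        "running": True,
        "template": {
            "metadata": {"labels": {"kubevirt.io/vm": "demo-vm"}},
            "spec": {
                "domain": {
                    "devices": {"disks": [{"name": "rootdisk", "disk": {"bus": "virtio"}}]},
                    "resources": {"requests": {"memory": "1Gi"}},
                },
                "volumes": [{
                    "name": "rootdisk",
                    "containerDisk": {"image": "quay.io/kubevirt/cirros-container-disk-demo"},
                }],
            },
        },
    },
}
custom.create_namespaced_custom_object(
    group="kubevirt.io", version="v1", namespace="default",
    plural="virtualmachines", body=vm,
)
```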

Pricing

  • ~£12,000/month per cluster
  • Each cluster contains:
    • 6x Server-C machines, each with:
      • 256 GB RAM
      • 4 TiB storage
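  • In total per cluster: 6 × 256 GB ≈ 1.5 TB of RAM and 6 × 4 TiB = 24 TiB of raw storage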

Availability

  • No SLA—only SLO
  • SLO proportional to spare compute capacity
    • Workloads automatically moved to spare capacity in case of node failure
    • Implement with node taints or reserved node capacity constraints (see the sketch at the end of this section)
    Capacity in use    Reserved capacity    SLO
    66.67%             33.33%               99.99%
    83.33%             16.67%               99.9%
    100%               0%                   93.5%
  • Upon hardware failure—Google engineer schedules site visit within three working days
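
With 6 machines per cluster, reserving 1 machine corresponds to the 16.67% row in the table above and reserving 2 to the 33.33% row. One way to keep the reserved machines free is to taint them so that ordinary pods avoid them and only critical workloads with a matching toleration are rescheduled there. A hedged sketch with the Python kubernetes client; the taint key and node name are placeholders:

```python
# Sketch: reserve a node as spare capacity by tainting it, so ordinary pods
# avoid it and only workloads carrying a matching toleration can land there.
# Taint key and node name are placeholders.
# CLI equivalent:
#   kubectl taint nodes edge-node-6 example.com/reserved-capacity=true:NoSchedule
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

taint_patch = {
    "spec": {
        "taints": [{
            "key": "example.com/reserved-capacity",  # placeholder key
            "value": "true",
            "effect": "NoSchedule",
        }]
    }
}
core.patch_node("edge-node-6", taint_patch)  # placeholder node name

# Critical workloads then carry the matching toleration in their pod spec:
toleration = {
    "key": "example.com/reserved-capacity",
    "operator": "Equal",
    "value": "true",
    "effect": "NoSchedule",
}
```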

Maintenance Windows

  • Updates mandatory—no opt-out
  • Can specify window for each cluster
  • Best practice: stagger windows across clusters (see the sketch below)
    • Ensures HA of critical workloads
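
The staggering idea is simply that no two clusters update at the same time, so a critical workload always has a healthy cluster to run on. A small illustrative sketch; the window values are placeholders, and the real windows are set per cluster through the Distributed Cloud console, CLI or API:

```python
# Sketch: check that the weekly maintenance windows chosen for two clusters
# do not overlap. Window values are illustrative placeholders only.
from datetime import time

windows = {
    "edge-cluster-a": (time(1, 0), time(5, 0)),   # 01:00-05:00 UTC (placeholder)
    "edge-cluster-b": (time(6, 0), time(10, 0)),  # 06:00-10:00 UTC (placeholder)
}

def overlaps(a, b):
    """True if two (start, end) windows overlap."""
    return a[0] < b[1] and b[0] < a[1]

names = list(windows)
for i, x in enumerate(names):
    for y in names[i + 1:]:
        assert not overlaps(windows[x], windows[y]), f"{x} and {y} overlap"
print("Maintenance windows are staggered.")
```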
