Three Categories of Workloads

AI Workloads

GKE for AI

  • Open standard API
    • Port workloads from a data scientist’s workstation to the cloud and reproduce results reliably
  • Hugely horizontally scalable
  • Autopilot—opinionated config
    • Increase productivity—don’t need to worry about infrastructure
  • TPU v4—GA
  • TPU v5e—preview

Modern Workloads

Enterprise Workloads

  • Challenges:
    • Multiple environments/teams
    • Increased compliance risk
    • Speed

GKE Enterprise

  • RBAC across clusters
  • Vulnerability scanning
  • Policies/guardrails—GitOps

GKE Interactive Troubleshooting Playbooks

  • SRE practices
  • Opinionated path to diagnose problems

Loveholidays

  • Uptime is an antipattern
  • 200+ Compute Engine instances created per hour
  • Peaks—150x normal traffic
  • Ripley—tool to replay realistic HTTPS traffic
  • OwlBot—simulated autoscaling overnight
    • Identifies bottlenecks
  • FinOps metric—$ cost to serve 1000 users
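
The FinOps metric above is simple arithmetic: total spend over a period, divided by users served, scaled to 1000. A minimal sketch follows. The function name and the figures are hypothetical, chosen for illustration only. They are not from the talk, and the talk did not say which costs go into the total.

```python
def cost_per_thousand_users(total_cost_usd: float, users_served: int) -> float:
    """Return the dollar cost of serving 1000 users over some period.

    Both inputs must cover the same period (e.g. one day of cloud spend
    and the users served that same day).
    """
    if users_served <= 0:
        raise ValueError("users_served must be positive")
    # Multiply before dividing to keep the scaling step explicit.
    return total_cost_usd * 1000 / users_served


# Hypothetical figures for illustration only, not from the talk.
metric = cost_per_thousand_users(total_cost_usd=1200.0, users_served=400_000)
print(f"${metric:.2f} per 1000 users")
```

Tracking this ratio rather than raw spend keeps the number comparable between quiet periods and traffic peaks.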

Continuous Disaster Recovery

  • Create copies of prod—GKE fleets
  • Balance load between clusters in multiple regions—Gateway API
  • Scaling clusters—on-demand clusters
