Introduction

  • Best practices for GKE multi-tenancy in enterprise organisations
  • Assumes teams deploy workloads through the Kubernetes API without the platform team’s input
  • Definitions of a tenant:
    • Team responsible for 1+ workloads
    • Set of related workloads
    • Single workload

Networking

  • Shared VPC for each cluster/environment
    • In Cluster Networking folder
    • Managed by central networking team
  • Tenant shared VPC per environment
    • Non-cluster resources

HA and Reliability

  • One cluster admin project per cluster
    • Prevents a project-level misconfiguration from affecting all clusters
  • Private clusters
    • Disable public access to nodes and restrict access to the control plane
  • Regional clusters: control plane and nodes spread across multiple zones in a region
  • Utilise autoscaling
  • Schedule maintenance windows
  • Set up shared Ingress/load balancer

Security

  • Network policies
    • deny-all for cross-namespace traffic by default
  • GKE Sandbox
    • User-space kernel
    • Stops malicious tenants from affecting others
  • Policy-based admission controls
    • Prevent pods violating security policies
    • Options:
      • Policy Controller (built on OPA Gatekeeper): requires GKE Enterprise
      • PodSecurity admission controller
  • Workload Identity Federation for GKE
    • Access to GCP services
    • Map Kubernetes service account names to virtual Google Cloud service account handles, then assign IAM roles to those handles
  • Authorized Networks
    • Restrict IPs which have access to control plane
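
The deny-all default for cross-namespace traffic can be sketched as a NetworkPolicy manifest built in Python. This is a minimal illustration rather than a production policy: the function name, the policy name and the tenant-a namespace are hypothetical, and it assumes network policy enforcement is enabled on the cluster.

```python
import json


def same_namespace_only_policy(namespace: str) -> dict:
    """Build a NetworkPolicy that admits ingress only from pods in the same namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        # "deny-cross-namespace" is a hypothetical name chosen for this sketch.
        "metadata": {"name": "deny-cross-namespace", "namespace": namespace},
        "spec": {
            # An empty podSelector selects every pod in the namespace.
            "podSelector": {},
            "policyTypes": ["Ingress"],
            # A bare podSelector under "from" matches pods in the policy's own
            # namespace only, so ingress from other namespaces is not allowed.
            "ingress": [{"from": [{"podSelector": {}}]}],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests, e.g. `kubectl apply -f policy.json`.
    print(json.dumps(same_namespace_only_policy("tenant-a"), indent=2))
```

Applying one such policy per tenant namespace keeps traffic inside the namespace by default; any cross-namespace flow a tenant needs would then be allowed with an additional, explicit policy.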

Provisioning

  • Namespace per tenant
    • Tenant admin manages users within their namespace
    • Standardise namespace names across environments to simplify configuration, CI/CD scripts etc.
  • Project per tenant for non-cluster resources
    • Including logs, monitoring, service accounts etc.
  • Kubernetes RBAC—fine-grained access to namespaces
    • Bind to Google Groups
  • Create tenant-specific service account for each workload
    • Security
    • Map to Kubernetes service accounts via Workload Identity Federation
  • Create resource quota per namespace—CPU and memory
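
The provisioning steps above could be scripted per tenant roughly as follows. This is a sketch under assumptions: the function name, the object names (tenant-quota, tenant-admins, workload), the quota values and the example addresses are all hypothetical. It also assumes Google Groups for RBAC is configured on the cluster, and that the IAM side of Workload Identity Federation (granting roles/iam.workloadIdentityUser on the Google service account) is handled separately.

```python
import json

RBAC_API_GROUP = "rbac.authorization.k8s.io"


def tenant_manifests(tenant: str, group_email: str, gsa_email: str,
                     cpu: str = "10", memory: str = "20Gi") -> list:
    """Return the per-tenant objects: namespace, quota, role binding, service account."""
    namespace = {"apiVersion": "v1", "kind": "Namespace", "metadata": {"name": tenant}}
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "tenant-quota", "namespace": tenant},
        # Caps the total CPU and memory requests across the namespace.
        "spec": {"hard": {"requests.cpu": cpu, "requests.memory": memory}},
    }
    role_binding = {
        "apiVersion": f"{RBAC_API_GROUP}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": "tenant-admins", "namespace": tenant},
        # Binding to a Google Group rather than to individual users.
        "subjects": [{"kind": "Group", "name": group_email, "apiGroup": RBAC_API_GROUP}],
        # "admin" is the built-in user-facing ClusterRole, scoped here to one namespace.
        "roleRef": {"kind": "ClusterRole", "name": "admin", "apiGroup": RBAC_API_GROUP},
    }
    service_account = {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": "workload",
            "namespace": tenant,
            # Links this Kubernetes service account to a Google service account.
            "annotations": {"iam.gke.io/gcp-service-account": gsa_email},
        },
    }
    return [namespace, quota, role_binding, service_account]


if __name__ == "__main__":
    manifests = tenant_manifests(
        "tenant-a",
        "tenant-a-admins@example.com",
        "app@tenant-a-project.iam.gserviceaccount.com",
    )
    for manifest in manifests:
        print(json.dumps(manifest, indent=2))
```

Generating all four objects from the tenant name keeps namespace naming identical across environments, which is what makes shared CI/CD scripts workable.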

Monitoring, Logging and Usage

  • GKE Cost Allocation
    • Cost breakdown by namespace and label
    • Not supported by Autopilot
  • Tenant-specific logs
    • Log Router: a sink exports each tenant's logs to a log bucket in that tenant's project
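
A per-tenant Log Router sink could be assembled as below. This is a hedged sketch: the helper name, the sink naming scheme and the project and bucket names are hypothetical, and it assumes the tenant's log bucket already exists and that the command runs against the cluster's project.

```python
import shlex


def tenant_log_sink_command(namespace: str, tenant_project: str, bucket: str,
                            location: str = "global") -> list:
    """Build a `gcloud logging sinks create` command routing one namespace's container logs."""
    # Select only container logs emitted from the tenant's namespace.
    log_filter = (
        'resource.type="k8s_container" AND '
        f'resource.labels.namespace_name="{namespace}"'
    )
    # Destination format for a Cloud Logging bucket in another project.
    destination = (
        f"logging.googleapis.com/projects/{tenant_project}"
        f"/locations/{location}/buckets/{bucket}"
    )
    return [
        "gcloud", "logging", "sinks", "create",
        f"{namespace}-logs",  # hypothetical sink naming scheme
        destination,
        f"--log-filter={log_filter}",
    ]


if __name__ == "__main__":
    cmd = tenant_log_sink_command("tenant-a", "tenant-a-project", "tenant-a-logs")
    # shlex.join quotes the filter so the command can be pasted into a shell.
    print(shlex.join(cmd))
```

After the sink is created, its writer identity still needs permission to write to the destination bucket before logs start arriving.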
