Overview
- Collects and visualizes metrics from Google Cloud resources—how well are resources performing?
- Track SLOs and SLAs
- Alerting—identify issues as they happen
Metrics Scopes
- Previously known as workspaces
- Host for monitored projects: one metrics scope can monitor multiple projects. Under the older workspaces model a project could belong to only one workspace; with metrics scopes a project can be monitored by more than one scope
- Best practice: create a central metrics scope project as a single pane of glass for all projects
Monitoring Agents
- Needed for in-guest monitoring of Compute Engine and AWS EC2 instances (e.g. memory, processes); most other monitoring is baked into the Google Cloud services
- More detailed and granular metrics
- Can gather metrics from third-party apps, e.g. NGINX
- Monitoring agent: based on collectd
- Logging agent: based on Fluentd
Metrics
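- Value types: bool, int64, double, string, distribution
- Kinds:
  - Gauge: value at an instant in time, e.g. CPU usage
  - Delta: change in value over a time interval, e.g. requests received since the previous sample
  - Cumulative: value accumulated over time, e.g. sent bytes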
- Examples: latency, number of SQL records, disk space
- 1500+ pre-created metrics
- Custom metrics: define via the Cloud Monitoring API or OpenCensus
- Best practice: use built-in metrics if possible before creating custom metrics
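
To make the types and kinds above concrete, here is a minimal sketch of the JSON body that a custom-metric write to the Cloud Monitoring API (`projects.timeSeries.create`) would carry for a single gauge point. The project ID `my-project`, the metric name `shop/queue_depth`, and the helper `gauge_time_series` are hypothetical, chosen only for illustration; the sketch builds the body and does not send it.

```python
import json
from datetime import datetime, timezone

# Maps a value type to the JSON field that carries the point's value.
VALUE_FIELDS = {
    "BOOL": "boolValue",
    "INT64": "int64Value",
    "DOUBLE": "doubleValue",
    "STRING": "stringValue",
}


def gauge_time_series(project_id, metric_name, value, value_type, when):
    """Build a request body holding one GAUGE point of a custom metric."""
    if value_type not in VALUE_FIELDS:
        raise ValueError(f"unsupported value type: {value_type}")
    # int64 values are carried as strings in the JSON encoding.
    encoded = str(value) if value_type == "INT64" else value
    end_time = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "timeSeries": [{
            # Custom metric types live under the custom.googleapis.com prefix.
            "metric": {"type": f"custom.googleapis.com/{metric_name}"},
            # "global" is used here as a generic monitored resource (assumption).
            "resource": {"type": "global", "labels": {"project_id": project_id}},
            "metricKind": "GAUGE",
            "valueType": value_type,
            "points": [{
                "interval": {"endTime": end_time},
                "value": {VALUE_FIELDS[value_type]: encoded},
            }],
        }]
    }


body = gauge_time_series(
    "my-project", "shop/queue_depth", 42, "INT64",
    datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc),
)
print(json.dumps(body, indent=2))
```

A cumulative metric would additionally need a start time on each point's interval, which this gauge-only sketch leaves out.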
Integration
- Cloud Monitoring API
- Export to BigQuery or to external tools such as Grafana; external tools authenticate with a service account
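
As a rough illustration of what an external tool does when it reads metrics through the Cloud Monitoring API, the sketch below builds the URL for a `projects.timeSeries.list` request. The project ID and the helper name `list_time_series_url` are hypothetical. The request itself is not sent; a real call would also need an OAuth bearer token, for example one obtained for a service account.

```python
from urllib.parse import urlencode

API_ROOT = "https://monitoring.googleapis.com/v3"


def list_time_series_url(project_id, metric_type, start_time, end_time):
    """Build the URL that lists time series for one metric type in a time window."""
    params = {
        # Monitoring filter selecting a single metric type.
        "filter": f'metric.type = "{metric_type}"',
        # RFC 3339 timestamps bounding the query window.
        "interval.startTime": start_time,
        "interval.endTime": end_time,
    }
    return f"{API_ROOT}/projects/{project_id}/timeSeries?{urlencode(params)}"


url = list_time_series_url(
    "my-project",
    "compute.googleapis.com/instance/cpu/utilization",
    "2024-01-01T00:00:00Z",
    "2024-01-01T01:00:00Z",
)
print(url)
```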
Uptime Checks
- HTTP GET/POST to application—requires FQDN
- Looks for successful response
- Can check for expired SSL certificates
- Alerts—email, SMS, Slack, PagerDuty, Pub/Sub
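
The logic an uptime check applies can be sketched locally as below. This is an illustration of the idea (successful response plus certificate expiry), not how Cloud Monitoring implements it. The function names, the 2xx success rule, and the 7-day warning threshold are assumptions made for the example.

```python
import ssl
from datetime import datetime, timezone


def is_success(status_code):
    """Treat any 2xx status as a successful response (assumed rule)."""
    return 200 <= status_code < 300


def days_until_expiry(not_after, now):
    """Whole days between `now` and a certificate's notAfter timestamp."""
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).days


def evaluate_check(status_code, not_after, now, min_days_left=7):
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    if not is_success(status_code):
        problems.append(f"unexpected status {status_code}")
    days_left = days_until_expiry(not_after, now)
    if days_left < 0:
        problems.append("certificate expired")
    elif days_left < min_days_left:
        problems.append(f"certificate expires in {days_left} days")
    return problems


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(evaluate_check(200, "Jan 31 00:00:00 2024 GMT", now))
print(evaluate_check(503, "Jan  3 00:00:00 2024 GMT", now))
```

A non-empty result is the kind of condition that would be routed to one of the alert channels listed above.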