🧠 mattdunn.info

BigQuery

Last updated 23 Oct 2023

google-cloud
data
data-warehouse
analytics
machine-learning

Overview

Data warehouse
Storage plus analytics
SQL queries
Serverless
Multi-regional
SQL column store
Terabytes to petabytes
BigQuery ML, geospatial analysis and BI
Public datasets available
Real-time analytics
Automatic replication; changes retained for 7 days
Cloud Monitoring—e.g. number of jobs running, bytes scanned during a query, distribution of query times
Encrypted at rest
Built-in machine learning
- Write ML models directly in BigQuery using SQL
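A minimal sketch of what "ML models in SQL" looks like, assuming a logistic regression trained with a BigQuery ML `CREATE MODEL` statement. The helper only builds the SQL string; the dataset, table, and column names (`my_dataset.churn_model`, `my_dataset.customers`, `churned`) are hypothetical placeholders, not names from these notes.

```python
def create_model_sql(model: str, label: str, source_table: str,
                     model_type: str = "logistic_reg") -> str:
    """Build a BigQuery ML CREATE MODEL statement as a plain string."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = '{model_type}', input_label_cols = ['{label}']) AS\n"
        f"SELECT * FROM `{source_table}`"
    )


# Hypothetical names, for illustration only.
sql = create_model_sql("my_dataset.churn_model", "churned", "my_dataset.customers")
print(sql)
```

The resulting statement would be submitted like any other query; training runs inside BigQuery, so no data is exported.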

Pricing

Two options:
- On-demand: pay for the amount of data that queries process
- BigQuery Slots
  - Up-front purchase of processing capacity
  - Flat-rate pricing
  - Useful for CapEx optimization model
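A rough way to compare the two options is to find the monthly scan volume at which purchased capacity becomes cheaper than on-demand. The sketch below assumes on-demand cost scales linearly with bytes processed; both prices are made-up placeholders, not current Google Cloud rates, so substitute the real figures for your region.

```python
TIB = 2 ** 40  # bytes in one tebibyte


def on_demand_cost(bytes_processed: int, price_per_tib: float) -> float:
    """Cost of on-demand queries, assuming a flat price per TiB processed."""
    return bytes_processed / TIB * price_per_tib


def break_even_tib(monthly_slot_cost: float, price_per_tib: float) -> float:
    """TiB per month above which the slot commitment is the cheaper option."""
    return monthly_slot_cost / price_per_tib


# Placeholder prices (assumptions, NOT real rates).
PRICE_PER_TIB = 5.0
MONTHLY_SLOT_COST = 2000.0

print(on_demand_cost(5 * TIB, PRICE_PER_TIB))
print(break_even_tib(MONTHLY_SLOT_COST, PRICE_PER_TIB))
```

Below the break-even volume, paying per query is cheaper; above it, the up-front capacity purchase wins and also makes spend predictable.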

Data Ingestion

Data sources:
- Internal data
- External data
- Multi-cloud data—AWS, Azure
- Public datasets
Replicated
Backed up
Autoscaled

External Data Sources

Query data stored in other locations, e.g.
- Cloud Storage
- Cloud Spanner
- Cloud SQL
- CSV
No need to ingest into BigQuery
Note: inconsistencies can arise when data is stored and processed in separate systems
- Consider using Dataflow to build a streaming pipeline
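As an illustration of querying without ingesting, the sketch below builds a `CREATE EXTERNAL TABLE` statement over CSV files in Cloud Storage. It only generates the DDL string; the table name and `gs://` URI are hypothetical, and the option set shown (`format`, `uris`, `skip_leading_rows`) is a small subset chosen for the example.

```python
from typing import Sequence


def external_table_sql(table: str, uris: Sequence[str],
                       skip_header: bool = True) -> str:
    """Build DDL for an external table over CSV files in Cloud Storage."""
    uri_list = ", ".join(f"'{u}'" for u in uris)
    options = ["format = 'CSV'", f"uris = [{uri_list}]"]
    if skip_header:
        # Treat the first row of each file as a header, not data.
        options.append("skip_leading_rows = 1")
    return (
        f"CREATE OR REPLACE EXTERNAL TABLE `{table}`\n"
        f"OPTIONS ({', '.join(options)})"
    )


# Hypothetical table and bucket names.
ddl = external_table_sql("my_dataset.sales_ext", ["gs://my-bucket/sales/*.csv"])
print(ddl)
```

Once defined, the external table is queried with ordinary SQL while the files stay in Cloud Storage, which is where the consistency caveat above comes from.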

Query Jobs

Interactive Query Jobs

Default
Query run as soon as possible
Count towards concurrent rate limit and daily limit

Batch Query Jobs

Queued by BigQuery
Query starts as soon as idle resources are available in the shared resource pool
If not started within 24 hours, the job priority is automatically changed to interactive
Don’t count towards concurrent rate limit
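The choice between the two job types is a single priority setting on the query job. A minimal sketch, assuming the job is submitted through the BigQuery REST API (`jobs.insert`), whose query configuration accepts a `priority` of `INTERACTIVE` or `BATCH`. The helper name `query_job_body` is made up for this example, and the code only builds the request body; it does not call the API.

```python
def query_job_body(sql: str, batch: bool = False) -> dict:
    """Build a jobs.insert request body for a query job."""
    return {
        "configuration": {
            "query": {
                "query": sql,
                "useLegacySql": False,
                # INTERACTIVE is the default; BATCH queues the query
                # until idle resources are available.
                "priority": "BATCH" if batch else "INTERACTIVE",
            }
        }
    }


interactive = query_job_body("SELECT 1")
batch = query_job_body("SELECT 1", batch=True)
print(batch["configuration"]["query"]["priority"])
```

Batch priority suits work that is not time-sensitive, such as nightly reports, since it avoids the concurrent rate limit at the cost of an unpredictable start time.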

Authorized Views

Grant specific users access to subsets of data via authorized views
Source dataset contains the source data
Create a separate dataset to contain the authorized view
- Users granted access to authorized view, but not underlying source dataset
Authorized view granted access to the source dataset
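The steps above can be sketched as two pieces: the view definition that exposes a subset of the data, and the access entry that authorizes that view on the source dataset. The sketch assumes the REST representation of a dataset's `access` list, where a view is referenced by `projectId`, `datasetId`, and `tableId`. All project, dataset, table, and column names are hypothetical, and nothing here calls the API.

```python
from typing import Dict, List, Optional, Sequence


def authorized_view_sql(view: str, source_table: str, columns: Sequence[str],
                        where: Optional[str] = None) -> str:
    """Build the view that exposes only selected columns and rows."""
    sql = (
        f"CREATE OR REPLACE VIEW `{view}` AS\n"
        f"SELECT {', '.join(columns)}\n"
        f"FROM `{source_table}`"
    )
    if where:
        sql += f"\nWHERE {where}"
    return sql


def view_access_entry(project: str, dataset: str, view: str) -> Dict:
    """Access entry that authorizes a view on the source dataset."""
    return {"view": {"projectId": project, "datasetId": dataset, "tableId": view}}


def authorize_view(source_access: List[Dict], entry: Dict) -> List[Dict]:
    """Return a copy of the source dataset's access list including the view."""
    if entry in source_access:
        return list(source_access)
    return list(source_access) + [entry]


# Hypothetical names: the view lives in `shared_views`, the data in `source_data`.
view_sql = authorized_view_sql(
    "my_project.shared_views.orders_eu",
    "my_project.source_data.orders",
    ["order_id", "total"],
    where="region = 'EU'",
)
entry = view_access_entry("my_project", "shared_views", "orders_eu")
access = authorize_view([], entry)
print(view_sql)
print(access)
```

Users are then granted read access on the `shared_views` dataset only; they never receive a role on `source_data`.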

Policy Tags

Apply to columns
Define access to data when using:
- Column-level access control
- Dynamic data masking, e.g. PII, financial data, customer order history
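A policy tag is attached to a column through the table schema. The sketch below assumes the REST schema representation, in which a field carries a `policyTags` object with a `names` list of policy tag resource names. The schema, column, and tag path are hypothetical placeholders, and the helper `tag_column` is made up for this example.

```python
from typing import Dict, List


def tag_column(schema: List[Dict], column: str, policy_tag: str) -> List[Dict]:
    """Return a copy of the schema with a policy tag set on one column."""
    tagged = []
    for field in schema:
        new_field = dict(field)
        if field["name"] == column:
            new_field["policyTags"] = {"names": [policy_tag]}
        tagged.append(new_field)
    return tagged


# Hypothetical schema and policy tag resource name.
schema = [
    {"name": "customer_id", "type": "INTEGER"},
    {"name": "email", "type": "STRING"},
]
PII_TAG = "projects/my-project/locations/us/taxonomies/123/policyTags/456"

tagged_schema = tag_column(schema, "email", PII_TAG)
print(tagged_schema)
```

Access to the tagged column, or the masking applied to it, is then governed by who holds the relevant roles on that policy tag rather than on the table.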

References

AI ML Lifecycle
Google Cloud Database Connections

Copyright © Matt Dunn 2024