🧠 mattdunn.info

Apache Beam

Last updated 23 Oct 2023

data
big-data
data-processing

Overview

Open-source, unified programming model for defining and executing data processing pipelines
Unified: a single model for both batch and streaming data
- The name combines B(atch) and (Str)eam
- Helps avoid training-serving skew, since the same transformation code can run in both batch and streaming pipelines
Portable: pipelines run in different execution environments, e.g. Dataflow, Apache Spark
Extensible: write and share connectors and transformation libraries
Templates
SDKs available for Java, Python, Go
Pipeline code builds a model (representation) of the pipeline, which a runner then executes in the chosen execution environment
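
The split between a pipeline description and the runner that executes it can be illustrated with a toy sketch in plain Python. This is not Beam's API: `ToyPipeline`, `map_step`, `filter_step`, `run_batch` and `run_streaming` are hypothetical names invented for this note, and a real runner does far more (distribution, windowing, fault tolerance). The only point is that one description can be handed to different executors.

```python
from typing import Callable, Iterable, Iterator, List

# A step turns an iterable of elements into an iterator of elements.
Step = Callable[[Iterable], Iterator]


def map_step(fn: Callable) -> Step:
    """Hypothetical helper: apply fn to every element."""
    def step(elements: Iterable) -> Iterator:
        for element in elements:
            yield fn(element)
    return step


def filter_step(predicate: Callable) -> Step:
    """Hypothetical helper: keep only elements where predicate is truthy."""
    def step(elements: Iterable) -> Iterator:
        for element in elements:
            if predicate(element):
                yield element
    return step


class ToyPipeline:
    """Only records the steps; nothing runs until a runner is called."""

    def __init__(self) -> None:
        self.steps: List[Step] = []

    def apply(self, step: Step) -> "ToyPipeline":
        self.steps.append(step)
        return self


def run_batch(pipeline: ToyPipeline, data: list) -> list:
    """Toy batch runner: pushes a whole bounded dataset through every step."""
    elements: Iterable = data
    for step in pipeline.steps:
        elements = step(elements)
    return list(elements)


def run_streaming(pipeline: ToyPipeline, source: Iterable) -> Iterator:
    """Toy streaming runner: handles one element at a time as it arrives."""
    for element in source:
        out: Iterable = [element]
        for step in pipeline.steps:
            out = step(out)
        yield from out


# One pipeline description, defined once.
pipeline = (
    ToyPipeline()
    .apply(map_step(str.strip))
    .apply(filter_step(bool))
    .apply(map_step(str.upper))
)

if __name__ == "__main__":
    records = [" a ", "", "b"]
    print(run_batch(pipeline, records))
    print(list(run_streaming(pipeline, iter(records))))
```

Both toy runners produce the same elements from the same description, which is the property the "unified" and "portable" points rely on.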

Use-cases

Pipe data into a data warehouse
ETL (extract, transform, load)

Copyright © Matt Dunn 2024