Overview
- Open source, unified programming model to define and execute data processing pipelines
- Unified—single model for batch and stream
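- Name: B(atch) + (Str)eam = Beam
- Helps avoid training-serving skew: the same transformation code can process historical data (batch, for training) and live data (stream, for serving)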
- Portable—works in different execution environments, e.g. Dataflow, Apache Spark
- Extensible—write/share connectors and transformation libraries
- Templates: reusable, parameterized pipelines (e.g. Dataflow templates)
- SDKs available for Java, Python, Go
- Pipeline code builds a model representation (a graph of transforms), which a runner executes in the chosen execution environment
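Here is a toy plain-Python sketch of two ideas from the notes above. The first is "code builds a model, a runner executes it." The second is "one model for batch and stream." It is not the Apache Beam API: `Pipeline`, `Transform`, and `LocalRunner` are hypothetical stand-ins. The pipeline object only records its steps. The runner applies them lazily, element by element, so the same definition works on a finite list (batch) and on an unbounded generator (stream-like). Reusing one definition for both is also why the model helps against training-serving skew.

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, List


# Hypothetical toy classes, NOT the Apache Beam API.
@dataclass
class Transform:
    name: str
    kind: str  # "map" or "filter"
    fn: Callable


@dataclass
class Pipeline:
    """Records transforms as a model; executes nothing by itself."""
    steps: List[Transform] = field(default_factory=list)

    def map(self, name: str, fn: Callable) -> "Pipeline":
        self.steps.append(Transform(name, "map", fn))
        return self

    def filter(self, name: str, fn: Callable) -> "Pipeline":
        self.steps.append(Transform(name, "filter", fn))
        return self


class LocalRunner:
    """Toy runner: chains the recorded steps lazily over any iterable,
    so a finite list and an endless iterator are handled the same way."""

    def run(self, pipeline: Pipeline, source: Iterable) -> Iterator:
        stream: Iterable = source
        for step in pipeline.steps:
            if step.kind == "map":
                stream = map(step.fn, stream)
            elif step.kind == "filter":
                stream = filter(step.fn, stream)
            else:
                raise ValueError(f"unknown transform kind: {step.kind}")
        return iter(stream)


# One pipeline definition (the "model").
pipeline = (
    Pipeline()
    .map("parse", int)
    .filter("keep_even", lambda x: x % 2 == 0)
    .map("square", lambda x: x * x)
)
runner = LocalRunner()

# Batch: a bounded input.
batch_result = list(runner.run(pipeline, ["1", "2", "3", "4"]))  # [4, 16]


# Stream-like: an unbounded, lazily produced input.
def events():
    n = 0
    while True:
        n += 1
        yield str(n)


first_three = list(itertools.islice(runner.run(pipeline, events()), 3))  # [4, 16, 36]
```

In real Beam, the SDK builds the pipeline graph, and a runner (e.g. the Dataflow or Spark runner) translates and executes it. This sketch only mimics that separation between definition and execution.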
Use cases
- Pipe data into a data warehouse
- ETL (extract, transform, load)
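Here is a minimal stdlib sketch of the ETL shape behind these use cases. It assumes an in-memory `sqlite3` database as a stand-in for a real data warehouse (for example BigQuery). The CSV input and the `sales` table are hypothetical. In a Beam pipeline, each of the three functions would instead be a transform in the graph.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; stands in for files or a message source.
RAW_CSV = """user_id,country,amount
1,us,10.50
2,de,3.25
3,us,
4,fr,7.00
"""


def extract(text: str):
    """Extract: parse CSV text into dict rows."""
    return csv.DictReader(io.StringIO(text))


def transform(rows):
    """Transform: drop rows with a missing amount, normalize field types."""
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        yield (int(row["user_id"]), row["country"].upper(), float(row["amount"]))


def load(records, conn: sqlite3.Connection) -> None:
    """Load: write cleaned records into a warehouse-like table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (user_id INTEGER, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)

# Query the loaded table, as an analyst would in the warehouse.
totals = conn.execute(
    "SELECT country, SUM(amount) FROM sales GROUP BY country ORDER BY country"
).fetchall()
```

The row with an empty amount is dropped during the transform step, so only three rows reach the table.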