Overview

  • Open source, unified programming model to define and execute data processing pipelines
  • Unified—single model for batch and stream
  • Portable—works in different execution environments, e.g. Dataflow, Apache Spark
  • Extensible—write/share connectors and transformation libraries
  • Templates
  • SDKs available for Java, Python, and Go
  • Pipeline code is translated into a model representation, which a runner executes in the chosen execution environment
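
The "define once, execute via a runner" idea can be sketched in plain Python. This is a toy illustration only, not the real SDK API: the Pipeline class, run_batch, and run_streaming are hypothetical names invented for this example. The pipeline object only records the steps (the model); each runner decides how to execute them, either over a whole bounded input or one element at a time.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Pipeline:
    """Toy pipeline model (hypothetical): an ordered list of named steps."""
    steps: List[tuple] = field(default_factory=list)

    def map(self, name, fn):
        # Record the step; nothing is executed at definition time.
        self.steps.append((name, "map", fn))
        return self

    def filter(self, name, pred):
        self.steps.append((name, "filter", pred))
        return self


def run_batch(pipeline, data):
    """Toy 'batch runner': apply each step to the whole bounded input."""
    items = list(data)
    for _name, kind, fn in pipeline.steps:
        if kind == "map":
            items = [fn(x) for x in items]
        else:
            items = [x for x in items if fn(x)]
    return items


def run_streaming(pipeline, source):
    """Toy 'streaming runner': push one element at a time through all steps."""
    for x in source:
        keep = True
        for _name, kind, fn in pipeline.steps:
            if kind == "map":
                x = fn(x)
            elif not fn(x):
                keep = False
                break
        if keep:
            yield x


# One pipeline definition, shared by both runners.
pipeline = (
    Pipeline()
    .map("parse", int)
    .filter("keep_even", lambda n: n % 2 == 0)
    .map("square", lambda n: n * n)
)

batch_result = run_batch(pipeline, ["1", "2", "3", "4"])
stream_result = list(run_streaming(pipeline, iter(["1", "2", "3", "4"])))
print(batch_result, stream_result)
```

Both runners produce the same result from the same definition, which is the point of the unified and portable bullets above. A real runner would also handle distribution, windowing, and fault tolerance, none of which this sketch attempts.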

Use-cases

  • Pipe data into a data warehouse
  • ETL
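
As a rough picture of the ETL use-case, the sketch below extracts rows from CSV text, transforms them, and loads them into a table. Everything here is an assumption made for illustration: the sample data, the orders table, and the use of an in-memory SQLite database as a stand-in for a real data warehouse. It uses only the Python standard library rather than any pipeline SDK.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; a real pipeline would read from files or a queue.
RAW_CSV = """order_id,amount,currency
1,19.99,usd
2,,usd
3,5.00,eur
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing amount and normalize types."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["order_id"]), float(row["amount"]), row["currency"].upper())
        )
    return cleaned


def load(conn, records):
    """Load: write the cleaned records into the stand-in warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


warehouse = sqlite3.connect(":memory:")  # stand-in for a data warehouse
load(warehouse, transform(extract(RAW_CSV)))
loaded = warehouse.execute(
    "SELECT order_id, amount, currency FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```

In a pipeline framework, each of these three functions would typically become a separate step in the graph, with the framework's connectors replacing the hand-written extract and load code.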

Graph View