Why Apache Beam? A Google Perspective

When we made the decision (in partnership with data Artisans, Cloudera, Talend, and a few other companies) to move the Google Cloud Dataflow SDK and runners into the Apache Beam incubator project, we did so with the following goal in mind: provide the world with an easy-to-use but powerful model for data-parallel processing, both streaming and batch, portable across a variety of runtime platforms. Now that the dust on the initial code drops is starting to settle, we want to talk briefly about why this makes sense for us at Google and how we got here, given that Google hasn’t historically been directly involved in the open-source world of data processing.

Why does this make sense for Google?

Google is a business, and as such, it should come as no surprise that there’s a business motivation for us behind the Apache Beam move. That motivation hinges primarily on the desire to get as many Apache Beam pipelines as possible running on Cloud Dataflow. Given that, opening the platform up to other runners may seem counterintuitive. In fact, the opposite is true. Opening up the platform yields many benefits:
  • The more runners Apache Beam supports, the more attractive it becomes as a platform
  • The more users adopt Apache Beam, the more users there are who might want to run Apache Beam on Google Cloud Platform
  • The more folks we get involved in developing Apache Beam, the more we can push forward the state of the art in data processing
Note that these benefits apply not only to Google, but to anyone involved in Apache Beam. If a portable abstraction layer exists for building data-processing pipelines, it becomes much easier for new runners to come along and compete on technical innovations that provide better performance, reliability, ease of operational management, etc. In other words, eliminating API lock-in makes for a freer market of execution engines, which leads to more competition and ultimately a better end result for the industry.

Details: https://cloud.google.com/blog/big-data/2016/05/why-apache-beam-a-google-perspective
