
What Open Source Software Do You Use?

To gather insights on the current and future state of open source software (OSS), we talked to 31 executives. This is nearly double the number we typically speak to for a research guide, and we believe it reflects the popularity of, acceptance of, and demand for OSS. We began by asking, "What open source software do you use?" As you would expect, most respondents are using several open source products. Here's what they told us: Apache: Apache Cassandra, Elassandra (Elasticsearch + Cassandra), Spark, and Kafka (as the core tech we provide through our managed service) are the big ones for us. We find that the governance arrangements and independence of the Apache Foundation make a great foundation for strong open source projects. 95% of what we do with big data is open source. We use Apache Hadoop and contribute back to grow skills and expertise. We use so much that it would be impossible to list. The core of our software is based on Apache So...

Apache Beam: A unified model for batch and stream processing data

Apache Beam, a new distributed processing tool currently being incubated at the ASF, provides an abstraction layer that lets developers focus on Beam code, using the Beam programming model. Thanks to Apache Beam, an implementation is agnostic to the underlying runtime technology, meaning you can switch technologies quickly and easily. Apache Beam's programming model is also agnostic in terms of processing coverage: the model is unified, so developers can implement both batch and streaming data processing with the same code. That is actually where the Apache Beam name comes from: B (for Batch) and EAM (for strEAM). To implement your data processes using the Beam programming model, you use an SDK or DSL provided by Beam. For now, there is really only one SDK: the Java SDK. However, a Python SDK is expected to be released, and Beam plans to provide a Scala SDK and additional DSLs (a declarative XML DSL, for instance) soon. With Apache Beam, first ...
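The "unified model" idea above can be sketched in plain Python. To be clear, this is a conceptual illustration only, not the Beam API: in real Beam code you would build a Pipeline of PCollections and PTransforms in the Java SDK. The point the sketch makes is that one transform definition runs unchanged over a bounded (batch) collection and an unbounded-style (streaming) source:

```python
# Conceptual sketch of a unified batch/stream transform.
# Names here (word_count, stream_input) are illustrative, not Beam API.
from typing import Iterable, Iterator


def word_count(elements: Iterable[str]) -> dict:
    """One transform definition, written once, source-agnostic."""
    counts: dict = {}
    for line in elements:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


# Batch: a bounded, in-memory collection of lines.
batch_input = ["to be or not to be"]


def stream_input() -> Iterator[str]:
    """'Stream': an unbounded-style source, modeled as a generator."""
    for line in ["to be", "or not to be"]:
        yield line


# The same transform runs unchanged over both kinds of source.
batch_counts = word_count(batch_input)
stream_counts = word_count(stream_input())
print(batch_counts == stream_counts)  # → True: both runs agree
```

Beam goes much further than this sketch, of course: its runners handle distribution, windowing, and late data, and the same pipeline can be handed to different runtime engines.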

Why Apache Beam? A Google Perspective

When we made the decision (in partnership with data Artisans, Cloudera, Talend, and a few other companies) to move the Google Cloud Dataflow SDK and runners into the Apache Beam incubator project, we did so with the following goal in mind: provide the world with an easy-to-use but powerful model for data-parallel processing, both streaming and batch, portable across a variety of runtime platforms. Now that the dust on the initial code drops is starting to settle, we wanted to talk briefly about why this makes sense for us at Google and how we got here, given that Google hasn't historically been directly involved in the OSS world of data processing.

Why does this make sense for Google? Google is a business, and as such, it should come as no surprise that there's a business motivation for us behind the Apache Beam move. That motivation hinges primarily on the desire to get as many Apache Beam pipelines as possible running on Cloud Dataflow. Given that, it may not seem intu...