Posts

Showing posts from October, 2019

Q3 2019 BARC - Data & Analytics market update (by Carsten Bange)

This quarter: Investments: a record-breaking third quarter for investments in data & analytics companies. M&A: the Hadoop market consolidates quickly, and data science/AI companies add portfolio capabilities by acquisition. B2B software brand Idera has bought WhereScape. WhereScape develops and markets automation software for modern Data Warehouses deployed in the cloud or on premise. Idera, a parent company for several database, development, and testing software companies, announced that it will integrate WhereScape into its Database Tools unit. Other software providers in the same Idera unit are AquaFold, featuring an IDE for visual database queries, and Webyog, featuring MySQL monitoring and management tools. With the acquisition of WhereScape, Idera improves its capabilities for empowering data professionals regarding DevOps use cases in complex data environments. WhereScape was a very visible player in the Data & Analytics ecosystem. It will be interesting to watch whether Idera...

Beast: Moving Data from Kafka to BigQuery

To serve customers across 19+ products, GOJEK places a lot of emphasis on data. Our Data Warehouse, built by integrating data from multiple applications and sources, helps our team of data scientists, as well as business and product analysts, make solid, data-driven decisions. This post explains our open source solution for easy movement of data from Kafka to BigQuery. Data Warehouse setup at GOJEK: We use Google BigQuery (BQ) as our Data Warehouse, which serves as a powerful tool for interactive analysis. This has proven extremely valuable for our use cases. Our approach to pushing data to our warehouse is to first push the data to Kafka. We rely on multiple Kafka clusters to ingest relevant events across teams. A common approach to pushing data from Kafka to BigQuery is to first push it to GCS, and then import said data into BigQuery from GCS. While this solves the use case of running analytics on historical data, we also use BigQuery for near-real-time analytics & r...

Data Processing Pipeline Patterns

Data produced by applications, devices, or humans must be processed before it is consumed. By definition, a data pipeline represents the flow of data between two or more systems. It is a set of instructions that determine how and when to move data between these systems. My last blog conveyed how connectivity is foundational to a data platform. In this blog, I will describe the different data processing pipelines that leverage different capabilities of the data platform, such as connectivity and data engines for processing. There are many data processing pipelines. One may: “integrate” data from multiple sources; perform data quality checks or standardize data; apply data security-related transformations, which include masking, anonymization, or encryption; match, merge, master, and do entity resolution; share data with partners and customers in the required format, such as HL7. Consumers or “targets” of data pipelines may include: data warehouses like ...

Modern applications at AWS

Innovation has always been part of the Amazon DNA, but about 20 years ago, we went through a radical transformation with the goal of making our iterative process—"invent, launch, reinvent, relaunch, start over, rinse, repeat, again and again"—even faster. The changes we made affected both how we built applications and how we organized our company. Back then, we had only a small fraction of the number of customers that Amazon serves today. Still, we knew that if we wanted to expand the products and services we offered, we had to change the way we approached application architecture. The giant, monolithic "bookstore" application and giant database that we used to power Amazon.com limited our speed and agility. Whenever we wanted to add a new feature or product for our customers, like video streaming, we had to edit and rewrite vast amounts of code on an application that we'd designed specifically for our first product—the bookstore. This was a long, unwieldy p...