The Future of Data Engineering


Data engineering’s job is to help an organization move and process data. Broadly speaking, this requires two different systems: a data pipeline and a data warehouse. The data pipeline is responsible for moving the data, and the data warehouse is responsible for processing it. I acknowledge that this is a bit overly simplistic. You can do processing in the pipeline itself by doing transformations between extraction and loading with batch and stream processing. The “data warehouse” now includes many storage and processing systems (Flink, Spark, Presto, Hive, BigQuery, Redshift, etc.), as well as auxiliary systems such as data catalogs, job schedulers, and so on. Still, I believe the paradigm holds.
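To make the pipeline/warehouse split concrete, here is a minimal batch ETL sketch. All names in it (`extract`, `transform`, `load`, the in-memory “warehouse”) are hypothetical stand-ins for illustration, not any specific product’s API:

```python
# Hypothetical sketch: a tiny batch pipeline that extracts raw records,
# transforms them in flight, and loads them into a "warehouse"
# (modeled here as a plain dict of tables).

def extract(source):
    """Pull raw records out of a source system (here, an in-memory list)."""
    return list(source)

def transform(records):
    """Processing between extraction and loading: cleanup and typing."""
    return [
        {"user": r["user"].strip().lower(), "events": int(r["events"])}
        for r in records
        if r.get("user", "").strip()  # drop records with no user
    ]

def load(warehouse, records):
    """Append processed records to a table in the warehouse."""
    warehouse.setdefault("user_events", []).extend(records)
    return warehouse

source = [{"user": "  Alice ", "events": "3"}, {"user": "", "events": "1"}]
warehouse = load({}, transform(extract(source)))
# warehouse["user_events"] now holds [{"user": "alice", "events": 3}]
```

In this framing, the `extract`/`load` pair is the pipeline’s job (moving data), while `transform` is the processing that can live either in the pipeline or in the warehouse.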


The industry is working through changes in how these systems are built and managed. There are four areas, in particular, where I expect to see shifts over the next few years.

  • Timeliness: From batch to real-time
  • Connectivity: From one-to-one bespoke integrations to many-to-many
  • Centralization: From centrally managed to self-serve tooling
  • Automation: From manually managed to automated tooling
