The Future of Data Engineering


Data engineering’s job is to help an organization move and process data. Broadly speaking, this requires two different systems: a data pipeline and a data warehouse. The data pipeline is responsible for moving the data, and the data warehouse is responsible for processing it. I acknowledge that this is a bit overly simplistic. You can do processing in the pipeline itself by doing transformations between extraction and loading with batch and stream processing. The “data warehouse” now includes many storage and processing systems (Flink, Spark, Presto, Hive, BigQuery, Redshift, etc.), as well as auxiliary systems such as data catalogs, job schedulers, and so on. Still, I believe the paradigm holds.
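To make the pipeline/warehouse split concrete, here is a minimal batch ETL sketch. All names in it (`extract`, `transform`, `load`, the in-memory “warehouse”) are hypothetical stand-ins for illustration, not any specific product’s API:

```python
# Hypothetical sketch: a tiny batch pipeline that extracts raw records,
# transforms them in flight, and loads them into a "warehouse"
# (modeled here as a plain dict of tables).

def extract(source):
    """Pull raw records out of a source system (here, an in-memory list)."""
    return list(source)

def transform(records):
    """Processing between extraction and loading: cleanup and typing."""
    return [
        {"user": r["user"].strip().lower(), "events": int(r["events"])}
        for r in records
        if r.get("user", "").strip()  # drop records with no user
    ]

def load(warehouse, records):
    """Append processed records to a table in the warehouse."""
    warehouse.setdefault("user_events", []).extend(records)
    return warehouse

source = [{"user": "  Alice ", "events": "3"}, {"user": "", "events": "1"}]
warehouse = load({}, transform(extract(source)))
# warehouse["user_events"] now holds [{"user": "alice", "events": 3}]
```

In this framing, the `extract`/`load` pair is the pipeline’s job (moving data), while `transform` is the processing that can live either in the pipeline or in the warehouse.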


The industry is working through changes in how these systems are built and managed. There are four areas, in particular, where I expect to see shifts over the next few years.

  • Timeliness: From batch to real-time
  • Connectivity: From one-to-one bespoke integrations to many-to-many
  • Centralization: From centrally managed to self-serve tooling
  • Automation: From manually managed to automated tooling
