This is primarily a notebook to register the content I find worth saving and sharing.
This is not a blog full of my well-structured, clever thoughts. However, if you find any, let me know =)
The Future of Data Engineering
Data engineering’s job is to help an organization move and process data. Broadly speaking, this requires two systems: a data pipeline and a data warehouse. The data pipeline is responsible for moving the data, and the data warehouse is responsible for processing it. I acknowledge that this is an oversimplification. You can also process data in the pipeline itself, by running batch or stream transformations between extraction and loading. The “data warehouse” now includes many storage and processing systems (Flink, Spark, Presto, Hive, BigQuery, Redshift, etc.), as well as auxiliary systems such as data catalogs and job schedulers. Still, I believe the paradigm holds.
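To make the split concrete, here is a minimal sketch in Python of a pipeline that moves records and a warehouse that processes them. Everything in it is an assumption made for illustration: the names `extract`, `transform`, `Warehouse`, and `run_pipeline` are hypothetical, the "source" is an in-memory list, and the "warehouse" is a dictionary of tables rather than any of the real systems named above.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, float]


def extract(source: Iterable[Record]) -> Iterator[Record]:
    """Pipeline step: pull raw records from a source (here, any iterable)."""
    yield from source


def transform(records: Iterable[Record],
              fn: Callable[[Record], Record]) -> Iterator[Record]:
    """Optional in-pipeline processing between extraction and loading."""
    for record in records:
        yield fn(record)


class Warehouse:
    """Toy stand-in for a data warehouse: stores tables and answers queries."""

    def __init__(self) -> None:
        self.tables: Dict[str, List[Record]] = {}

    def load(self, table: str, records: Iterable[Record]) -> None:
        # Append the incoming rows to the named table, creating it if needed.
        self.tables.setdefault(table, []).extend(records)

    def total(self, table: str, column: str) -> float:
        # Processing happens here, after the data has been moved.
        return sum(row[column] for row in self.tables.get(table, []))


def run_pipeline(source: Iterable[Record],
                 warehouse: Warehouse,
                 table: str,
                 fn: Callable[[Record], Record] = lambda r: r) -> int:
    """Move records from source to warehouse; return how many rows were loaded."""
    rows = list(transform(extract(source), fn))
    warehouse.load(table, rows)
    return len(rows)


# Hypothetical sample data: two orders with amounts stored in cents.
orders = [{"id": 1, "amount_cents": 1250}, {"id": 2, "amount_cents": 800}]
warehouse = Warehouse()
loaded = run_pipeline(
    orders,
    warehouse,
    "orders",
    fn=lambda r: {**r, "amount": r["amount_cents"] / 100},  # cents to dollars
)
print(loaded, warehouse.total("orders", "amount"))
```

The conversion from cents to dollars runs inside the pipeline, while the aggregation runs in the warehouse, which is the blurry boundary described above: either side can do processing, but moving and querying remain separate responsibilities.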
The industry is working through changes in how these systems are built and managed. There are four areas, in particular, where I expect to see shifts over the next few years.
Timeliness: From batch to realtime
Connectivity: From one:one bespoke integrations to many:many
Centralization: From centrally managed to self-serve tooling
Automation: From manually managed to automated tooling