ETL and How It Changed Over Time

Data and the way it is used have changed drastically compared to a decade ago, and traditional ETL processes fall short when handling modern data. The main reasons are the following:
  • Modern data pipelines often include real-time streams, and organizations need real-time insight into their processes.
  • Systems need to perform ETL on data streams without falling back to batch processing, and they should scale out to handle high event rates (a minimal record-at-a-time sketch follows this list).
  • Single-server databases are increasingly replaced by distributed data platforms (e.g., Cassandra, MongoDB, Elasticsearch, SaaS apps), message brokers (e.g., Kafka, ActiveMQ), and several other types of endpoints.
    • The system should allow additional sources and sinks to be plugged in on the go, in a manageable way.
    • Repeated processing of the same data, caused by ad hoc architecture, has to be eliminated.
    • Change data capture (CDC) technologies used with traditional ETL have to be integrated so that traditional operations are still supported.
  • Heterogeneous data sources are now the norm, and maintenance has to keep pace with new requirements.
    • Source and target endpoints should be decoupled from the business logic. A data mapper layer should allow new sources and endpoints to be plugged in seamlessly, without affecting the transformation (see the mapper sketch after this list).
      Figure: Data mapping layer
      • Received data should be standardized before the transformation (i.e., before business rules execute).
      • Data should be converted to the endpoint-specific format after the transformation and before publishing.
    • In the modern world, data cleansing is not the only process defined in the transformation step; there are many other business requirements that organizations need to fulfill.
      • Data processing should support filters, joins, aggregations, sequences, patterns, and enrichment mechanisms to execute complex business rules (a windowed-aggregation sketch follows this list).
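
As a rough sketch of the record-at-a-time model described above: each event is transformed the moment it arrives, with no batch boundary, and the loop can scale out by running across many partitions or consumers in parallel. The `events()` source, field names, and threshold below are all hypothetical.

```python
import json
import time
from itertools import islice

def events():
    """Hypothetical unbounded source yielding one raw event at a time;
    in practice this would be a consumer loop over Kafka, ActiveMQ, etc."""
    while True:
        yield json.dumps({"sensor": "s1", "temp": 41.5, "ts": time.time()})

def process(raw):
    """Transform each event the moment it arrives -- no batch boundary."""
    event = json.loads(raw)
    if event["temp"] < 40.0:                      # filter rule, applied per record
        return None
    event["temp_f"] = event["temp"] * 9 / 5 + 32  # enrichment
    return event

# Scale out by running this loop on many partitions/consumers in parallel.
for raw in islice(events(), 5):                   # islice only bounds the demo
    result = process(raw)
    if result is not None:
        print(result)                             # stand-in for publishing to a sink
```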
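
The data mapper idea above can be sketched as follows, assuming a canonical dict as the internal format: inbound mappers standardize each source's wire format before the transformation, outbound mappers convert its output for each endpoint, and new endpoints register in a table without touching the business logic. All names here are illustrative.

```python
import csv
import io
import json

# Inbound mappers: convert each source's wire format into one canonical dict,
# so the transformation never sees source-specific structure.
def from_json(raw):
    return json.loads(raw)

def from_csv(raw):
    row = next(csv.DictReader(io.StringIO(raw), fieldnames=["id", "amount"]))
    return {"id": row["id"], "amount": float(row["amount"])}

# Outbound mappers: convert the canonical dict into each endpoint's format.
def to_json(event):
    return json.dumps(event)

SOURCES = {"json": from_json, "csv": from_csv}  # plug new sources in here
SINKS = {"json": to_json}                       # plug new sinks in here

def transform(event):
    """Business rule, written once against the canonical format only."""
    event["amount_with_tax"] = round(event["amount"] * 1.1, 2)
    return event

canonical = SOURCES["csv"]("42,99.50")          # standardize before transform
print(SINKS["json"](transform(canonical)))      # format after transform
```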
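
As an illustration of the aggregation mechanisms listed above, here is a minimal tumbling-window average in plain Python. The window length, event fields, and filter rule are assumptions, and a production system would use a stream processor rather than buffering windows in memory.

```python
from collections import defaultdict

def tumbling_window_avg(stream, window_sec=60):
    """Filter the stream, then group events into fixed, non-overlapping
    time windows and emit one average per sensor per window."""
    buckets = defaultdict(list)
    for event in stream:
        if event["temp"] <= 0:                   # filter: drop implausible readings
            continue
        window = int(event["ts"] // window_sec)
        buckets[(window, event["sensor"])].append(event["temp"])
    for (window, sensor), temps in sorted(buckets.items()):
        yield {"window": window, "sensor": sensor,
               "avg_temp": sum(temps) / len(temps)}

sample = [
    {"sensor": "s1", "ts": 10, "temp": 20.0},
    {"sensor": "s1", "ts": 30, "temp": 22.0},
    {"sensor": "s1", "ts": 70, "temp": 25.0},
]
for out in tumbling_window_avg(sample):
    print(out)  # window 0 -> avg 21.0, window 1 -> avg 25.0
```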