Enterprise data integration with an operational data hub
Big data (also called NoSQL) technologies facilitate the ingestion,
processing, and search of data with no regard to schema (database
structure). Web companies such as Google, LinkedIn, and Facebook use
big data technologies to process the tremendous amount of data from
every possible source without regard to structure, and offer a
searchable interface to access it. Modern NoSQL technologies have
evolved to offer capabilities to govern, process, secure, and deliver
data, and have facilitated the development of an integration pattern
called the operational data hub (ODH).
The Centers for Medicare and Medicaid Services (CMS) and other organizations (public and private) in the health, finance, banking, entertainment, insurance, and defense sectors (amongst others) utilize the capabilities of ODH technologies for enterprise data integration. This gives them the ability to access, integrate, master, process, and deliver data across the enterprise.
Traditional model and data silos
For decades, the standard pattern to produce operational data and enterprise analytics was to develop data warehouses with data schemas dedicated to the purpose of use. Let’s consider an example: an HR department required detailed analysis of human resource data. The development team was engaged to elicit requirements for the reports that would be generated and design a database schema to store that data. Data feeds were developed to pull HR data from all relevant systems (such as payroll and vacation registers), then insert or update tables in the data warehouse to build the required analytical data. Once completed, the HR director could pull metrics on trends in pay raises, tenure, and paid time off.
However, if additional information such as trends in employee satisfaction scores was required, the development team had to be engaged again to elicit requirements, source the data, determine the impact on the database, and then build the processes that update the data warehouse. This process had to be repeated every time the data warehouse needed updating. Each update typically involved substantial development and testing to ensure the updated schema did not break existing code. For this reason, the level of effort for analyzing and implementing any change was typically enormous.
Each department, having developed its own operational systems and its own data warehouses, could execute business processes and draw analytical information. However, this practice isolated information technology resources into what are referred to as “data silos.” It is very difficult to draw analytical correlations across data silos. For instance, if a CEO wanted to know the impact of seasonal staff turnover on the ability to fulfill product delivery and shipment, HR, sales, production, and shipping data would have to be correlated over time. The effort involved typically meant long delays and significant cost before an answer could be produced.
Many organizations used relational technologies to implement enterprise data warehouses (EDWs) across the relevant data silos to answer enterprise-wide questions. However, these EDWs suffer from the same challenges as their smaller, departmental cousins. The effort associated with designing and implementing the schema, data extracts, and data feeds is significant. Once an EDW is built, changing it is typically no easier.
How are things better with an ODH?
An ODH combines the flexible schema processing capabilities of NoSQL technologies with the governance, rigor, and transactional integrity of relational technologies. To illustrate how an ODH would be helpful, let’s consider the example provided above. Since an ODH is built on a NoSQL technology, and NoSQL technologies allow data to be ingested without regard to schema, the organization can start ingesting available data in raw format into the ODH. Our organization has the following systems across the enterprise:
- A payroll system that includes employee, position, benefits, and payroll information
- A vacation register that manages, approves, and tracks paid time off
- A training system that tracks compliance training and job-related training
- A product management system that manages product development and parts ordering
- Warehouse management that tracks products on hand and manages shipping
- An order management system that manages sales and customer information
- A customer relationship management (CRM) system that manages customer information and tracks sales
- A document management system that manages electronic versions of paper documents
To ingest this data in raw form, the development team needs to develop:
- the data queries (simple SQL) from the product management system
- the processes to ingest the files from the existing interfaces
- the processes to ingest the PDF files from the document management system
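As a rough illustration of ingesting data without regard to schema, the sketch below stores records of different shapes side by side, wrapping each in a small envelope that notes where it came from. The `RawStore` class, the envelope fields, and the sample records are all hypothetical; a real ODH would rely on its underlying NoSQL database rather than an in-memory list.

```python
import json
from datetime import datetime, timezone


class RawStore:
    """Hypothetical in-memory stand-in for a schema-free document store."""

    def __init__(self):
        self.documents = []

    def ingest(self, source, payload):
        # Wrap the payload as-is; no schema is checked or enforced.
        doc = {
            "source": source,
            "ingested_at": datetime.now(timezone.utc).isoformat(),
            "payload": payload,
        }
        self.documents.append(doc)
        return doc

    def search(self, term):
        # Naive full-text search over the serialized payload.
        term = term.lower()
        return [d for d in self.documents
                if term in json.dumps(d["payload"]).lower()]


store = RawStore()
# Three sources, three different record shapes (all sample data is made up).
store.ingest("payroll", {"emp_id": 17, "name": "Ana Ruiz", "position": "Packer"})
store.ingest("vacation_register", {"employee": "Ana Ruiz", "pto_days": 4})
store.ingest("order_management", {"order": "A-1001", "customer": "Acme", "qty": 12})

hits = store.search("ana ruiz")
print([d["source"] for d in hits])  # ['payroll', 'vacation_register']
```

Because nothing is validated at ingest time, a new source can be added with one more `ingest` call and no schema change; the trade-off is that consumers must cope with differing field names until the data is harmonized.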
However, the true value of the ODH is realized when we leverage the data governance, processing, and consistency capabilities to establish data processing patterns upon ingest. In addition to ingesting the raw data, additional processes can:
- group cohorts of data based on identifiers or configurable fuzzy logic
- apply an in-place harmonized (canonical) model of data elements
- apply data quality updates
- create master records with updates from disparate systems
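The processing steps listed above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how any particular ODH product works: the `FIELD_MAP` mapping, the canonical field names, and the sample records are hypothetical, and records are grouped by an exact shared identifier rather than by fuzzy logic.

```python
from collections import defaultdict

# Hypothetical mapping from each source's field names to canonical names.
FIELD_MAP = {
    "payroll": {"emp_id": "employee_id", "name": "full_name",
                "position": "job_title"},
    "training": {"employee_no": "employee_id", "employee": "full_name",
                 "course": "last_course"},
}


def harmonize(source, payload):
    """Rename source-specific fields to the canonical model and tidy values."""
    canonical = {}
    for field, value in payload.items():
        key = FIELD_MAP[source].get(field)
        if key is None:
            continue  # unmapped fields remain available in the raw record only
        if isinstance(value, str):
            value = " ".join(value.split())  # data quality: collapse stray spaces
        canonical[key] = value
    return canonical


def build_masters(raw_docs):
    """Group harmonized records by identifier and merge them into masters."""
    masters = defaultdict(dict)
    for doc in raw_docs:
        record = harmonize(doc["source"], doc["payload"])
        # Later sources overwrite earlier ones for fields they share.
        masters[record["employee_id"]].update(record)
    return dict(masters)


raw_docs = [
    {"source": "payroll",
     "payload": {"emp_id": 17, "name": "Ana  Ruiz ", "position": "Packer"}},
    {"source": "training",
     "payload": {"employee_no": 17, "employee": "Ana Ruiz",
                 "course": "Forklift safety"}},
]
masters = build_masters(raw_docs)
print(masters[17])
```

The raw documents are left untouched, so both the raw and the mastered views remain available for analysis. Matching on configurable fuzzy logic would replace the exact `employee_id` lookup with a similarity comparison.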
Why do we need operational data hubs? We need them to facilitate enterprise data integration with the flexibility of big data/NoSQL technologies, but with the added rigor, governance, and consistency required in an enterprise environment. The ODH facilitates data exchange across the enterprise and allows for analytical processing of raw or mastered data at a fraction of the cost of traditional technologies.
(Original article: https://www.oreilly.com/ideas/enterprise-data-integration-with-an-operational-data-hub)