The Growing Importance of Metadata Management Systems


As companies embrace digital technologies to transform their operations and products, many are using best-of-breed software, open source tools, and software as a service (SaaS) platforms to rapidly and efficiently integrate new technologies. This often means that data required for reports, analytics, and machine learning (ML) reside on disparate systems and platforms. As such, IT initiatives in companies increasingly involve tools and frameworks for data fusion and integration. Examples include tools for building data pipelines, data quality and data integration solutions, customer data platform (CDP) , master data management, and data markets.

Collecting, unifying, preparing, and managing data from diverse sources and formats has become imperative in this era of rapid digital transformation. Organizations that invest in foundational data technologies are much more likely to build solid foundation applications, ranging from BI and analytics to machine learning and AI.

In recent years, several technology companies developed internal metadata management systems and shared the challenges that led them to focus on metadata (this list includes: Airbnb’s Dataportal, Netflix’s Metacat, Uber’s Databook, LinkedIn’s Datahub, Lyft’s Amundsen, WeWork’s Marquez, Spotify’s Lexikon). These companies were facing fragmented data landscapes, while growing teams of analysts, data scientists, and engineers were needing to build data and machine learning products. The blog posts announcing these metadata management tools made it clear that these companies have come to rely on these metadata systems to power an array of data and machine learning services.

Beyond the need to unify and tame data from diverse systems, other reasons for the resurgence in interest in metadata technologies include:

  • Regulations like SR-11, GDPR, and the California Privacy Rights Act (CPRA) require organizations to manage data privacy, access, and control efficiently and at scale.

  • Debugging and root cause analysis are essential for machine learning and AI applications. The advent of new regulations raises the possibility of audits, making tools for data governance, model governance, and data lineage particularly critical.

  • Data governance at scale requires a certain level of automation, especially when many different software systems and platforms are involved.

  • Data discovery is particularly important for productivity reasons. Many users spend significant time finding and understanding the right data. A good data discovery product can help in this regard.


Comments