Posts

Showing posts from March, 2021

The Growing Importance of Metadata Management Systems

As companies embrace digital technologies to transform their operations and products, many are using best-of-breed software, open source tools, and software as a service (SaaS) platforms to rapidly and efficiently integrate new technologies. This often means that data required for reports, analytics, and machine learning (ML) reside on disparate systems and platforms. As such, IT initiatives in companies increasingly involve tools and frameworks for data fusion and integration. Examples include tools for building data pipelines, data quality and data integration solutions, customer data platform (CDP), master data management, and data markets. Collecting, unifying, preparing, and managing data from diverse sources and formats has become imperative in this era of rapid digital transformation. Organizations that invest in foundational data technologies are much more likely to build solid foundation applications, ranging from BI and analytics to machine learning and AI. In rece

Visualizing Data Timeliness at Airbnb

Imagine you are a business leader ready to start your day, but you wake up to find that your daily business report is empty: the data is late, so now you are blind. Over the last year, multiple teams came together to build SLA Tracker, a visual analytics tool to facilitate a culture of data timeliness at Airbnb. This data product enabled us to address and systematize the following challenges of data timeliness: When should a dataset be considered late? How frequently are datasets late? Why is a dataset late? This project is a critical part of our efforts to achieve high data quality, and building it required overcoming many technical, product, and organizational challenges. In this article, we focus on the product design: the journey of how we designed and built data visualizations that could make sense of the deeply complex data of data timeliness. Continue reading >>>

ThoughtWorks Decoder puts tech into a business context

The tech landscape changes pretty fast. There are always new terms, techniques and tools emerging. But don't let tech be an enigma: ThoughtWorks Decoder is here to help. Simply search for the term you're interested in, and we'll give you the lowdown on what it is, what it can do for your enterprise and what the potential drawbacks are. ThoughtWorks Decoder >>>

Ten Use Cases to Enable an Organization with Metadata and Catalogs

Enterprises are modernizing their data platforms and associated tool-sets to serve the fast-changing needs of data practitioners, including data scientists, data analysts, business intelligence and reporting analysts, and self-service-embracing business and technology personnel. However, as the tool-stack in most organizations is modernized, the variety of metadata generated grows with it. As the volume of data increases every day, the metadata associated with that data expands too, as does the need to manage it. The first thought that strikes us when we look at a data landscape and hear about a catalog is, “It scans any database ranging from Relational to NoSQL or Graph and gives out useful information”: name; modeled data type; inferred data types; patterns of data; length, with minimum and largest threshold; minimum and maximum values; and other profiling characteristics of data, like the frequency of values and their distribution. What Is the Basic Benefit of Metadata Managed in Catalogs? 1. Increa

Gartner Magic Quadrant for Data Science and Machine Learning Platforms 2021

This report assesses 20 vendors of platforms that data scientists and others can use to source data, build models and operationalize machine learning. It will help them make the right choice from a crowded field in a maturing DSML platform market that continues to show rapid product development. Market Definition/Description: Gartner defines a data science and machine learning (DSML) platform as a core product and supporting portfolio of coherently integrated products, components, libraries and frameworks (including proprietary, partner-sourced and open-source). Its primary users are data science professionals, including expert data scientists, citizen data scientists, data engineers, application developers and machine learning (ML) specialists. The core product and supporting portfolio: (1) are sufficiently well-integrated to provide a consistent “look and feel”; (2) create a user experience in which all components are reasonably interoperable in support of an analytics pipeline. The DSML pl