
Showing posts with the label Data Discovery

2022 Gartner Magic Quadrant for Analytics and Business Intelligence Platforms

  Today’s analytics and BI platforms are augmented throughout and enable users to compose low/no-code workflows and applications. Cloud ecosystems and alignment with digital workplace tools are key selection factors. This research helps data and analytics leaders plan for and select these platforms. Analytics and business intelligence (ABI) platforms enable less technical users, including businesspeople, to model, analyze, explore, share and manage data, and collaborate and share findings, enabled by IT and augmented by artificial intelligence (AI). ABI platforms may optionally include the ability to create, modify or enrich a semantic model including business rules. Today’s ABI platforms have an emphasis on visual self-service for end users, augmented by AI to deliver automated insights. Increasingly, the focus of augmentation is shifting from the analyst persona to the consumer or decision maker. To achieve this, automated insights must not only be statistically relevant, but the...

The Growing Importance of Metadata Management Systems

As companies embrace digital technologies to transform their operations and products, many are using best-of-breed software, open source tools, and software as a service (SaaS) platforms to rapidly and efficiently integrate new technologies. This often means that data required for reports, analytics, and machine learning (ML) reside on disparate systems and platforms. As such, IT initiatives in companies increasingly involve tools and frameworks for data fusion and integration. Examples include tools for building data pipelines, data quality and data integration solutions, customer data platform (CDP), master data management, and data markets. Collecting, unifying, preparing, and managing data from diverse sources and formats has become imperative in this era of rapid digital transformation. Organizations that invest in foundational data technologies are much more likely to build solid foundation applications, ranging from BI and analytics to machine learn...

Ten Use Cases to Enable an Organization with Metadata and Catalogs

Enterprises are modernizing their data platforms and associated tool sets to serve the fast-moving needs of data practitioners, including data scientists, data analysts, business intelligence and reporting analysts, and self-service-embracing business and technology personnel. However, as the tool stack in most organizations is modernized, so is the variety of metadata generated. As the volume of data increases every day, the metadata associated with that data expands too, as does the need to manage it. The first thought that strikes us when we look at a data landscape and hear about a catalog is, “It scans any database ranging from Relational to NoSQL or Graph and gives out useful information”: name, modeled data type, inferred data types, patterns of data, length with minimum and maximum thresholds, minimum and maximum values, and other profiling characteristics of data such as the frequency of values and their distribution. What Is the Basic Benefit of Metadata Managed in Catal...
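
As a rough illustration of the profiling attributes named in this excerpt, the following is a minimal sketch of how a catalog might summarize one scanned column. The function `profile_column`, its parameters, and the sample values are hypothetical and are not taken from the article or from any catalog product; pattern detection is left out to keep the sketch short.

```python
from collections import Counter


def profile_column(name, modeled_type, values):
    """Return simple profiling metadata for one column (illustrative only)."""
    non_null = [v for v in values if v is not None]
    # Inferred types are the Python type names actually observed in the data.
    inferred = sorted({type(v).__name__ for v in non_null})
    lengths = [len(str(v)) for v in non_null]
    # Min/max values are only meaningful when all values share one type.
    comparable = len(inferred) == 1
    return {
        "name": name,
        "modeled_type": modeled_type,
        "inferred_types": inferred,
        "min_length": min(lengths) if lengths else None,
        "max_length": max(lengths) if lengths else None,
        "min_value": min(non_null) if comparable else None,
        "max_value": max(non_null) if comparable else None,
        "null_count": len(values) - len(non_null),
        "value_frequency": Counter(non_null).most_common(3),
    }


# Hypothetical sample column with one missing value.
profile = profile_column("country", "VARCHAR(3)", ["DE", "FR", "DE", None, "USA"])
print(profile)
```

A real catalog would gather these statistics by querying the source system rather than by loading values into memory; the dictionary here only shows the kind of record such a scan could produce.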

Data Discovery Platforms and Their Open Source Solutions

In the past year or two, many companies have shared their data discovery platforms (the latest being Facebook’s Nemo). Based on this list, we now know of more than 10 implementations. I haven’t been paying much attention to these developments in data discovery and wanted to catch up. I was interested in: the questions these platforms help answer; the features developed to answer these questions; how they compare with each other; and what open source solutions are available. By the end of this, we’ll learn about the key features that solve 80% of data discoverability problems. We’ll also see how the platforms compare on these features, and take a closer look at open source solutions available.
Questions we ask in the data discovery process
Before discussing platform features, let’s briefly go over some common questions in the data discovery process. Where can I find data about ____? If we don’t know the right terms, this is especially challenging. For user browsing behavior, do we search for “c...

Nemo: Data discovery at Facebook

Large-scale companies serve millions or even billions of people who depend on the services these companies provide for their everyday needs. To keep these services running and delivering meaningful experiences, the teams behind them need to find the most relevant and accurate information quickly so that they can make informed decisions and take action. Finding the right information can be hard for several reasons. The problem might be discovery: the relevant table might have an obscure or nondescript name, or different teams might have constructed overlapping data sets. Or, the problem could be one of confidence: the dashboard someone is looking at might have been superseded by another source six months ago. Many companies, such as Airbnb, Lyft, Netflix, and Uber, have built their own custom solutions for this challenge. For us, it was important to make the data discovery process simple and fast. Funneling everything through data experts to locate the necessary data each time we...

Shopify's approach to data discovery

Humans generate a lot of data. Every two days we create as much data as we did from the beginning of time until 2003! The International Data Corporation estimates the global datasphere totaled 33 zettabytes (one trillion gigabytes) in 2018. The estimate for 2025 is 175 ZBs, an increase of 430%. This growth is challenging organizations across all industries to rethink their data pipelines. The nature of data usage is problem driven, meaning data assets (tables, reports, dashboards, etc.) are aggregated from underlying data assets to help decision making about a particular business problem, feed a machine learning algorithm, or serve as an input to another data asset. This process is repeated multiple times, sometimes for the same problems, and results in a large number of data assets serving a wide variety of purposes. Data discovery and management is the practice of cataloguing these data assets and all of the applicable metadata that saves time for data professionals, increasing data ...

Data Discovery for Data Scientists at Spotify

Diagnosing the problem
In 2016, as we started migrating to the Google Cloud Platform, we saw an explosion of dataset creation in BigQuery. At this time, we also drastically increased our hiring of insights specialists (data scientists, analysts, user researchers, etc.) at Spotify, resulting in more research and insights being produced across the company. However, research would often only have a localized impact in certain parts of the business, going unseen by others that might find it useful to influence their decision making. Datasets lacked clear ownership or documentation, making it difficult for data scientists to find them. We believed that the crux of the problem was that we lacked a centralized catalog of these data and insights resources. In early 2017, we released Lexikon, a library for data and insights, as the solution to this problem. The first release allowed users to search and browse available BigQuery tables (i.e. datasets), as well as discover k...

Amundsen — Lyft’s data discovery & metadata engine

The problem
Unprecedented growth in data volumes has led to two big challenges. Productivity: whether it’s building a new model, instrumenting a new metric, or doing ad hoc analysis, how can I most productively and effectively make use of this data? Compliance: when collecting data about a company’s users, how do organizations comply with increasing regulatory and compliance demands and uphold the trust of their users? The key to solving these problems lies not in data, but in the metadata. And, to show you how, let’s go through a journey of how we solved a part of the productivity problem at Lyft using metadata.
Productivity
At the 50,000-foot level, the data scientist workflow looks like the following.