Posts

Showing posts with the label DataOps

Data Reliability at Scale: How Fox Digital Architected its Modern Data Stack

As distributed architectures continue to become the new gold standard for data-driven organizations, this kind of self-serve motion would be a dream come true for many data leaders. So when the Monte Carlo team got the chance to sit down with Alex, we took a deep dive into how he made it happen. Here’s how his team built a hybrid data architecture that prioritizes democratization and access while ensuring reliability and trust at every turn.

Exercise “controlled freedom” when dealing with stakeholders
Alex has built decentralized access to data at Fox on a foundation he calls “controlled freedom.” In fact, he believes that treating your data team as the single source of truth within an organization actually creates the biggest silo. So instead of becoming a guardian and a bottleneck, Alex and his data team focus on setting parameters around how data is ingested and supplied to stakeholders. Within that framework, internal data consumers at Fox have the freedom to cr...

The DataOps Landscape

Data has emerged as a foundational asset for all organizations. It fuels major initiatives such as digital transformation and the adoption of analytics, machine learning, and AI. Organizations that can tame, manage, and unlock their data assets stand to benefit in myriad ways: improved decision-making and operational efficiency, better fraud prediction and prevention, stronger risk management and control, and more. Data products and services can also open new or additional revenue streams. As companies increasingly depend on data to power essential products and services, they are investing in the tools and processes needed to manage data operations. In this post, we describe these tools as well as the community of practitioners using them. One sign of the growing maturity of these tools and practices is that a community of engineers and developers is beginning to coalesce around the term “DataOps” (data operations). Our conver...

Visualizing Data Timeliness at Airbnb

Imagine you are a business leader ready to start your day, but you wake up to find that your daily business report is empty — the data is late, so now you are blind. Over the last year, multiple teams came together to build SLA Tracker, a visual analytics tool to facilitate a culture of data timeliness at Airbnb. This data product enabled us to address and systematize the following challenges of data timeliness: When should a dataset be considered late? How frequently are datasets late? Why is a dataset late? This project is a critical part of our efforts to achieve high data quality, and it required overcoming many technical, product, and organizational challenges to build. In this article, we focus on the product design: the journey of how we designed and built data visualizations that could make sense of the deeply complex data of data timeliness. Continue reading >>>
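The post itself centers on product design rather than code, but the first two of those questions have a simple computational core. Here is a minimal sketch, assuming a daily partition and a fixed clock-time SLA; the landing timestamps and the 9:00 cutoff are hypothetical, not Airbnb's actual implementation.

```python
# Illustrative sketch only: SLA Tracker's internals aren't described in the
# excerpt, so the timestamps and the 9:00 AM SLA below are made up.
from datetime import datetime, time

def is_late(landing: datetime, sla: time) -> bool:
    """A daily partition is 'late' if it lands after that day's SLA cutoff."""
    return landing.time() > sla

def lateness_rate(landings: list[datetime], sla: time) -> float:
    """Fraction of days the dataset missed its SLA (the 'how frequently' question)."""
    late_days = sum(is_late(ts, sla) for ts in landings)
    return late_days / len(landings)

landings = [
    datetime(2021, 5, 1, 8, 40),   # on time
    datetime(2021, 5, 2, 9, 25),   # late
    datetime(2021, 5, 3, 8, 55),   # on time
]
print(lateness_rate(landings, time(9, 0)))  # 0.33... -> late one day in three
```

The hard part the article describes is everything around this check: deciding what the cutoff should be per dataset, and tracing why an upstream delay made a given day late.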

Ten Use Cases to Enable an Organization with Metadata and Catalogs

Enterprises are modernizing their data platforms and associated tool-sets to serve the fast-evolving needs of data practitioners, including data scientists, data analysts, business intelligence and reporting analysts, and self-service-oriented business and technology personnel. But as the tool-stack in most organizations is modernized, the variety of metadata generated grows with it. And as the volume of data increases every day, the metadata associated with it expands, as does the need to manage it. The first thought that strikes us when we look at a data landscape and hear about a catalog is, “It scans any database, ranging from relational to NoSQL or graph, and gives out useful information”:
- Name
- Modeled data type
- Inferred data types
- Patterns of data
- Length, with minimum and maximum thresholds
- Minimum and maximum values
- Other profiling characteristics of data, such as the frequency of values and their distribution

What Is the Basic Benefit of Metadata Managed in Catal...
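To make that list concrete, here is a minimal sketch of the kind of column profiling a catalog scanner performs. The DataFrame and column names are invented for illustration; a real catalog scans live databases and captures far more than this.

```python
# A hypothetical, simplified version of catalog column profiling.
import pandas as pd

df = pd.DataFrame({"customer_name": ["Ada", "Grace", None, "Ada"],
                   "order_total": [19.99, 5.00, 42.50, 19.99]})

def profile_column(s: pd.Series) -> dict:
    non_null = s.dropna()
    lengths = non_null.astype(str).str.len()
    return {
        "name": s.name,                                   # Name
        "inferred_type": str(s.dtype),                    # Inferred data type
        "min_value": non_null.min(),                      # Minimum value
        "max_value": non_null.max(),                      # Maximum value
        "min_length": int(lengths.min()),                 # Length thresholds
        "max_length": int(lengths.max()),
        "value_frequencies": non_null.value_counts().to_dict(),  # Distribution
    }

for col in df.columns:
    print(profile_column(df[col]))
```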

The Forrester Wave™: Value Stream Management Solutions, Q3 2020

Why Read This Report In our 30-criterion evaluation of value stream management (VSM) providers, we identified the 11 most significant ones — Atlassian, Blueprint, CloudBees, ConnectALL, Digital.ai, GitLab, IBM, Plutora, ServiceNow, Targetprocess, and Tasktop — and researched, analyzed, and scored them. This report shows how each provider measures up and helps application development and delivery (AD&D) professionals select the right one for their needs. Strong interest in VSM is driven primarily by three roles: 1) product owners and/or program managers who need data to help drive strategies, set priorities, and unlock team potential; 2) development leaders who use VSM to create connected, automated, and self-governed CI/CD pipelines with observability for improving and accelerating the pace of delivery; and 3) release engineers who use VSM for governance, compliance, and upstream observability to manage risk. (see endnote 3) To serve these roles effectively, customers should look f...

How DataOps Amplifies Data and Analytics Business Value

DataOps techniques can provide a more agile and collaborative approach to building and managing data pipelines. The pandemic has accelerated the need for data and analytics leaders to deliver insight faster, with higher quality and resiliency in the face of constant change. Organizations need to make better-informed and faster decisions, with a focus on automation, real-time risk assessment and mitigation, continuous value delivery, and agility. The point of DataOps is to change how people collaborate around data and how it is used in the organization. As a result, data and analytics leaders are increasingly applying DataOps techniques that provide a more agile and collaborative approach to building and managing data pipelines.

What is DataOps?
Gartner defines DataOps as a collaborative data management practice focused on improving the communication, integration and automation of data flows between data managers and data consumers across an organization. “The poi...

Data Observability Ushers In A New Era Enabling Golden Age Of Data

Have we entered the Golden Age of Data? Modern enterprises are collecting, producing, and processing more data than ever before. According to a February 2020 IDG survey of data professionals, average corporate data volumes are increasing by 63% per month, and 10% of respondents reported that their data volumes double every month. Large companies are investing heavily to transform themselves into data-driven organizations that can quickly adapt to the fast pace of a modern economy. They gather huge amounts of data from customers and generate reams of data from transactions. They continuously process data in an attempt to personalize customer experiences, optimize business processes, and drive strategic decisions.

The Real Challenge with Data
In theory, breakthrough open-source technologies such as Spark, Kafka, and Druid are supposed to help just about any organization benefit from massive amounts of customer and operational data, just like they benefit Facebook, Apple, Google, Microso...
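Those survey figures imply staggering compounding. A back-of-the-envelope calculation, assuming the reported monthly rates simply compound for a full year (an assumption the survey excerpt itself does not make), illustrates the scale:

```python
# Hypothetical extrapolation of the IDG survey's monthly growth figures.
monthly_growth = 0.63
yearly_factor = (1 + monthly_growth) ** 12
print(f"{yearly_factor:.0f}x per year")          # ~352x at 63% per month

doubling_factor = 2 ** 12                         # the 10% who double monthly
print(f"{doubling_factor}x per year")             # 4096x
```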

AIOps Platforms (Gartner)

AIOps is an emerging technology that addresses something I’m a big fan of: improving IT operations. So I asked fellow Gartner analyst Colin Fletcher for a guest blog on the topic… Roughly three years ago, it was looking like many enterprise IT operations leaders would put themselves in the precarious role of “the cobbler’s children” by forgoing investment in Artificial Intelligence (AI) to help them do their work better, faster, and cheaper. We were hearing from many IT ops leaders who were building incredibly sophisticated big data and advanced analytics systems for business stakeholders, but who were themselves using rudimentary, reactive red/yellow/green lights and manual steps to run the infrastructure that kept those same systems up and running. Further, we’re all now familiar in our personal lives with dynamic recommendations from online retailers, search providers, virtual personal assistants, and entertainment services. Talk about a paradox! Now I...

Project Hop - Exploring the future of data integration

Project Hop was announced at KCM19 back in November 2019, and the first preview release has been available since April 10th. We’ve been posting about it on our social media accounts, but what exactly is Project Hop? Let’s explore the project in a bit more detail. In this post, we’ll have a look at what Project Hop is, why the project was started, and why know.bi wants to go all in on it.

What is Project Hop?
As the project’s tagline says, Project Hop intends to explore the future of data integration. We take that quite literally. We’ve seen massive changes in the data processing landscape over the last decade (the rise and fall of the Hadoop ecosystem, to name just one). All of these changes need to be supported by and integrated into your data engineering and data processing systems. Apart from these purely technical challenges, the data processing life cycle has become a software life cycle: robust and reliable data processing requires testing, a fast and flexible deployment...

What is Microsoft's Team Data Science Process?

The Team Data Science Process (TDSP) is an agile, iterative data science methodology for delivering predictive analytics solutions and intelligent applications efficiently. TDSP improves team collaboration and learning by suggesting how team roles work best together. It incorporates best practices and structures from Microsoft and other industry leaders to help teams implement data science initiatives successfully, so that companies can fully realize the benefits of their analytics programs. This article provides an overview of TDSP and its main components. We provide a generic description of the process here that can be implemented with different kinds of tools. A more detailed description of the project tasks and roles involved in the lifecycle of the process is provided in additional linked topics. Guidance on implementing the TDSP using a specific set of Microsoft tools and infrastructure is also provi...

DataOps Principles: How Startups Do Data The Right Way

If you have been trying to harness the power of data science and machine learning but, like many teams, struggling to produce results, there’s a secret you are missing out on. All of those models and sophisticated insights require lots of good data, and the best way to get good data quickly is with DataOps.

What is DataOps?
It’s a way of thinking about how an organization deals with data. It’s a set of tools to automate processes and empower individuals. And it’s a new DataOps Engineer role designed to make that thinking real by managing and building those tools.

DataOps Principles
DataOps was inspired by DevOps, which brought the power of agile development to operations (infrastructure management and production deployment). DevOps transformed the way software development is done, and DataOps is now transforming the way data management is done. For larger enterprises with a dedicated data engineering team, DataOps is about breaking down barriers and re-...