Notes about Cutting-Edge Technologies and Everything

Posts

Showing posts with the label Self-Service

Data Reliability at Scale: How Fox Digital Architected its Modern Data Stack

- March 04, 2022

As distributed architectures continue to become a new gold standard for data-driven organizations, this kind of self-serve motion would be a dream come true for many data leaders. So when the Monte Carlo team got the chance to sit down with Alex, we took a deep dive into how he made it happen. Here’s how his team architected a hybrid data architecture that prioritizes democratization and access, while ensuring reliability and trust at every turn.

Exercise “Controlled Freedom” when dealing with stakeholders

Alex has built decentralized access to data at Fox on a foundation he calls “controlled freedom.” In fact, he believes using your data team as the single source of truth within an organization actually creates the biggest silo. So instead of becoming a guardian and bottleneck, Alex and his data team focus on setting certain parameters around how data is ingested and supplied to stakeholders. Within the framework, internal data consumers at Fox have the freedom to cr...


2021 Gartner Magic Quadrant for Data Integration Tools

- August 27, 2021

Strategic Planning Assumptions

Through 2022, manual data management tasks will be reduced by 45% through the addition of machine learning and automated service-level management. By 2023, AI-enabled automation in data management and integration will reduce the need for IT specialists by 20%. Read report >>>


Bulldozer: Batch Data Moving from Data Warehouse to Online Key-Value Stores

- February 02, 2021

Netflix has more than 195 million subscribers who generate petabytes of data every day. Data scientists and engineers collect this data from our subscribers and videos, and implement data analytics models to discover customer behaviour with the goal of maximizing user joy. Usually, data scientists and engineers write Extract-Transform-Load (ETL) jobs and pipelines using big data compute technologies, like Spark or Presto, to process this data and periodically compute key information for a member or a video. The processed data is typically stored as data warehouse tables in AWS S3. Iceberg is widely adopted at Netflix as a data warehouse table format that addresses many of the usability and performance problems with Hive tables. At Netflix, we also heavily embrace a microservice architecture that emphasizes separation of concerns. Many of these services need to do fast lookups of this fine-grained data, which is generated periodically. For example, in order to enha...


Data Mesh Principles and Logical Architecture v2

- December 04, 2020

Our aspiration to augment and improve every aspect of business and life with data demands a paradigm shift in how we manage data at scale. While the technology advances of the past decade have addressed the scale of data volume and data processing compute, they have failed to address scale in other dimensions: changes in the data landscape, proliferation of sources of data, diversity of data use cases and users, and speed of response to change. Data mesh addresses these dimensions and is founded on four principles: domain-oriented decentralized data ownership and architecture, data as a product, self-serve data infrastructure as a platform, and federated computational governance. Each principle drives a new logical view of the technical architecture and organizational structure. The original writeup, How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh, which I encourage you to read before joining me back here, empathized with today’s pain points of architectural and or...


Self-Service Data Preparation: Research to Practice

- April 03, 2019

The story of self-service data preparation and the academic research behind Trifacta, which is also available as a SaaS offering on GCP as Dataprep: http://sites.computer.org/debull/A18june/p23.pdf


The road to a collaborative self-service model

- July 30, 2016

In a previous blog post we discussed how you enable a highly collaborative and data-driven organization through the concepts of multi-speed or bi-modal IT. We then expanded on this through a discussion of the overall information and analytic lifecycle and the interaction with five personas across that lifecycle. You can read those posts here: “Multi-speed IT drives fast business experiments and empowered citizen analysts” and “Enabling a highly collaborative and data-driven organization”. Interestingly enough, Forrester Research recently published a report titled “The False Promise of Bimodal IT”, which was referenced in an article on CIO.com. Forrester argues this paradigm is fundamentally a mistake, as it creates a two-class system with the implication that you have a slow-moving entity focused on back-office systems (IT) and a second group focused on fast rollout of digital products. From an organizational perspective the arguments being made are valid, but when I th...
