
Showing posts with the label monitoring

Data Reliability at Scale: How Fox Digital Architected its Modern Data Stack

As distributed architectures continue to become a new gold standard for data-driven organizations, this kind of self-serve motion would be a dream come true for many data leaders. So when the Monte Carlo team got the chance to sit down with Alex, we took a deep dive into how he made it happen. Here’s how his team architected a hybrid data architecture that prioritizes democratization and access, while ensuring reliability and trust at every turn.

Exercise “Controlled Freedom” when dealing with stakeholders

Alex has built decentralized access to data at Fox on a foundation he calls “controlled freedom.” In fact, he believes using your data team as the single source of truth within an organization actually creates the biggest silo. So instead of becoming a guardian and bottleneck, Alex and his data team focus on setting certain parameters around how data is ingested and supplied to stakeholders. Within the framework, internal data consumers at Fox have the freedom to cr...

Data Observability Ushers In A New Era Enabling Golden Age Of Data

Have we entered the Golden Age of Data? Modern enterprises are collecting, producing, and processing more data than ever before. According to a February 2020 IDG survey of data professionals, average corporate data volumes are increasing by 63% per month. 10% of respondents even reported that their data volumes double every month. Large companies are investing heavily to transform themselves into data-driven organizations that can quickly adapt to the fast pace of a modern economy. They gather huge amounts of data from customers and generate reams of data from transactions. They continuously process data in an attempt to personalize customer experiences, optimize business processes, and drive strategic decisions.

The Real Challenge with Data

In theory, breakthrough open-source technologies such as Spark, Kafka, and Druid are supposed to help just about any organization benefit from massive amounts of customer and operational data just like they benefit Facebook, Apple, Google, Microso...

Operating a Large, Distributed System in a Reliable Way

"The article is the collection of the practices I've found useful to reliably operate a large system at Uber, while working here. My experience is not unique - people working on similar sized systems go through a similar journey. I've talked with engineers at Google, Facebook, and Netflix, who shared similar experiences and solutions. Many of the ideas and processes listed here should apply to systems of similar scale, regardless of running on own data centers (like Uber mostly does) or on the cloud (where Uber sometimes scales to). However, the practices might be an overkill for smaller or less mission-critical systems."

There's much ground to cover: Monitoring; Oncall, Anomaly Detection & Alerting; Outages & Incident Management Processes; Postmortems, Incident Reviews & a Culture of Ongoing Improvements; Failover Drills, Capacity Planning & Blackbox Testing; SLOs, SLAs & Reporting on Them; SRE as an Independent Team; Reliability as an...

Model governance and model operations: building and deploying robust, production-ready machine learning models

O'Reilly's surveys over the past couple of years have shown growing interest in machine learning (ML) among organizations from diverse industries. A few factors are contributing to this strong interest in implementing ML in products and services. First, the machine learning community has conducted groundbreaking research in many areas of interest to companies, and much of this research has been conducted out in the open via preprints and conference presentations. We are also beginning to see researchers share sample code written in popular open source libraries, and some even share pre-trained models. Organizations now also have more use cases and case studies from which to draw inspiration—no matter what industry or domain you are interested in, chances are there are many interesting ML applications you can learn from. Finally, modeling tools are improving, and automation is beginning to allow new users to tackle problems that used to be the province of experts. With the s...