Posts

Showing posts from December, 2019

2019 Was the Year Data Visualization Hit the Mainstream

There’s always something going on in the field of data visualization, but until recently only people in the field noticed. To the outside world, beyond perhaps an occasional Amazing Map®, Tufte workshop or funny pie chart, these trends are invisible. Not so in 2019, when data visualization featured prominently in major news stories and key players in the field created work that did well not just on Dataviz Twitter but everywhere. 2019 saw the United States President amend a data visualization product with a Sharpie. That should have been enough to make 2019 special, but the year also saw the introduction of a data visualization-focused fashion line, a touching book that uses data visualization to express some of the anxieties and feelings we all struggle with, and the creation of the first holistic professional society focused on data visualization.

Why Not Airflow? An overview of the Prefect engine for Airflow users

Airflow is a historically important tool in the data engineering ecosystem. It introduced the ability to combine a strict Directed Acyclic Graph (DAG) model with Pythonic flexibility in a way that made it appropriate for a wide variety of use cases. However, Airflow’s applicability is limited by its legacy as a monolithic batch scheduler aimed at data engineers principally concerned with orchestrating third-party systems employed by others in their organizations. Today, many data engineers are working more directly with their analytical counterparts. Compute and storage are cheap, so friction is low and experimentation prevails. Processes are fast, dynamic, and unpredictable. Airflow got many things right, but its core assumptions never anticipated the rich variety of data applications that has emerged. It simply does not have the requisite vocabulary to describe many of those activities. The seed that would grow into Prefect was first planted all the way back in 2016, in a seri

Researchers love PyTorch and TensorFlow

In a recent survey (AI Adoption in the Enterprise, which drew more than 1,300 respondents), we found significant usage of several machine learning (ML) libraries and frameworks. About half indicated they used TensorFlow or scikit-learn, and a third reported they were using PyTorch or Keras. I recently attended an interesting RISELab presentation delivered by Caroline Lemieux describing recent work on AutoPandas and automation tools that rely on program synthesis. In the course of her presentation, Lemieux reviewed usage statistics they had gathered on different deep learning frameworks and data science libraries. She kindly shared some of that data with me, which I used to draw a chart. The numbers are based on simple full-text searches of papers posted on the popular e-print service arXiv.org. Specifically, they reflect the number of papers that mention each of the frameworks. Using this metric, the two most popular deep learning frameworks among resear

Operating a Large, Distributed System in a Reliable Way

"The article is the collection of the practices I've found useful to reliably operate a large system at Uber, while working here. My experience is not unique - people working on similar sized systems go through a similar journey. I've talked with engineers at Google, Facebook, and Netflix, who shared similar experiences and solutions. Many of the ideas and processes listed here should apply to systems of similar scale, regardless of running on own data centers (like Uber mostly does) or on the cloud (where Uber sometimes scales to). However, the practices might be an overkill for smaller or less mission-critical systems."

There's much ground to cover:

Monitoring
Oncall, Anomaly Detection & Alerting
Outages & Incident Management Processes
Postmortems, Incident Reviews & a Culture of Ongoing Improvements
Failover Drills, Capacity Planning & Blackbox Testing
SLOs, SLAs & Reporting on Them
SRE as an Independent Team
Reliability as an