Posts

Showing posts with the label Uber

Cost-Efficient Open Source Big Data Platform at Uber

Image
  In this blog post, we shared efforts and ideas in improving the platform efficiency of Uber’s Big Data Platform, including file format improvements, HDFS erasure coding, YARN scheduling policy improvements, load balancing, query engines, and Apache Hudi.  These improvements have resulted in significant savings.  In addition, we explored some open challenges like analytics and online colocation, and pricing mechanisms.  However, as the framework outlined in our previous post established, platform efficiency improvements alone do not guarantee efficient operation.  Controlling the supply and the demand of data is equally important, which we will address in an upcoming post. As Uber’s business has expanded, the underlying pool of data that powers it has grown exponentially, and thus ever more expensive to process. When Big Data rose to become one of our largest operational expenses, we began an initiative to reduce costs on our data platform, which divides challe...

Turbocharging Analytics at Uber with Data Science Workbench

Image
Millions of Uber trips take place each day across nearly 80 countries, generating information on traffic, preferred routes, estimated times of arrival/delivery, drop-off locations, and more that enables us to facilitate better experiences for users. To make our data exploration and analysis more streamlined and efficient, we built Uber’s data science workbench (DSW), an all-in-one toolbox for interactive analytics and machine learning that leverages aggregate data. DSW centralizes everything a data scientist needs to perform data exploration, data preparation, ad-hoc analyses, model exploration, workflow scheduling, dashboarding, and collaboration in a single-pane, web-based graphical user interface (GUI). Leveraged by data science, engineering, and operations teams across the company, DSW has quickly scaled to become Uber’s go-to data analytics solution. Current DSW use cases include pricing, safety, fraud detection, and navigation, among other foundational elements of the trip experi...

Operating a Large, Distributed System in a Reliable Way

Image
"The article is the collection of the practices I've found useful to reliably operate a large system at Uber, while working here. My experience is not unique - people working on similar sized systems go through a similar journey. I've talked with engineers at Google, Facebook, and Netflix, who shared similar experiences and solutions. Many of the ideas and processes listed here should apply to systems of similar scale, regardless of running on own data centers (like Uber mostly does) or on the cloud (where Uber sometimes scales to). However, the practices might be an overkill for smaller or less mission-critical systems." There's much ground to cover: Monitoring Oncall, Anomaly Detection & Alerting Outages & Incident Management Processes Postmortems, Incident Reviews & a Culture of Ongoing Improvements Failover Drills, Capacity Planning & Blackbox Testing SLOs, SLAs & Reporting on Them SRE as an Independent Team Reliability as an...

Deep dive into how Uber uses Spark

Image
Apache Spark is a foundational piece of Uber’s Big Data infrastructure that powers many critical aspects of our business. We currently run more than one hundred thousand Spark applications per day, across multiple different compute environments. Spark’s versatility, which allows us to build applications and run them everywhere that we need, makes this scale possible. However, our ever-growing infrastructure means that these environments are constantly changing, making it increasingly difficult for both new and existing users to give their applications reliable access to data sources, compute resources, and supporting tools. Also, as the number of users grow, it becomes more challenging for the data team to communicate these environmental changes to users, and for us to understand exactly how Spark is being used. We built the Uber Spark Compute Service (uSCS) to help manage the complexities of running Spark at this scale. This Spark-as-a-service solution leverages Apache Livy, cu...

Introducing Ludwig, a Code-Free Deep Learning Toolbox

Image
Over the last decade, deep learning models have proven highly effective at performing a wide variety of machine learning tasks in vision, speech, and language. At Uber we are using these models for a variety of tasks, including customer support, object detection, improving maps, streamlining chat communications, forecasting, and preventing fraud. Many open source libraries, including TensorFlow, PyTorch, CNTK, MXNET, and Chainer, among others, have implemented the building blocks needed to build such models, allowing for faster and less error-prone development. This, in turn, has propelled the adoption of such models both by the machine learning research community and by industry practitioners, resulting in fast progress in both architecture design and industrial solutions. At Uber AI, we decided to avoid reinventing the wheel and to develop packages built on top of the strong foundations open source libraries provide. To this end, in 2017 we released Pyro, a deep probabilistic program...