Posts

Showing posts with the label YARN

Apache Hadoop 3.1- a Giant Leap for Big Data

Image
Use Cases When we are in the outdoors, many of us often feel the need for a camera- that is intelligent enough to follow us, adjust to the terrain heights and visually navigate through the obstacles, while capturing panoramic videos.  Here, I am talking about autonomous self-flying drones, very similar to cars on auto pilot. The difference is that we are starting to see proliferation of artificial intelligence into affordable, everyday use cases, compared to relatively expensive cars. These new use cases mean: (1) They will need parallel compute processing to crunch through insane amount of data (visual or otherwise) in real time for inferences and training of deep learning neural network algorithms. This helps them distinguish between objects and get better with more data. Think like a leap of compute processing by 100x, due to the real time nature of the use cases (2) They will need the deep learning software frameworks, so that data scientists & data engi...

The evolution of cluster scheduler architectures

Cluster schedulers are an important component of modern infrastructure, and have evolved significantly in the last few years. Their architecture has moved from monolithic designs to much more flexible, disaggregated and distributed designs. However, many current open-source offerings are either still monolithic, or otherwise lack key features. These features matter to real-world users, as they are required to achieve good utilization. Scheduling is an important topic because it directly affects the cost of operating a cluster: a poor scheduler results in low utilization, which costs money as expensive machines are left idle. High utilization, however, is not sufficient on its own: antagonistic workloads interfere with other workloads unless the decisions are made carefully. Details: http://www.firmament.io/blog/scheduler-architectures.html