Posts

Showing posts from January, 2019

Ray: Application-level scheduling with custom resources

Ray intends to be a universal framework for a wide range of machine learning applications. This includes distributed training, machine learning inference, data processing, latency-sensitive applications, and throughput-oriented applications. Each of these applications has different, and at times conflicting, requirements for resource management. Ray intends to cater to all of them as the newly emerging microkernel for distributed machine learning. To achieve that kind of generality, Ray gives developers explicit control over task and actor placement by using custom resources. In this blog post we discuss use cases and provide examples. This article is intended for readers already familiar with Ray. If you are new to Ray and are looking to easily and elegantly parallelize your Python code, please take a look at this tutorial.

USE CASES

Load Balancing. In many cases, the preferred behavior is to distribute tasks across all available nodes…
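In Ray, nodes advertise custom resources at startup (e.g. `ray start --resources='{"special_hardware": 2}'`) and tasks request them with `@ray.remote(resources={"special_hardware": 1})`; the scheduler then only places a task on a node with enough of that resource free. A toy sketch of just that matching logic (not Ray's actual scheduler; node and resource names are illustrative):

```python
# Illustration only: the placement idea behind custom resources.
# A task runs on the first node whose free custom resources cover its demand.

def place_task(task_demand, nodes):
    """Return the first node whose free resources cover the demand, else None."""
    for name, free in nodes.items():
        if all(free.get(res, 0) >= amt for res, amt in task_demand.items()):
            for res, amt in task_demand.items():
                free[res] -= amt          # reserve the resources on that node
            return name
    return None                           # no feasible node: task stays queued

# Two nodes; only node_a advertises the hypothetical resource "special_hardware".
nodes = {"node_a": {"CPU": 4, "special_hardware": 1},
         "node_b": {"CPU": 4}}

print(place_task({"special_hardware": 1}, nodes))  # -> node_a
print(place_task({"special_hardware": 1}, nodes))  # -> None (resource exhausted)
```

Because the demand is an arbitrary dictionary, the same mechanism covers load balancing, hardware affinity, and throttling: the application decides what a resource means.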

The Top Tech Skills of 2018

The Top Tech Skills of 2018: Kotlin & Kubernetes Made Their Mark Original Article >>>

Apache Superset in the Production Environment

Visualizing data helps in building a much deeper understanding of the data and speeds up analytics around it. There are several mature paid products available on the market. Recently, I explored an open source product named Apache Superset, which I found to be a very promising product in this space. Some prominent features of Superset are: a rich set of data visualizations, an easy-to-use interface for exploring and visualizing data, and the ability to create and share dashboards. After reading about Superset, I wanted to try it, and since Superset is a Python project it can easily be installed using pip; but I decided to set it up as a Docker container instead. The Apache Superset GitHub repo contains code for building and running Superset as a container. Since I want to run Superset in a completely distributed manner, with as little modification to the code as possible, I changed it just enough that it could run in multiple different modes. Below is a list of sp…
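The container workflow described above can be sketched roughly as follows. This is a minimal sketch, not the repo's exact build scripts: it assumes a local checkout of the Superset repo with its Dockerfile, the image tag `my-superset` is made up, and 8088 is Superset's default web port.

```shell
# From a checkout of the Apache Superset repo (which ships a Dockerfile):
docker build -t my-superset .                          # build the image
docker run -d --name superset -p 8088:8088 my-superset # serve on localhost:8088
```

Running it this way keeps the host clean and makes it easy to later split the web server, workers, and metadata database into separate containers for a distributed setup.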

TensorFlow Privacy - training machine learning models with privacy for training data

Google has released TensorFlow Privacy, a free Python library that lets people train TensorFlow models compliant with more stringent user data privacy standards. It uses differential privacy, a technique for training machine learning systems that increases user privacy by letting developers set various trade-offs relating to the amount of noise applied to the user data being processed. The repository contains implementations of TensorFlow optimizers for training machine learning models with differential privacy, along with tutorials and analysis tools for computing the privacy guarantees provided. The TensorFlow Privacy library is under continual development and welcomes contributions, in particular help towards resolving the currently open issues.
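The core mechanism behind these optimizers is differentially private SGD: clip each per-example gradient to a fixed L2 norm, sum, and add calibrated Gaussian noise before averaging. The library's optimizers expose these knobs as `l2_norm_clip` and `noise_multiplier` parameters. A minimal pure-Python sketch of that aggregation step (an illustration of the technique, not TensorFlow Privacy's API; function and variable names are made up):

```python
import math
import random

def dp_average_gradients(per_example_grads, l2_norm_clip, noise_multiplier, rng=random):
    """Clip each per-example gradient to L2 norm l2_norm_clip, sum them,
    add Gaussian noise with std l2_norm_clip * noise_multiplier, and average."""
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        norm = math.sqrt(sum(g * g for g in grad))
        scale = min(1.0, l2_norm_clip / norm) if norm > 0 else 1.0
        for i, g in enumerate(grad):
            total[i] += g * scale         # clipped contribution of one example
    n = len(per_example_grads)
    sigma = l2_norm_clip * noise_multiplier
    return [(t + rng.gauss(0.0, sigma)) / n for t in total]

# With noise_multiplier = 0 this reduces to averaging the clipped gradients:
grads = [[3.0, 4.0],   # norm 5 -> rescaled down to norm 1
         [0.1, 0.0]]   # norm below the clip -> left unchanged
print(dp_average_gradients(grads, l2_norm_clip=1.0, noise_multiplier=0.0))
```

Clipping bounds any single example's influence on the update, and the noise masks what remains, which is what yields the formal privacy guarantee.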

The 6 most useful Machine Learning projects of the past year (2018)

Let’s take a look at the top 6 most practically useful ML projects of the past year. These projects have published code and datasets that allow individual developers and smaller teams to learn and immediately create value. They may not be the most theoretically groundbreaking works, but they are applicable and practical. Fast.ai: The fast.ai library was written to simplify training fast and accurate neural nets using modern best practices. It abstracts away all of the nitty-gritty work that can come with implementing deep neural networks in practice. It’s very easy to use and is designed with a practitioner's application-building mindset. Originally created for the students of the fast.ai course, the library is written on top of the easy-to-use PyTorch library in a clean and concise way. Their documentation is top-notch, too. Detectron: Detectron is Facebook AI’s research platform for object detection and instance segmentation research, written in Caffe2. FastText: Anoth…

Real-Time Stock Processing With Apache NiFi and Apache Kafka

Implementing Streaming Use Case From REST to Hive With Apache NiFi and Apache Kafka, Part 1. With Apache Kafka 2.0 and Apache NiFi 1.8, there are many new features and abilities coming out. It's time to put them to the test. To plan out what we are going to do, I have a high-level architecture diagram. We are going to ingest a number of sources including REST feeds, social feeds, messages, images, documents, and relational data. We will ingest with NiFi and then filter, process, and segment the data into Kafka topics. Kafka data will be in Apache Avro format, with schemas specified in the Hortonworks Schema Registry. Spark and NiFi will do additional event processing along with machine learning and deep learning. The results will be stored in Druid for real-time analytics and summaries. Hive, HDFS, and S3 will hold the data for permanent storage. We will build dashboards with Superset and Spark SQL + Zeppelin. We will also push cleaned and aggregated data back to subscribers via Kafka…
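The "filter, process, and segment into Kafka topics" step above can be sketched in plain Python: route each incoming record to a topic keyed by its source type and drop anything unrecognized. This illustrates only the routing logic, not NiFi or a real Kafka client; the topic names and record layout are assumptions.

```python
# Hypothetical mapping from ingest source type to Kafka topic.
TOPIC_BY_SOURCE = {
    "rest": "stocks.rest",
    "social": "stocks.social",
    "message": "stocks.messages",
    "relational": "stocks.relational",
}

def segment(records, publish):
    """Route each record to a topic by its source type; filter out unknowns."""
    for record in records:
        topic = TOPIC_BY_SOURCE.get(record.get("source"))
        if topic is not None:
            publish(topic, record)        # in production: a Kafka producer send

# Usage: collect routed messages instead of talking to a broker.
sent = []
segment(
    [{"source": "rest", "symbol": "HDP"}, {"source": "noise"}],
    publish=lambda topic, rec: sent.append((topic, rec)),
)
print(sent)  # the unrecognized "noise" record is filtered out
```

Keeping the routing table as data rather than code mirrors how NiFi flows are configured: adding a source means adding a topic mapping, not rewriting the pipeline.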