
Showing posts from April, 2018

Actian VectorH architecture and Amazon Redshift benchmark papers

Actian Vector in Hadoop (VectorH for short) is a new SQL-on-Hadoop system built on top of the fast Vectorwise analytical database system. VectorH achieves fault tolerance and storage scalability by relying on HDFS, and extends the state of the art in SQL-on-Hadoop systems by instrumenting the HDFS replication policy to optimize read locality. VectorH integrates with YARN for workload management, achieving a high degree of elasticity. Even though HDFS is an append-only filesystem, and VectorH supports (update-averse) ordered tables, trickle updates are possible thanks to Positional Delta Trees (PDTs), a differential update structure that can be queried efficiently. The paper describes the changes made to single-server Vectorwise to turn it into a Hadoop-based MPP system, encompassing workload management, parallel query optimization and execution, HDFS storage, transaction processing and Spark integration. The paper evaluates VectorH against HAWQ, Impala,
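To make the PDT idea concrete, here is a minimal sketch of merging positional deltas into a scan of an immutable, ordered table. It is a simplification under stated assumptions: a plain dict keyed by stable position (SID) stands in for the tree, and the names `Delta` and `merge_scan` are hypothetical, not VectorH APIs. The real PDT is a tree that additionally keeps counts so positions can be translated efficiently between the stable table and its updated image; this sketch only shows why updates addressed by position can be applied during a sequential scan without rewriting the append-only files.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class Delta:
    sid: int           # position in the stable (immutable, ordered) table
    kind: str          # "ins", "del" or "mod"
    value: Any = None  # new tuple for "ins" / "mod"


def merge_scan(stable: List[Any], deltas: List[Delta]) -> List[Any]:
    """Return the current table image: stable rows merged with positional deltas.

    Simplified stand-in for a Positional Delta Tree (hypothetical helper):
    an insert at sid is emitted before the stable row at sid; an insert at
    sid == len(stable) is an append.
    """
    inserts: Dict[int, List[Any]] = {}             # sid -> inserted tuples, in arrival order
    changes: Dict[int, Tuple[str, Any]] = {}       # sid -> ("del", None) or ("mod", value)
    for d in deltas:
        if d.kind == "ins":
            inserts.setdefault(d.sid, []).append(d.value)
        else:
            changes[d.sid] = (d.kind, d.value)     # later change to same sid wins

    out: List[Any] = []
    for sid in range(len(stable) + 1):
        out.extend(inserts.get(sid, []))           # pending inserts go first
        if sid == len(stable):
            break                                  # only appends live past the end
        kind, value = changes.get(sid, ("keep", None))
        if kind == "keep":
            out.append(stable[sid])
        elif kind == "mod":
            out.append(value)
        # kind == "del": skip the stable row
    return out
```

For example, merging `[Delta(1, "ins", "x"), Delta(2, "del"), Delta(3, "mod", "D"), Delta(4, "ins", "e")]` into the stable rows `["a", "b", "c", "d"]` yields `["a", "x", "b", "D", "e"]`, while the stable data itself is never modified.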

Apache Kafka Architecture

The Apache Kafka distributed streaming platform features an architecture that, ironically given the name, provides application messaging that is markedly clearer and less Kafkaesque than the alternatives. In this article, we'll take a detailed look at how Kafka's architecture accomplishes this. Details: https://insidebigdata.com/2018/04/12/developing-deeper-understanding-apache-kafka-architecture/
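A central idea in Kafka's architecture is the topic as a set of partitioned, append-only logs: producers route records to a partition by key, each record gets a sequential offset within its partition, and consumers track their own read offsets. The toy in-memory model below is only an illustration of that idea, not Kafka's client API; `ToyTopic` and `ToyConsumer` are hypothetical names, and `zlib.crc32` is a stand-in for the murmur2 hash that Kafka's default partitioner uses for keyed records.

```python
import zlib
from typing import Dict, List, Tuple

Record = Tuple[bytes, bytes]  # (key, value)


class ToyTopic:
    """In-memory model of a Kafka-style topic: N append-only partition logs."""

    def __init__(self, num_partitions: int):
        self.partitions: List[List[Record]] = [[] for _ in range(num_partitions)]

    def partition_for(self, key: bytes) -> int:
        # Same key -> same partition, so ordering is preserved per key.
        # crc32 is a stand-in; Kafka's default partitioner uses murmur2.
        return zlib.crc32(key) % len(self.partitions)

    def append(self, key: bytes, value: bytes) -> Tuple[int, int]:
        p = self.partition_for(key)
        log = self.partitions[p]
        log.append((key, value))
        return p, len(log) - 1  # (partition, offset of the new record)

    def read(self, partition: int, offset: int, max_records: int = 10) -> List[Record]:
        # Reads never remove data; many consumers can read the same log.
        return self.partitions[partition][offset:offset + max_records]


class ToyConsumer:
    """Consumer that keeps its own position (offset) in each partition."""

    def __init__(self, topic: ToyTopic):
        self.topic = topic
        self.offsets: Dict[int, int] = {p: 0 for p in range(len(topic.partitions))}

    def poll(self, partition: int) -> List[Record]:
        records = self.topic.read(partition, self.offsets[partition])
        self.offsets[partition] += len(records)  # advance past what was read
        return records
```

Because the log is retained and each consumer only moves its own offset, independent consumers can read the same partition at their own pace, which is part of why the model is simpler to reason about than queue-based messaging where delivery removes messages.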

Netflix FlameScope

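Netflix has open-sourced FlameScope, a visualization tool for exploring variance, perturbations, single-threaded execution, application startup, and other time-based patterns in profile data, which it renders as a subsecond-offset heat map and, for any selected time range, as a flame graph.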
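The two views can be sketched in a few lines. The assumptions here are hypothetical and labeled: profile samples are `(timestamp_seconds, stack)` pairs, and `subsecond_heatmap` and `folded_stacks` are illustrative helpers, not FlameScope's code. In a subsecond-offset heat map each column is one second of wall-clock time and each row is a slice within that second, so periodic or bursty activity shows up as visual patterns; selecting a time range then aggregates just those samples into the "frame;frame;frame count" folded format that flame graph tools consume.

```python
import math
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

Sample = Tuple[float, Sequence[str]]  # (timestamp in seconds, stack frames root-first)


def subsecond_heatmap(timestamps: Iterable[float], rows: int = 50) -> List[List[int]]:
    """Bucket timestamps into a grid: column = second since the first sample,
    row = which 1/rows slice of that second; each cell holds a sample count."""
    ts = sorted(timestamps)
    if not ts:
        return []
    start = math.floor(ts[0])
    counts: Counter = Counter()
    for t in ts:
        sec = math.floor(t)
        row = min(int((t - sec) * rows), rows - 1)  # clamp float edge cases
        counts[(row, sec - start)] += 1
    ncols = math.floor(ts[-1]) - start + 1
    return [[counts[(r, c)] for c in range(ncols)] for r in range(rows)]


def folded_stacks(samples: Iterable[Sample], t0: float, t1: float) -> Counter:
    """Aggregate samples with t0 <= t < t1 into folded-stack counts,
    the input format used by flame graph generators."""
    return Counter(";".join(stack) for t, stack in samples if t0 <= t < t1)
```

For instance, with `rows=4`, samples at 10.0, 10.01, 10.5 and 11.99 seconds produce a 4 x 2 grid with two samples in the first slice of second 0, one in the third slice of second 0, and one in the last slice of second 1.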

Amazing Infographics and Other Visual Tutorials

Data Science Summarized in One Picture
R for Big Data in One Picture
A Cheat Sheet on Probability
Data Science in Python: Pandas Cheat Sheet
Cheat Sheet: Data Visualisation in Python
Machine Learning Cheat Sheet
The Periodic Table Of AI
Three Periodic Tables
40 maps that explain the Internet
A Guide to the Internet of Things
IoT Tectonics
13 Great Data Science Infographics
Unstructured Data: InfoGraphics
Great Machine Learning Infographics
What is Hadoop? Infographics Explains How it Works

What Comes After Deep Learning

We’re stuck. There hasn’t been a major breakthrough in algorithms in the last year. Here’s a survey of the leading contenders for that next major advancement. Details: https://www.datasciencecentral.com/profiles/blogs/what-comes-after-deep-learning