Posts

Showing posts from April, 2018

Actian VectorH architecture and Amazon Redshift benchmark papers

Actian Vector in Hadoop (VectorH for short) is a new SQL-on-Hadoop system built on top of the fast Vectorwise analytical database system. VectorH achieves fault tolerance and storage scalability by relying on HDFS, and extends the state of the art in SQL-on-Hadoop systems by instrumenting the HDFS replication policy to optimize read locality. VectorH integrates with YARN for workload management, achieving a high degree of elasticity. Even though HDFS is an append-only filesystem, and VectorH supports (update-averse) ordered tables, trickle updates are possible thanks to Positional Delta Trees (PDTs), a differential update structure that can be queried efficiently. The paper describes the changes made to single-server Vectorwise to turn it into a Hadoop-based MPP system, encompassing workload management, parallel query optimizat...
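To make the idea of a differential update structure over append-only storage more concrete, here is a minimal sketch. It is not the PDT described in the paper: a real PDT is a tree indexed by position, whereas this toy uses a plain set and dict. The class name `DeltaOverlay` and its methods are hypothetical, chosen only for illustration. The point it shows is that the base table is never rewritten; deletes and inserts are recorded by position and merged in during the scan.

```python
from dataclasses import dataclass, field


@dataclass
class DeltaOverlay:
    """Toy differential-update layer over an immutable, ordered base table.

    Hypothetical illustration only: plain containers stand in for the
    tree that a real Positional Delta Tree would use.
    """
    base: tuple                                    # immutable rows, as if on append-only storage
    deleted: set = field(default_factory=set)      # base positions marked as deleted
    inserted: dict = field(default_factory=dict)   # base position -> rows inserted before it

    def delete(self, base_pos: int) -> None:
        # Record the delete; the base table itself is left untouched.
        self.deleted.add(base_pos)

    def insert_before(self, base_pos: int, row) -> None:
        # Record a row to surface just before the given base position.
        self.inserted.setdefault(base_pos, []).append(row)

    def scan(self):
        """Merge base rows and pending deltas in a single positional pass."""
        for pos in range(len(self.base) + 1):
            yield from self.inserted.get(pos, [])
            if pos < len(self.base) and pos not in self.deleted:
                yield self.base[pos]


table = DeltaOverlay(base=(10, 20, 30))
table.delete(1)              # hide the row at base position 1
table.insert_before(2, 25)   # new row appears before base position 2
table.insert_before(3, 40)   # position len(base) means "append at the end"
merged = list(table.scan())  # [10, 25, 30, 40]
```

Keeping the deltas separate means readers always see a consistent merged view while the stored data stays append-only; a production system would also need to fold the deltas back into storage periodically, which this sketch leaves out.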

Apache Kafka Architecture

The Apache Kafka distributed streaming platform features an architecture that – ironically, given the name – provides application messaging that is markedly clearer and less Kafkaesque than the alternatives. In this article, we’ll take a detailed look at how Kafka’s architecture accomplishes this. https://insidebigdata.com/2018/04/12/developing-deeper-understanding-apache-kafka-architecture/

Netflix FlameScope

Netflix has open-sourced FlameScope, its visualization tool for exploring variance, perturbations, single-threaded execution, application startup, and other time-based data as flame graphs.

Amazing Infographics and Other Visual Tutorials

Data Science Summarized in One Picture
R for Big Data in One Picture
A Cheat Sheet on Probability
Data Science in Python: Pandas Cheat Sheet
Cheat Sheet: Data Visualisation in Python
Machine Learning Cheat Sheet
The Periodic Table Of AI
Three Periodic Tables
40 maps that explain the Internet
A Guide to the Internet of Things
IoT Tectonics
13 Great Data Science Infographics
Unstructured Data: InfoGraphics
Great Machine Learning Infographics
What is Hadoop? Infog...

What Comes After Deep Learning

We’re stuck. There hasn’t been a major breakthrough in algorithms in the last year. Here’s a survey of the leading contenders for that next major advancement. Details: https://www.datasciencecentral.com/profiles/blogs/what-comes-after-deep-learning