Actian VectorH architecture and Amazon Redshift benchmark papers
Actian Vector in Hadoop (VectorH for short) is a new SQL-on-Hadoop system built on top of the fast Vectorwise analytical database system. VectorH achieves fault tolerance and storage scalability by relying on HDFS, and extends the state-of-the-art in SQL-on-Hadoop systems by instrumenting the HDFS replication policy to optimize read locality. VectorH integrates with YARN for workload management, achieving a high degree of elasticity. Even though HDFS is an append-only filesystem, and VectorH supports (update-averse) ordered tables, trickle updates are possible thanks to Positional Delta Trees (PDTs), a differential update structure that can be queried efficiently. The paper describes the changes made to single-server Vectorwise to turn it into a Hadoop-based MPP system, encompassing workload management, parallel query optimizat...