Actian VectorH architecture and Amazon Redshift benchmark papers

Actian Vector in Hadoop (VectorH for short) is a new SQL-on-Hadoop system built on top of the fast Vectorwise analytical database system. VectorH achieves fault tolerance and storage scalability by relying on HDFS, and extends the state of the art in SQL-on-Hadoop systems by instrumenting the HDFS replication policy to optimize read locality.
VectorH integrates with YARN for workload management, achieving a high degree of elasticity. Even though HDFS is an append-only filesystem and VectorH supports (update-averse) ordered tables, trickle updates are possible thanks to Positional Delta Trees (PDTs), a differential update structure that can be queried efficiently. The paper describes the changes made to single-server Vectorwise to turn it into a Hadoop-based MPP system, covering workload management, parallel query optimization and execution, HDFS storage, transaction processing and Spark integration. It evaluates VectorH against HAWQ, Impala, SparkSQL and Hive, reporting performance that is orders of magnitude better.
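To make the idea of positional differential updates concrete, here is a minimal Python sketch. It is not the PDT from the paper: a real PDT is a tree that maps between stable row positions and current row positions and supports efficient lookups, whereas this hypothetical `PositionalDeltas` class keeps flat dictionaries keyed by position in the immutable "stable" table. It only illustrates the core point that updates are stored beside an append-only base and merged in during a scan, so the base data never has to be rewritten.

```python
from dataclasses import dataclass, field


@dataclass
class PositionalDeltas:
    """Hypothetical, simplified stand-in for a Positional Delta Tree.

    Updates are keyed by position in the immutable stable table (not by key
    value), so a scan can merge them in without comparing sort keys.
    """
    stable: list  # immutable base rows, e.g. as read from an append-only file
    inserts: dict = field(default_factory=dict)   # stable pos -> rows inserted before it
    deletes: set = field(default_factory=set)     # stable positions marked deleted
    modifies: dict = field(default_factory=dict)  # stable pos -> replacement row

    def insert_before(self, pos, row):
        # pos == len(self.stable) means "append after the last stable row"
        self.inserts.setdefault(pos, []).append(row)

    def delete(self, pos):
        self.deletes.add(pos)

    def modify(self, pos, row):
        self.modifies[pos] = row

    def scan(self):
        """Yield the current table image by merging deltas into stable rows."""
        for pos, row in enumerate(self.stable):
            yield from self.inserts.get(pos, [])
            if pos in self.deletes:
                continue
            yield self.modifies.get(pos, row)
        yield from self.inserts.get(len(self.stable), [])


if __name__ == "__main__":
    t = PositionalDeltas(stable=["a", "c", "d"])
    t.insert_before(1, "b")  # new row goes before stable row "c"
    t.delete(2)              # drop stable row "d"
    t.modify(0, "A")         # replace stable row "a"
    t.insert_before(3, "e")  # append at the end
    print(list(t.scan()))    # ['A', 'b', 'c', 'e']
    print(t.stable)          # ['a', 'c', 'd'] (base data untouched)
```

In this simplification all positions refer to the stable table, so inserted rows cannot themselves be deleted or modified afterwards; handling that, and translating current row IDs back to stable positions, is exactly what the tree structure in a real PDT is for.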


Actian VectorH architecture
https://event.cwi.nl/lsde/papers/vortex-sigmod2016.pdf

Amazon Redshift benchmark
https://www.actian.com/wp-content/uploads/2018/02/Actian-Vector-vs-Redshift-Benchmark-Report-Feb-2018.pdf
