Distributed SQL System Review: Snowflake vs Splice Machine


After many years of Big Data, NoSQL, and Schema-on-Read detours, there is a clear return to SQL as the lingua franca for data operations. Developers need the expressiveness that SQL provides. A world without SQL ignores more than 40 years of database research and forces developers to hand-code spaghetti logic in their applications for functionality that SQL handles efficiently, such as joins, groupings, aggregations, and (most importantly) rollback when updates go wrong.
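To make the rollback point concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is a single-node engine, not a Distributed SQL system, and the accounts table and transfer function are hypothetical. When a step fails partway through, the database undoes every statement in the transaction, so the application never has to reverse partial updates by hand.

```python
import sqlite3

def transfer(con, src, dst, amount):
    """Hypothetical transfer: debit src, credit dst; any error rolls back both updates."""
    with con:  # commits if the block succeeds, rolls back if an exception escapes
        con.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        (balance,) = con.execute("SELECT balance FROM accounts WHERE id = ?", (src,)).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")  # triggers rollback of the debit above
        con.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
con.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
con.commit()

transfer(con, 1, 2, 30)  # succeeds and is committed
try:
    transfer(con, 1, 2, 500)  # fails after the debit; the database undoes it
except ValueError:
    pass

print(con.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall())
# [(1, 70), (2, 80)]
```

Without transactional rollback, the failed second transfer would leave account 1 debited and account 2 uncredited unless the application tracked and reversed the change itself.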


Luckily, a modern architecture for SQL called Distributed SQL no longer suffers from the challenges of traditional SQL systems: cost, scalability, performance, elasticity, and schema flexibility. The key attribute of Distributed SQL is that data is stored across many distributed storage locations and computation takes place across a cluster of networked servers. This yields unprecedented performance and scalability because work is split across the worker nodes of the cluster and executed in parallel.
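As a rough illustration of the scatter-gather pattern behind this parallelism, the sketch below deals rows out to simulated workers, has each worker compute a partial GROUP BY sum over only its own rows, and merges the partial results on a coordinator. It is a hypothetical, single-process model: threads stand in for networked servers, the data is invented, and neither Snowflake nor Splice Machine necessarily executes queries this way.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical (region, sales) rows; in a real system these live on many nodes.
ROWS = [("east", 10), ("west", 5), ("east", 7), ("north", 3), ("west", 8), ("north", 1)]

def partition(rows, n_workers):
    """Deal rows round-robin to workers, so one group can span several workers."""
    parts = [[] for _ in range(n_workers)]
    for i, row in enumerate(rows):
        parts[i % n_workers].append(row)
    return parts

def local_aggregate(rows):
    """Worker-side step: SUM(sales) GROUP BY region over this worker's rows only."""
    partial = Counter()
    for key, value in rows:
        partial[key] += value
    return partial

def distributed_sum(rows, n_workers=3):
    """Scatter partitions to workers concurrently, then gather and merge partials."""
    parts = partition(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(local_aggregate, parts))
    total = Counter()
    for p in partials:
        total.update(p)  # Counter.update adds counts for matching keys
    return dict(total)

print(sorted(distributed_sum(ROWS).items()))
# [('east', 17), ('north', 4), ('west', 13)]
```

Because SUM can be computed per partition and then combined, adding workers shrinks each partition without changing the final answer, which is the property that lets such systems scale out.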

While Distributed SQL systems share many characteristics, they also have profound differences, and some are better suited to certain workloads than others. Here, we compare Snowflake and Splice Machine, two Distributed SQL systems that differ in significant ways.

Unfortunately, the lines between data systems have blurred. For example, the mere fact that a system claims the ACID properties of a database (atomicity, consistency, isolation, and durability) does not necessarily mean it is a truly transactional OLTP system capable of powering applications (see this Medium article for more details on this topic). Another example is elasticity: as workloads grow, you can add workers to gain more parallelism, and as workloads contract, you can remove workers to reduce costs. Many systems are elastic, but only some can automatically grow (or shrink) clusters to adjust concurrency or throughput. These are two areas where Splice Machine and Snowflake differ from each other.

Here we try to provide a balanced view of these systems, even though we represent one of them. Instead of a feature-by-feature comparison, we present a use-case perspective.

Below we present two radically different use cases: one is a clear fit for Splice Machine, and the other is a clear fit for Snowflake.

