8 ways to replace HDFS

Hadoop is on its way to becoming the de facto platform for the next generation of data-based applications, but it’s not without flaws. Ironically, one of Hadoop’s biggest shortcomings now is also one of its biggest strengths going forward: the Hadoop Distributed File System.

Within the Apache Software Foundation, HDFS is always improving in terms of performance and availability. Honestly, it’s probably fine for the majority of Hadoop workloads that are running in pilot projects, skunkworks projects or generally non-demanding environments. And technologies such as HBase that are built atop HDFS speak to its versatility as a storage system even for non-MapReduce applications.


But if the growing number of options for replacing HDFS signifies anything, it’s that HDFS isn’t quite where it needs to be. Some Hadoop users have strict demands around performance, availability and enterprise-grade features, while others aren’t keen on its direct-attached storage (DAS) architecture. Concerns around availability might be especially valid for anyone (read “almost everyone”) who’s using an older version of Hadoop without the High Availability NameNode.


Here are eight products and projects whose proprietors argue they can deliver what HDFS can’t:
  1. Cassandra (DataStax)
  2. Ceph
  3. Dispersed Storage Network (Cleversafe)
  4. GPFS (IBM)
  5. Isilon (EMC)
  6. Lustre
  7. MapR File System
  8. NetApp Open Solution for Hadoop
Details: https://gigaom.com/2012/07/11/because-hadoop-isnt-perfect-8-ways-to-replace-hdfs/
