HDFS scalability: the limits to growth
Some time ago I came across very interesting article by Konstantin V. Shvachko (now Senior Staff Software Engineer at LinkedIn) concerning the limits of hadoop scalability.
The main conclusion of it is that "a 10,000 node HDFS cluster with a single name-node is expected to handle well a workload of 100,000 readers, but even 10,000 writers can produce enough workload to saturate the name-node, making it a bottleneck for linear scaling.
Such a large difference in performance is attributed to get block locations (read workload) being a memory-only operation, while creates (write workload) require journaling, which is bounded by the local hard drive performance.
There are ways to improve the single name-node performance, but any solution intended for single namespace server optimization lacks scalability."
Konstantin continues:
"The most promising solutions seem to be based on distributing the namespace server itself both for workload balancing and for reducing the single server memory footprint. There are just a few distributed file systems that implement such an approach.
The most promising solutions seem to be based on distributing the namespace server itself both for workload balancing and for reducing the single server memory footprint. There are just a few distributed file systems that implement such an approach."
There are 4 such filesystems, which I find worth mentionning:
1. Ceph - http://ceph.com - 12.2.1 "Luminous" stable release - unified, distributed storage system;
2. Lustre - http://lustre.org - 2.10.0 (latest major release) - open-source, parallel file system;
3. Google File System which has the successor - Colossus;
4. Giraffa - distributed, highly scalable file system - https://github.com/GiraffaFS/giraffa/wiki/Introduction-and-requirements-for-Giraffa
Details: https://www.usenix.org/system/files/login/articles/1908-shvachko.pdf
More on Giraffa:
Scaling Namespace Operations with Giraffa File System
http://home.apache.org/~shv/docs/login_summer17_05_shvachko.pdf
The main conclusion of it is that "a 10,000 node HDFS cluster with a single name-node is expected to handle well a workload of 100,000 readers, but even 10,000 writers can produce enough workload to saturate the name-node, making it a bottleneck for linear scaling.
Such a large difference in performance is attributed to get block locations (read workload) being a memory-only operation, while creates (write workload) require journaling, which is bounded by the local hard drive performance.
There are ways to improve the single name-node performance, but any solution intended for single namespace server optimization lacks scalability."
Konstantin continues:
"The most promising solutions seem to be based on distributing the namespace server itself both for workload balancing and for reducing the single server memory footprint. There are just a few distributed file systems that implement such an approach.
The most promising solutions seem to be based on distributing the namespace server itself both for workload balancing and for reducing the single server memory footprint. There are just a few distributed file systems that implement such an approach."
There are 4 such filesystems, which I find worth mentionning:
1. Ceph - http://ceph.com - 12.2.1 "Luminous" stable release - unified, distributed storage system;
2. Lustre - http://lustre.org - 2.10.0 (latest major release) - open-source, parallel file system;
3. Google File System which has the successor - Colossus;
4. Giraffa - distributed, highly scalable file system - https://github.com/GiraffaFS/giraffa/wiki/Introduction-and-requirements-for-Giraffa
Details: https://www.usenix.org/system/files/login/articles/1908-shvachko.pdf
More on Giraffa:
Scaling Namespace Operations with Giraffa File System
http://home.apache.org/~shv/docs/login_summer17_05_shvachko.pdf
Comments
Post a Comment