Distributed, Parallel, Fault-tolerant File System
- by Eddified
There are so many choices that it's hard to know where to start. My requirements are these:
Runs on Linux
Most of the files will be between 5 and 9 MB in size. There will also be a significant number of small JPEGs (100 x 100 px).
All of the files need to be available over http.
Redundancy: ideally with space efficiency similar to RAID 5, i.e. about 75% (in RAID 5 with 4 identical disks, 25% of the space is used for parity, leaving 75% usable)
Must support several petabytes of data
Scalable
Runs on commodity hardware
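The 75% figure in the redundancy requirement comes from a simple ratio. With n equivalent disks of which p hold parity, the usable fraction is (n - p)/n, so 4-disk RAID 5 gives 3/4. Below is a minimal Python sketch of that arithmetic and the raw capacity it implies. The function names and the 3 PB target are hypothetical and chosen only for illustration. The sketch ignores file system metadata, hot spares and other real-world overhead.

```python
def parity_efficiency(total_disks: int, parity_disks: int = 1) -> float:
    """Fraction of raw capacity usable for data when `parity_disks` of
    `total_disks` equivalent disks hold parity (RAID 5 uses 1)."""
    if not 0 <= parity_disks < total_disks:
        raise ValueError("need at least one data disk")
    return (total_disks - parity_disks) / total_disks


def raw_capacity_needed(usable_pb: float, efficiency: float) -> float:
    """Raw capacity (PB) needed to hold `usable_pb` PB of data.
    Assumption: no metadata, spare, or filesystem overhead is counted."""
    return usable_pb / efficiency


if __name__ == "__main__":
    eff = parity_efficiency(4)  # the 4-disk RAID 5 case from the question
    print(f"4-disk RAID 5 efficiency: {eff:.0%}")
    # Hypothetical target of 3 PB usable, for illustration only.
    print(f"Raw capacity for 3 PB usable: {raw_capacity_needed(3, eff):.1f} PB")
```

Running it prints 75% and 4.0 PB. In other words, every petabyte of data at this efficiency needs about 1.33 PB of raw disk before any other overhead.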
In addition, I am looking for these qualities, though they are not "requirements":
Stable, mature file system
Lots of momentum and support
Etc.
I would like some input on which file system works best for these requirements. Some people at my organization are leaning towards MogileFS, but I'm not convinced of the stability and momentum of that project. Based on my limited research, GlusterFS and Lustre appear to be better supported...
Thoughts?