Performance associated with storing millions of files on NTFS
- by Tim Brigham
Does anyone have a method / formula, etc that I could use - hopefully based on both current and projected numbers of files - to project the 'right' length of the split and the number of nested folders?
Please note that although similar it isn't quite the same as Storing a million images in the filesystem. I'm looking for a way to help make the theories outlined more generic.
Assumptions
I have 'some' initial number of files. This number would be arbitrary but large. Say 500k to 10m+.
I have considered the underlying physical hardware disk IO requirements that would be necessary to support such an endeavor.
Put another way
As time progresses this store will grow. I want to have the best balance of current performance and as my needs increase. Say I double or triple my storage. I need to be able to address both current needs and projected future growth. I need to both plan ahead and not sacrifice too much of current performance.
What I've come up with
I'm already thinking about using a hash split every so many characters to split things out across multiple directories and keeping the trees even, very similar as outlined in the comments in the question above. It also avoids duplicate files, which would be critical over time.
I'm sure that the initial folder structure would be different based on what I've outlined, and depending on the initial scale. As far as I can figure there isn't a one size fits all solution here. It would be horrendously time intensive to work something out experimentally.