Remote I/O costs with a Content Delivery Network
- by x711Li
As far as I know, the time it takes to scan a directory grows with the number of files in that directory because of I/O costs. Would the administrative overhead of placing files in a hashed directory tree be worth the added efficiency when uploading/downloading files through a CDN API?
For instance, given a filename foo.mp3, the MD5 hash for this is 10ebb1120767e9de166e0f5905077cb1. Thus, storing foo.mp3 in ./10/eb/foo.mp3 would allow for fewer files per directory. Since an MD5 digest is written in hexadecimal, two characters per level allow for 16^2 = 256 root directories with 256 subdirectories each, and the files should be spread fairly evenly across them.
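To make the scheme concrete, here is a minimal sketch of how such a path could be derived. It assumes the hash is taken over the filename (not the file contents) and that the first two pairs of hex characters name the two directory levels. The names `hashed_path`, `levels`, and `width` are made up for illustration and are not part of any CDN API.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_path(filename: str, levels: int = 2, width: int = 2) -> PurePosixPath:
    """Return a relative path like 'ab/cd/<filename>' derived from the MD5 of the name."""
    # Hash the filename itself (an assumption; hashing the contents is another option).
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    # Use `levels` consecutive chunks of `width` hex characters as directory names.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return PurePosixPath(*parts, filename)


if __name__ == "__main__":
    # Prints the two-level hashed location for this filename.
    print(hashed_path("foo.mp3"))
```

Because the path is computed from the name alone, an uploader or downloader can go straight to the file without listing any directory first.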
Considering the directories themselves are never listed, and each file is addressed by its full path, would the I/O costs of directory scanning still exist with direct uploading/downloading?