Data Structure Behind Amazon S3s Keys (Filtering Data Structure)
Posted
by dimo414
on Stack Overflow
See other posts from Stack Overflow
or by dimo414
Published on 2010-03-12T06:40:59Z
Indexed on
2010/03/12
6:47 UTC
Read the original article
Hit count: 490
I'd like to implement a data structure similar to the lookup functionality of Amazon S3. For those of you who don't know what I'm taking about, Amazon S3 stores all files at the root, but allows you to look up groups of files by common prefixes in their names, therefore replicating the power of a directory tree without the complexity of it.
The catch is, both lookup and filter operations are O(1) (or close enough that even on very large buckets - S3's disk equivalents - both operations might as well be O(1))).
So in short, I'm looking for a data structure that functions like a hash map, with the added benefit of efficient (at the very least not O(n)) filtering. The best I can come up with is extending HashMap so that it also contains a (sorted) list of contents, and doing a binary search for the range that matches the prefix, and returning that set. This seems slow to me, but I can't think of any other way to do it.
Does anyone know either how Amazon does it, or a better way to implement this data structure?
© Stack Overflow or respective owner