Data Structure Behind Amazon S3s Keys (Filtering Data Structure)

Posted by dimo414 on Stack Overflow See other posts from Stack Overflow or by dimo414
Published on 2010-03-12T06:40:59Z Indexed on 2010/03/12 6:47 UTC
Read the original article Hit count: 486

I'd like to implement a data structure similar to the lookup functionality of Amazon S3. For those of you who don't know what I'm taking about, Amazon S3 stores all files at the root, but allows you to look up groups of files by common prefixes in their names, therefore replicating the power of a directory tree without the complexity of it.

The catch is, both lookup and filter operations are O(1) (or close enough that even on very large buckets - S3's disk equivalents - both operations might as well be O(1))).

So in short, I'm looking for a data structure that functions like a hash map, with the added benefit of efficient (at the very least not O(n)) filtering. The best I can come up with is extending HashMap so that it also contains a (sorted) list of contents, and doing a binary search for the range that matches the prefix, and returning that set. This seems slow to me, but I can't think of any other way to do it.

Does anyone know either how Amazon does it, or a better way to implement this data structure?

© Stack Overflow or respective owner

Related posts about java

Related posts about hashmap