Hadoop: Processing large serialized objects
- by restrictedinfinity
I am working on development of an application to process (and merge) several large java serialized objects (size of order GBs) using Hadoop framework. Hadoop stores distributes blocks of a file on different hosts. But as deserialization will require the all the blocks to be present on single host, its gonna hit the performance drastically. How can I deal this situation where different blocks have to cant be individually processed, unlike text files ?