Hadoop: Processing large serialized objects
Posted by restrictedinfinity on Stack Overflow, 2010-06-10
I am working on developing an application that processes (and merges) several large Java serialized objects (on the order of GBs in size) using the Hadoop framework. Hadoop distributes the blocks of a file across different hosts, but since deserialization requires all of the blocks to be present on a single host, this is going to hit performance drastically. How can I deal with this situation, where the blocks cannot be processed individually, unlike text files?
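For context, the only workaround I can think of so far is a custom input format that refuses to split the file, so each serialized object is read in full by a single mapper. Below is a rough sketch of that idea, assuming the newer mapreduce API; the class names (WholeObjectInputFormat, WholeFileRecordReader) and the NullWritable/BytesWritable key-value types are just placeholders I picked, not anything Hadoop prescribes. It of course gives up block-level parallelism and data locality, which is exactly the performance problem I am worried about.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.RecordReader;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;

    /** Treat each file as one unsplittable record so a single mapper sees the whole object. */
    public class WholeObjectInputFormat extends FileInputFormat<NullWritable, BytesWritable> {

        @Override
        protected boolean isSplitable(JobContext context, Path file) {
            return false;  // never split: deserialization needs the complete byte stream
        }

        @Override
        public RecordReader<NullWritable, BytesWritable> createRecordReader(
                InputSplit split, TaskAttemptContext context) {
            return new WholeFileRecordReader();
        }

        /** Reads the entire file into a single BytesWritable value. */
        public static class WholeFileRecordReader
                extends RecordReader<NullWritable, BytesWritable> {
            private FileSplit fileSplit;
            private Configuration conf;
            private final BytesWritable value = new BytesWritable();
            private boolean processed = false;

            @Override
            public void initialize(InputSplit split, TaskAttemptContext context) {
                this.fileSplit = (FileSplit) split;
                this.conf = context.getConfiguration();
            }

            @Override
            public boolean nextKeyValue() throws IOException {
                if (processed) {
                    return false;
                }
                // Pull the whole file from HDFS into memory in one go.
                byte[] contents = new byte[(int) fileSplit.getLength()];
                Path file = fileSplit.getPath();
                FileSystem fs = file.getFileSystem(conf);
                FSDataInputStream in = null;
                try {
                    in = fs.open(file);
                    IOUtils.readFully(in, contents, 0, contents.length);
                    value.set(contents, 0, contents.length);
                } finally {
                    IOUtils.closeStream(in);
                }
                processed = true;
                return true;
            }

            @Override
            public NullWritable getCurrentKey() { return NullWritable.get(); }

            @Override
            public BytesWritable getCurrentValue() { return value; }

            @Override
            public float getProgress() { return processed ? 1.0f : 0.0f; }

            @Override
            public void close() { }
        }
    }

The mapper would then deserialize the object from the BytesWritable bytes, which also means the whole object has to fit in a single task's memory. Is there a better way to structure the data or the job so the blocks can actually be processed in parallel?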