Hadoop and Object Reuse: Why?
- by Andrew White
In Hadoop, the key and value objects passed to reducers are reused: the framework hands back the same instances on each iteration and deserializes new data into them. This is extremely surprising and hard to track down if you're not expecting it. Furthermore, the original tracker entry for this "feature" doesn't offer any evidence that the change actually improved performance (unless I missed it).
"It would speed up the system substantially if we reused the keys and values [...] but I think it is worth doing."
This seems to run completely counter to this very popular answer. Is there any credence to the Hadoop developer's claim? Is there something "special" about Hadoop that would invalidate the notion that object creation is cheap?
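To make the surprise concrete, here is a minimal sketch of a reducer that caches values across iterations; the class name and types are my own illustration, not taken from the tracker or the linked answer. Because the framework reuses the same Text instance, the "broken" list ends up holding N references to whatever value was seen last, while copying each value works as expected.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Illustrative reducer demonstrating the object-reuse gotcha.
public class WordListReducer extends Reducer<Text, Text, Text, Text> {

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {

        List<Text> broken = new ArrayList<>();
        List<Text> copied = new ArrayList<>();

        for (Text value : values) {
            // BROKEN: every element of this list points at the one reused
            // instance, so after the loop they all show the last value seen.
            broken.add(value);

            // WORKS: copy the current contents before the framework
            // overwrites the shared instance on the next iteration.
            copied.add(new Text(value));
        }

        // Emit how many values were actually preserved via copying.
        context.write(key, new Text("copied values: " + copied.size()));
    }
}

Whether this design pays for itself is exactly the question: the copy in the "works" branch is the extra allocation the Hadoop developers were trying to let callers avoid.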