Having two sets of input combined on hadoop
- by aeolist
I have a rather simple hadoop question which i'll try to present with an example
say you have a list of strings and a large file and you want each mapper to process a piece of the file and one of the strings in a grep like program.
how are you supposed to do that? I am under the impression that the number of mappers is a result of the inputsplits produced. I could run subsequent jobs, one for each string, but it seems kinda... messy?