PIG doesn't read my custom InputFormat
- by Simon Guo
I have a custom MyInputFormat that is supposed to handle the record boundary problem for multi-line inputs. However, it does not seem to be used when I return it from my UDF load function, as follows:
public class EccUDFLogLoader extends LoadFunc {
    @Override
    public InputFormat getInputFormat() {
        System.out.println("I am in getInputFormat function");
        return new MyInputFormat();
    }
}
public class MyInputFormat extends TextInputFormat {
    public RecordReader createRecordReader(InputSplit inputSplit, JobConf jobConf) throws IOException {
        System.out.println("I am in createRecordReader");
        // MyRecordReader is supposed to handle the record boundary
        return new MyRecordReader((FileSplit) inputSplit, jobConf);
    }
}
For each mapper, it prints "I am in getInputFormat function" but not "I am in createRecordReader". Can anyone give me a hint on how to hook up my custom MyInputFormat to PIG's UDF loader? Many thanks.
I am using PIG on Amazon EMR.