PIG doesn't read my custom InputFormat

Posted by Simon Guo on Stack Overflow
Published on 2012-12-18T23:00:43Z

I have a custom MyInputFormat that is supposed to handle the record-boundary problem for multi-line inputs. However, when I return MyInputFormat from my UDF load function, it does not appear to be used. The code is as follows:

public class EccUDFLogLoader extends LoadFunc {
    @Override
    public InputFormat getInputFormat() {
        System.out.println("I am in getInputFormat function");
        return new MyInputFormat();
    }
}

public class MyInputFormat extends TextInputFormat {
    public RecordReader createRecordReader(InputSplit inputSplit, JobConf jobConf) throws IOException {
        System.out.println("I am in createRecordReader");
        // MyRecordReader is supposed to handle record boundaries
        return new MyRecordReader((FileSplit)inputSplit, jobConf);
    }
}

For each mapper, it prints "I am in getInputFormat function" but not "I am in createRecordReader". Can anyone provide a hint on how to hook up my custom MyInputFormat to Pig's UDF loader? Many thanks.

I am using PIG on Amazon EMR.
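
One possible explanation, offered as an assumption rather than a confirmed diagnosis: Pig's LoadFunc.getInputFormat() returns the new-API org.apache.hadoop.mapreduce.InputFormat, whose method is createRecordReader(InputSplit, TaskAttemptContext). If MyInputFormat extends the new-API TextInputFormat, a createRecordReader that takes a JobConf would be a separate overload rather than an override, so the framework would keep calling the inherited method. The sketch below models that behaviour with hypothetical stand-in classes (Split, TaskContext, OldJobConf, BaseFormat are not Hadoop types) so that it runs without any Hadoop dependency.

```java
// Stand-in types only: a hypothetical model, NOT the real Hadoop or Pig classes.
public class OverloadDemo {
    public static class Split {}
    public static class TaskContext {}   // stands in for TaskAttemptContext
    public static class OldJobConf {}    // stands in for JobConf

    // Stands in for the base input format that the framework knows about.
    public static class BaseFormat {
        public String createRecordReader(Split split, TaskContext context) {
            return "default reader";
        }
    }

    // Same method name, different parameter type: an overload, not an override.
    public static class OverloadedFormat extends BaseFormat {
        public String createRecordReader(Split split, OldJobConf conf) {
            return "custom reader";
        }
    }

    // Matching parameter types plus @Override: this replaces the base method.
    public static class OverridingFormat extends BaseFormat {
        @Override
        public String createRecordReader(Split split, TaskContext context) {
            return "custom reader";
        }
    }

    // The framework only ever calls the base signature.
    public static String frameworkCall(BaseFormat format) {
        return format.createRecordReader(new Split(), new TaskContext());
    }

    public static void main(String[] args) {
        System.out.println(frameworkCall(new OverloadedFormat())); // default reader
        System.out.println(frameworkCall(new OverridingFormat())); // custom reader
    }
}
```

If this is indeed the cause, adding @Override to createRecordReader in MyInputFormat should turn the mismatch into a compile-time error, which makes it easy to confirm or rule out.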

© Stack Overflow or respective owner
