Filtering null values with Pig

Posted by arianp on Stack Overflow
Published on 2012-10-31T18:26:41Z Indexed on 2012/11/17 5:01 UTC

It looks like a silly problem, but I can't find a way to filter null values out of my rows. This is the result when I dump the relation geoinfo:

DUMP geoinfo;
([longitude#70.95853,latitude#30.9773])
([longitude#-9.37944507,latitude#38.91780853])
(null)
(null)
(null)
([longitude#-92.64416,latitude#16.73326])
(null)
(null)
([longitude#-9.15199849,latitude#38.71179122])
([longitude#-9.15210796,latitude#38.71195131])

Here is the description:

DESCRIBE geoinfo;
geoinfo: {geoLocation: bytearray}

What I'm trying to do is to filter null values like this:

geoinfo_no_nulls = FILTER geoinfo BY geoLocation is not null;

But the result remains the same: nothing is filtered.

I also tried something like this:

geoinfo_no_nulls = FILTER geoinfo BY geoLocation != 'null';

and I got an error:

org.apache.pig.backend.executionengine.ExecException: ERROR 1071: Cannot convert a map to a String

What am I doing wrong?
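
One direction that may be worth checking, offered as a sketch rather than a confirmed fix: the ERROR 1071 message suggests the values are actually maps, even though DESCRIBE reports bytearray. Comparing against 'null' tests for a string literal rather than for a null value, which would explain why Pig tried to convert the map to a String. Assuming the field can be declared as a map when it is loaded, the null test could be applied to the map itself or to one of its keys. The path 'geo_input' and the AS clause below are hypothetical placeholders; the real loader and schema are not shown in this post.

```pig
-- Hypothetical load: 'geo_input' stands in for the real source,
-- and the map[] type is an assumption based on the ERROR 1071 message.
geoinfo = LOAD 'geo_input' AS (geoLocation:map[]);

-- Keep only rows where the map itself is not null.
geoinfo_no_nulls = FILTER geoinfo BY geoLocation IS NOT NULL;

-- Alternative: keep only rows where a specific key has a value.
-- A lookup on a null map or on a missing key yields null in Pig.
geoinfo_with_lon = FILTER geoinfo BY geoLocation#'longitude' IS NOT NULL;

DUMP geoinfo_no_nulls;
```

Whether this changes the result depends on how the real loader produces the field, so it is best treated as something to try, not as a known answer.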

Details: running on Ubuntu with hadoop-1.0.3 and Pig 0.9.3.

pig -version
Apache Pig version 0.9.3-SNAPSHOT (rexported)
compiled Oct 24 2012, 19:04:03

java version "1.6.0_24"
OpenJDK Runtime Environment (IcedTea6 1.11.4) (6b24-1.11.4-1ubuntu0.12.04.1)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)

© Stack Overflow or respective owner
