michael huang - Page 8 - Developer IT

ORDER BY job failed in the Pig script while running EmbeddedPig using Java

- by C.c. Huang

I have this following pig script, which works perfectly using grunt shell (stored the results to HDFS without any issues); however, the last job (ORDER BY) failed if I ran the same script using Java EmbeddedPig. If I replace the ORDER BY job by others, such as GROUP or FOREACH GENERATE, the whole script then succeeded in Java EmbeddedPig. So I think it's the ORDER BY which causes the issue. Anyone has any experience with this? Any help would be appreciated! The Pig script: REGISTER pig-udf-0.0.1-SNAPSHOT.jar; user_similarity = LOAD '/tmp/sample-sim-score-results-31/part-r-00000' USING PigStorage('\t') AS (user_id: chararray, sim_user_id: chararray, basic_sim_score: float, alt_sim_score: float); simplified_user_similarity = FOREACH user_similarity GENERATE $0 AS user_id, $1 AS sim_user_id, $2 AS sim_score; grouped_user_similarity = GROUP simplified_user_similarity BY user_id; ordered_user_similarity = FOREACH grouped_user_similarity { sorted = ORDER simplified_user_similarity BY sim_score DESC; top = LIMIT sorted 10; GENERATE group, top; }; top_influencers = FOREACH ordered_user_similarity GENERATE com.aol.grapevine.similarity.pig.udf.AssignPointsToTopInfluencer($1, 10); all_influence_scores = FOREACH top_influencers GENERATE FLATTEN($0); grouped_influence_scores = GROUP all_influence_scores BY bag_of_topSimUserTuples::user_id; influence_scores = FOREACH grouped_influence_scores GENERATE group AS user_id, SUM(all_influence_scores.bag_of_topSimUserTuples::points) AS influence_score; ordered_influence_scores = ORDER influence_scores BY influence_score DESC; STORE ordered_influence_scores INTO '/tmp/cc-test-results-1' USING PigStorage(); The error log from Pig: 12/04/05 10:00:56 INFO pigstats.ScriptState: Pig script settings are added to the job 12/04/05 10:00:56 INFO mapReduceLayer.JobControlCompiler: mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3 12/04/05 10:00:58 INFO mapReduceLayer.JobControlCompiler: Setting up single store job 12/04/05 10:00:58 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized 12/04/05 10:00:58 INFO mapReduceLayer.MapReduceLauncher: 1 map-reduce job(s) waiting for submission. 12/04/05 10:00:58 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. 12/04/05 10:00:58 INFO input.FileInputFormat: Total input paths to process : 1 12/04/05 10:00:58 INFO util.MapRedUtil: Total input paths to process : 1 12/04/05 10:00:58 INFO util.MapRedUtil: Total input paths (combined) to process : 1 12/04/05 10:00:58 INFO filecache.TrackerDistributedCacheManager: Creating tmp-1546565755 in /var/lib/hadoop-0.20/cache/cchuang/mapred/local/archive/4334795313006396107_361978491_57907159/localhost/tmp/temp1725960134-work-6955502337234509704 with rwxr-xr-x 12/04/05 10:00:58 INFO filecache.TrackerDistributedCacheManager: Cached hdfs://localhost/tmp/temp1725960134/tmp-1546565755#pigsample_854728855_1333645258470 as /var/lib/hadoop-0.20/cache/cchuang/mapred/local/archive/4334795313006396107_361978491_57907159/localhost/tmp/temp1725960134/tmp-1546565755 12/04/05 10:00:58 INFO filecache.TrackerDistributedCacheManager: Cached hdfs://localhost/tmp/temp1725960134/tmp-1546565755#pigsample_854728855_1333645258470 as /var/lib/hadoop-0.20/cache/cchuang/mapred/local/archive/4334795313006396107_361978491_57907159/localhost/tmp/temp1725960134/tmp-1546565755 12/04/05 10:00:58 WARN mapred.LocalJobRunner: LocalJobRunner does not support symlinking into current working dir. 12/04/05 10:00:58 INFO mapred.TaskRunner: Creating symlink: /var/lib/hadoop-0.20/cache/cchuang/mapred/local/archive/4334795313006396107_361978491_57907159/localhost/tmp/temp1725960134/tmp-1546565755 <- /var/lib/hadoop-0.20/cache/cchuang/mapred/local/localRunner/pigsample_854728855_1333645258470 12/04/05 10:00:58 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /var/lib/hadoop-0.20/cache/cchuang/mapred/staging/cchuang402164468/.staging/job_local_0004/.job.jar.crc <- /var/lib/hadoop-0.20/cache/cchuang/mapred/local/localRunner/.job.jar.crc 12/04/05 10:00:58 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /var/lib/hadoop-0.20/cache/cchuang/mapred/staging/cchuang402164468/.staging/job_local_0004/.job.split.crc <- /var/lib/hadoop-0.20/cache/cchuang/mapred/local/localRunner/.job.split.crc 12/04/05 10:00:59 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /var/lib/hadoop-0.20/cache/cchuang/mapred/staging/cchuang402164468/.staging/job_local_0004/.job.splitmetainfo.crc <- /var/lib/hadoop-0.20/cache/cchuang/mapred/local/localRunner/.job.splitmetainfo.crc 12/04/05 10:00:59 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /var/lib/hadoop-0.20/cache/cchuang/mapred/staging/cchuang402164468/.staging/job_local_0004/.job.xml.crc <- /var/lib/hadoop-0.20/cache/cchuang/mapred/local/localRunner/.job.xml.crc 12/04/05 10:00:59 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /var/lib/hadoop-0.20/cache/cchuang/mapred/staging/cchuang402164468/.staging/job_local_0004/job.jar <- /var/lib/hadoop-0.20/cache/cchuang/mapred/local/localRunner/job.jar 12/04/05 10:00:59 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /var/lib/hadoop-0.20/cache/cchuang/mapred/staging/cchuang402164468/.staging/job_local_0004/job.split <- /var/lib/hadoop-0.20/cache/cchuang/mapred/local/localRunner/job.split 12/04/05 10:00:59 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /var/lib/hadoop-0.20/cache/cchuang/mapred/staging/cchuang402164468/.staging/job_local_0004/job.splitmetainfo <- /var/lib/hadoop-0.20/cache/cchuang/mapred/local/localRunner/job.splitmetainfo 12/04/05 10:00:59 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /var/lib/hadoop-0.20/cache/cchuang/mapred/staging/cchuang402164468/.staging/job_local_0004/job.xml <- /var/lib/hadoop-0.20/cache/cchuang/mapred/local/localRunner/job.xml 12/04/05 10:00:59 INFO mapred.Task: Using ResourceCalculatorPlugin : null 12/04/05 10:00:59 INFO mapred.MapTask: io.sort.mb = 100 12/04/05 10:00:59 INFO mapred.MapTask: data buffer = 79691776/99614720 12/04/05 10:00:59 INFO mapred.MapTask: record buffer = 262144/327680 12/04/05 10:00:59 WARN mapred.LocalJobRunner: job_local_0004 java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/Users/cchuang/workspace/grapevine-rec/pigsample_854728855_1333645258470 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:139) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:560) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:639) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210) Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/Users/cchuang/workspace/grapevine-rec/pigsample_854728855_1333645258470 at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:231) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37) at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:248) at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:153) at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:115) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:112) ... 6 more 12/04/05 10:00:59 INFO filecache.TrackerDistributedCacheManager: Deleted path /var/lib/hadoop-0.20/cache/cchuang/mapred/local/archive/4334795313006396107_361978491_57907159/localhost/tmp/temp1725960134/tmp-1546565755 12/04/05 10:00:59 INFO mapReduceLayer.MapReduceLauncher: HadoopJobId: job_local_0004 12/04/05 10:01:04 INFO mapReduceLayer.MapReduceLauncher: job job_local_0004 has failed! Stop running all dependent jobs 12/04/05 10:01:04 INFO mapReduceLayer.MapReduceLauncher: 100% complete 12/04/05 10:01:04 ERROR pigstats.PigStatsUtil: 1 map reduce job(s) failed! 12/04/05 10:01:04 INFO pigstats.PigStats: Script Statistics: HadoopVersion PigVersion UserId StartedAt FinishedAt Features 0.20.2-cdh3u3 0.8.1-cdh3u3 cchuang 2012-04-05 10:00:34 2012-04-05 10:01:04 GROUP_BY,ORDER_BY Some jobs have failed! Stop running all dependent jobs Job Stats (time in seconds): JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MaxReduceTime MinReduceTime AvgReduceTime Alias Feature Outputs job_local_0001 0 0 0 0 0 0 0 0 all_influence_scores,grouped_user_similarity,simplified_user_similarity,user_similarity GROUP_BY job_local_0002 0 0 0 0 0 0 0 0 grouped_influence_scores,influence_scores GROUP_BY,COMBINER job_local_0003 0 0 0 0 0 0 0 0 ordered_influence_scores SAMPLER Failed Jobs: JobId Alias Feature Message Outputs job_local_0004 ordered_influence_scores ORDER_BY Message: Job failed! Error - NA /tmp/cc-test-results-1, Input(s): Successfully read 0 records from: "/tmp/sample-sim-score-results-31/part-r-00000" Output(s): Failed to produce result in "/tmp/cc-test-results-1" Counters: Total records written : 0 Total bytes written : 0 Spillable Memory Manager spill count : 0 Total bags proactively spilled: 0 Total records proactively spilled: 0 Job DAG: job_local_0001 -> job_local_0002, job_local_0002 -> job_local_0003, job_local_0003 -> job_local_0004, job_local_0004 12/04/05 10:01:04 INFO mapReduceLayer.MapReduceLauncher: Some jobs have failed! Stop running all dependent jobs

Search Results

Search found 2730 results on 110 pages for 'michael huang'.

Page 8/110 | < Previous Page | 4 5 6 7 8 9 10 11 12 13 14 15 | Next Page >

- by C.c. Huang

- by Michael Konietzny

- by Michael Smith

- by Gloria Huang

- by Claire Huang

- by Michael Xu

- by Chris Huang-Leaver

- by Michael

- by Michael Wagner

- by Michael Xu

- by Francis Huang

- by Michael

- by Michael Robinson

- by Michael

- by Claire Huang

- by Mike.Huang

- by Claire Huang

- by Michael

- by Michael Frost

- by Claire Huang

- by Michael

- by vito huang

- by Claire Huang

- by Michael

- by Michael Stoll

< Previous Page | 4 5 6 7 8 9 10 11 12 13 14 15 | Next Page >