Search Results

Search found 6545 results on 262 pages for 'pig head'.

Page 1/262 | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >

Big Data – Interacting with Hadoop – What is PIG? – What is PIG Latin? – Day 16 of 21

- by Pinal Dave

In yesterday’s blog post we learned the importance of the HIVE in Big Data Story. In this article we will understand what is PIG and PIG Latin in Big Data Story. Yahoo started working on Pig for their application deployment on Hadoop. The goal of Yahoo to manage their unstructured data. What is Pig and What is Pig Latin? Pig is a high level platform for creating MapReduce programs used with Hadoop and the language we use for this platform is called PIG Latin. The pig was designed to make Hadoop more user-friendly and approachable by power-users and nondevelopers. PIG is an interactive execution environment supporting Pig Latin language. The language Pig Latin has supported loading and processing of input data with series of transforming to produce desired results. PIG has two different execution environments 1) Local Mode – In this case all the scripts run on a single machine. 2) Hadoop – In this case all the scripts run on Hadoop Cluster. Pig Latin vs SQL Pig essentially creates set of map and reduce jobs under the hoods. Due to same users does not have to now write, compile and build solution for Big Data. The pig is very similar to SQL in many ways. The Ping Latin language provide an abstraction layer over the data. It focuses on the data and not the structure under the hood. Pig Latin is a very powerful language and it can do various operations like loading and storing data, streaming data, filtering data as well various data operations related to strings. The major difference between SQL and Pig Latin is that PIG is procedural and SQL is declarative. In simpler words, Pig Latin is very similar to SQ Lexecution plan and that makes it much easier for programmers to build various processes. Whereas SQL handles trees naturally, Pig Latin follows directed acyclic graph (DAG). DAGs is used to model several different kinds of structures in mathematics and computer science. DAG Tomorrow In tomorrow’s blog post we will discuss about very important components of the Big Data Ecosystem – Zookeeper. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
Storing data to SequenceFile from Apache Pig

- by asquithea

Apache Pig can load data from Hadoop sequence files using the PiggyBank SequenceFileLoader: REGISTER /home/hadoop/pig/contrib/piggybank/java/piggybank.jar; DEFINE SequenceFileLoader org.apache.pig.piggybank.storage.SequenceFileLoader(); log = LOAD '/data/logs' USING SequenceFileLoader AS (...) Is there also a library out there that would allow writing to Hadoop sequence files from Pig?

Read the article
ORDER BY job failed in the Pig script while running EmbeddedPig using Java

- by C.c. Huang

I have this following pig script, which works perfectly using grunt shell (stored the results to HDFS without any issues); however, the last job (ORDER BY) failed if I ran the same script using Java EmbeddedPig. If I replace the ORDER BY job by others, such as GROUP or FOREACH GENERATE, the whole script then succeeded in Java EmbeddedPig. So I think it's the ORDER BY which causes the issue. Anyone has any experience with this? Any help would be appreciated! The Pig script: REGISTER pig-udf-0.0.1-SNAPSHOT.jar; user_similarity = LOAD '/tmp/sample-sim-score-results-31/part-r-00000' USING PigStorage('\t') AS (user_id: chararray, sim_user_id: chararray, basic_sim_score: float, alt_sim_score: float); simplified_user_similarity = FOREACH user_similarity GENERATE $0 AS user_id, $1 AS sim_user_id, $2 AS sim_score; grouped_user_similarity = GROUP simplified_user_similarity BY user_id; ordered_user_similarity = FOREACH grouped_user_similarity { sorted = ORDER simplified_user_similarity BY sim_score DESC; top = LIMIT sorted 10; GENERATE group, top; }; top_influencers = FOREACH ordered_user_similarity GENERATE com.aol.grapevine.similarity.pig.udf.AssignPointsToTopInfluencer($1, 10); all_influence_scores = FOREACH top_influencers GENERATE FLATTEN($0); grouped_influence_scores = GROUP all_influence_scores BY bag_of_topSimUserTuples::user_id; influence_scores = FOREACH grouped_influence_scores GENERATE group AS user_id, SUM(all_influence_scores.bag_of_topSimUserTuples::points) AS influence_score; ordered_influence_scores = ORDER influence_scores BY influence_score DESC; STORE ordered_influence_scores INTO '/tmp/cc-test-results-1' USING PigStorage(); The error log from Pig: 12/04/05 10:00:56 INFO pigstats.ScriptState: Pig script settings are added to the job 12/04/05 10:00:56 INFO mapReduceLayer.JobControlCompiler: mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3 12/04/05 10:00:58 INFO mapReduceLayer.JobControlCompiler: Setting up single store job 12/04/05 10:00:58 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized 12/04/05 10:00:58 INFO mapReduceLayer.MapReduceLauncher: 1 map-reduce job(s) waiting for submission. 12/04/05 10:00:58 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. 12/04/05 10:00:58 INFO input.FileInputFormat: Total input paths to process : 1 12/04/05 10:00:58 INFO util.MapRedUtil: Total input paths to process : 1 12/04/05 10:00:58 INFO util.MapRedUtil: Total input paths (combined) to process : 1 12/04/05 10:00:58 INFO filecache.TrackerDistributedCacheManager: Creating tmp-1546565755 in /var/lib/hadoop-0.20/cache/cchuang/mapred/local/archive/4334795313006396107_361978491_57907159/localhost/tmp/temp1725960134-work-6955502337234509704 with rwxr-xr-x 12/04/05 10:00:58 INFO filecache.TrackerDistributedCacheManager: Cached hdfs://localhost/tmp/temp1725960134/tmp-1546565755#pigsample_854728855_1333645258470 as /var/lib/hadoop-0.20/cache/cchuang/mapred/local/archive/4334795313006396107_361978491_57907159/localhost/tmp/temp1725960134/tmp-1546565755 12/04/05 10:00:58 INFO filecache.TrackerDistributedCacheManager: Cached hdfs://localhost/tmp/temp1725960134/tmp-1546565755#pigsample_854728855_1333645258470 as /var/lib/hadoop-0.20/cache/cchuang/mapred/local/archive/4334795313006396107_361978491_57907159/localhost/tmp/temp1725960134/tmp-1546565755 12/04/05 10:00:58 WARN mapred.LocalJobRunner: LocalJobRunner does not support symlinking into current working dir. 12/04/05 10:00:58 INFO mapred.TaskRunner: Creating symlink: /var/lib/hadoop-0.20/cache/cchuang/mapred/local/archive/4334795313006396107_361978491_57907159/localhost/tmp/temp1725960134/tmp-1546565755 <- /var/lib/hadoop-0.20/cache/cchuang/mapred/local/localRunner/pigsample_854728855_1333645258470 12/04/05 10:00:58 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /var/lib/hadoop-0.20/cache/cchuang/mapred/staging/cchuang402164468/.staging/job_local_0004/.job.jar.crc <- /var/lib/hadoop-0.20/cache/cchuang/mapred/local/localRunner/.job.jar.crc 12/04/05 10:00:58 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /var/lib/hadoop-0.20/cache/cchuang/mapred/staging/cchuang402164468/.staging/job_local_0004/.job.split.crc <- /var/lib/hadoop-0.20/cache/cchuang/mapred/local/localRunner/.job.split.crc 12/04/05 10:00:59 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /var/lib/hadoop-0.20/cache/cchuang/mapred/staging/cchuang402164468/.staging/job_local_0004/.job.splitmetainfo.crc <- /var/lib/hadoop-0.20/cache/cchuang/mapred/local/localRunner/.job.splitmetainfo.crc 12/04/05 10:00:59 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /var/lib/hadoop-0.20/cache/cchuang/mapred/staging/cchuang402164468/.staging/job_local_0004/.job.xml.crc <- /var/lib/hadoop-0.20/cache/cchuang/mapred/local/localRunner/.job.xml.crc 12/04/05 10:00:59 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /var/lib/hadoop-0.20/cache/cchuang/mapred/staging/cchuang402164468/.staging/job_local_0004/job.jar <- /var/lib/hadoop-0.20/cache/cchuang/mapred/local/localRunner/job.jar 12/04/05 10:00:59 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /var/lib/hadoop-0.20/cache/cchuang/mapred/staging/cchuang402164468/.staging/job_local_0004/job.split <- /var/lib/hadoop-0.20/cache/cchuang/mapred/local/localRunner/job.split 12/04/05 10:00:59 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /var/lib/hadoop-0.20/cache/cchuang/mapred/staging/cchuang402164468/.staging/job_local_0004/job.splitmetainfo <- /var/lib/hadoop-0.20/cache/cchuang/mapred/local/localRunner/job.splitmetainfo 12/04/05 10:00:59 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /var/lib/hadoop-0.20/cache/cchuang/mapred/staging/cchuang402164468/.staging/job_local_0004/job.xml <- /var/lib/hadoop-0.20/cache/cchuang/mapred/local/localRunner/job.xml 12/04/05 10:00:59 INFO mapred.Task: Using ResourceCalculatorPlugin : null 12/04/05 10:00:59 INFO mapred.MapTask: io.sort.mb = 100 12/04/05 10:00:59 INFO mapred.MapTask: data buffer = 79691776/99614720 12/04/05 10:00:59 INFO mapred.MapTask: record buffer = 262144/327680 12/04/05 10:00:59 WARN mapred.LocalJobRunner: job_local_0004 java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/Users/cchuang/workspace/grapevine-rec/pigsample_854728855_1333645258470 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:139) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:560) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:639) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210) Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/Users/cchuang/workspace/grapevine-rec/pigsample_854728855_1333645258470 at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:231) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37) at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:248) at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:153) at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:115) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:112) ... 6 more 12/04/05 10:00:59 INFO filecache.TrackerDistributedCacheManager: Deleted path /var/lib/hadoop-0.20/cache/cchuang/mapred/local/archive/4334795313006396107_361978491_57907159/localhost/tmp/temp1725960134/tmp-1546565755 12/04/05 10:00:59 INFO mapReduceLayer.MapReduceLauncher: HadoopJobId: job_local_0004 12/04/05 10:01:04 INFO mapReduceLayer.MapReduceLauncher: job job_local_0004 has failed! Stop running all dependent jobs 12/04/05 10:01:04 INFO mapReduceLayer.MapReduceLauncher: 100% complete 12/04/05 10:01:04 ERROR pigstats.PigStatsUtil: 1 map reduce job(s) failed! 12/04/05 10:01:04 INFO pigstats.PigStats: Script Statistics: HadoopVersion PigVersion UserId StartedAt FinishedAt Features 0.20.2-cdh3u3 0.8.1-cdh3u3 cchuang 2012-04-05 10:00:34 2012-04-05 10:01:04 GROUP_BY,ORDER_BY Some jobs have failed! Stop running all dependent jobs Job Stats (time in seconds): JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MaxReduceTime MinReduceTime AvgReduceTime Alias Feature Outputs job_local_0001 0 0 0 0 0 0 0 0 all_influence_scores,grouped_user_similarity,simplified_user_similarity,user_similarity GROUP_BY job_local_0002 0 0 0 0 0 0 0 0 grouped_influence_scores,influence_scores GROUP_BY,COMBINER job_local_0003 0 0 0 0 0 0 0 0 ordered_influence_scores SAMPLER Failed Jobs: JobId Alias Feature Message Outputs job_local_0004 ordered_influence_scores ORDER_BY Message: Job failed! Error - NA /tmp/cc-test-results-1, Input(s): Successfully read 0 records from: "/tmp/sample-sim-score-results-31/part-r-00000" Output(s): Failed to produce result in "/tmp/cc-test-results-1" Counters: Total records written : 0 Total bytes written : 0 Spillable Memory Manager spill count : 0 Total bags proactively spilled: 0 Total records proactively spilled: 0 Job DAG: job_local_0001 -> job_local_0002, job_local_0002 -> job_local_0003, job_local_0003 -> job_local_0004, job_local_0004 12/04/05 10:01:04 INFO mapReduceLayer.MapReduceLauncher: Some jobs have failed! Stop running all dependent jobs

Read the article
Pig_Cassandra integration caused - ERROR 1070: Could not resolve CassandraStorage using imports:

- by Le Dude

I'm following basic Pig, Cassandra, Hadoop installation. Everything works just fine as a stand alone. No error. However when I tried to run the example file provided by Pig_cassandra example, I got this error. [root@localhost pig]# /opt/cassandra/apache-cassandra-1.1.6/examples/pig/bin/pig_cassandra -x local -x local /opt/cassandra/apache-cassandra-1.1.6/examples/pig/example-script.pig Using /opt/pig/pig-0.10.0/pig-0.10.0-withouthadoop.jar. 2012-10-24 21:14:58,551 [main] INFO org.apache.pig.Main - Apache Pig version 0.10.0 (r1328203) compiled Apr 19 2012, 22:54:12 2012-10-24 21:14:58,552 [main] INFO org.apache.pig.Main - Logging error messages to: /opt/pig/pig_1351138498539.log 2012-10-24 21:14:59,004 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:/// 2012-10-24 21:14:59,472 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve CassandraStorage using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.] Details at logfile: /opt/pig/pig_1351138498539.log Here is the log file Pig Stack Trace --------------- ERROR 1070: Could not resolve CassandraStorage using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.] org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. Could not resolve CassandraStorage using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.] at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1597) at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1540) at org.apache.pig.PigServer.registerQuery(PigServer.java:540) at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:970) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165) at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84) at org.apache.pig.Main.run(Main.java:555) at org.apache.pig.Main.main(Main.java:111) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) Caused by: Failed to parse: Cannot instantiate: CassandraStorage at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:184) at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1589) ... 14 more Caused by: java.lang.RuntimeException: Cannot instantiate: CassandraStorage at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:510) at org.apache.pig.parser.LogicalPlanBuilder.validateFuncSpec(LogicalPlanBuilder.java:791) at org.apache.pig.parser.LogicalPlanBuilder.buildFuncSpec(LogicalPlanBuilder.java:780) at org.apache.pig.parser.LogicalPlanGenerator.func_clause(LogicalPlanGenerator.java:4583) at org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3115) at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1291) at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:789) at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:507) at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:382) at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:175) ... 15 more Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve CassandraStorage using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.] at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:495) at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:507) ... 24 more ================================================================================ I googled around and got to this point from other stackoverflow user that identified the potential problem but not the solution. Cassandra and pig integration cause error during startup I believe my configuration is correct and the path has already been defined properly. I didn't change anything in the pig_cassandra file. I'm not quite sure how to proceed from here. Please help?

Read the article
Splitting input into substrings in PIG (Hadoop)

- by Niels Basjes

Assume I have the following input in Pig: some And I would like to convert that into: s so som some I've not (yet) found a way to iterate over a chararray in pig latin. I have found the TOKENIZE function but that splits on word boundries. So can "pig latin" do this or is this something that requires a Java class to do that?

Read the article
How can I rank teams based off of head to head wins/losses

- by TMP

I'm trying to write an algorithm (specifically in Ruby) that will rank teams based on their record against each other. If a team A and team B have won the same amount of games against each other, then it goes down to point differentials. Here's an example: A beat B two times B beats C one time A beats D three times C bests D two times D beats C one time B beats A one time Which sort of reduces to A[B] = 2 B[C] = 1 A[D] = 3 C[D] = 2 D[C] = 1 B[A] = 1 Which sort of reduces to A[B] = 1 B[C] = 1 A[D] = 3 C[D] = 1 D[C] = -1 B[A] = -1 Which is about how far I've got I think the results of this specific algorithm would be: A, B, C, D But I'm stuck on how to transition from my nested hash-like structure to the results. My psuedo-code is as follows (I can post my ruby code too if someone wants): For each game(g): hash[g.winner][g.loser] += 1 That leaves hash as the first reduction above hash2 = clone of hash For each key(winner), value(losers hash) in hash: For each key(loser), value(losses against winner): hash2[loser][winner] -= losses Which leaves hash2 as the second reduction Feel free to as me question or edit this to be more clear, I'm not sure of how to put it in a very eloquent way. Thanks!

Read the article
Filtering null values with pig

- by arianp

It looks like a silly problem, but I can´t find a way to filter null values from my rows. This is the result when I dump the object geoinfo: DUMP geoinfo; ([longitude#70.95853,latitude#30.9773]) ([longitude#-9.37944507,latitude#38.91780853]) (null) (null) (null) ([longitude#-92.64416,latitude#16.73326]) (null) (null) ([longitude#-9.15199849,latitude#38.71179122]) ([longitude#-9.15210796,latitude#38.71195131]) here is the description DESCRIBE geoinfo; geoinfo: {geoLocation: bytearray} What I'm trying to do is to filter null values like this: geoinfo_no_nulls = FILTER geoinfo BY geoLocation is not null; but the result remains the same. nothing is filtered. I also tried something like this geoinfo_no_nulls = FILTER geoinfo BY geoLocation != 'null'; and I got an error org.apache.pig.backend.executionengine.ExecException: ERROR 1071: Cannot convert a map to a String What am I doing wrong? details, running on ubuntu, hadoop-1.0.3 with pig 0.9.3 pig -version Apache Pig version 0.9.3-SNAPSHOT (rexported) compiled Oct 24 2012, 19:04:03 java version "1.6.0_24" OpenJDK Runtime Environment (IcedTea6 1.11.4) (6b24-1.11.4-1ubuntu0.12.04.1) OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)

Read the article
Pig: Count number of keys in a map

- by Donald Miner

I'd like to count the number of keys in a map in Pig. I could write a UDF to do this, but I was hoping there would be an easier way. data = LOAD 'hbase://MARS1' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage( 'A:*', '-loadKey true -caching=100000') AS (id:bytearray, A_map:map[]); In the code above, I want to basically build a histogram of id and how many items in column family A that key has. In hoping, I tried c = FOREACH data GENERATE id, COUNT(A_map); but that unsurprisingly didn't work. Or, perhaps someone can suggest a better way to do this entirely. If I can't figure this out soon I'll just write a Java MapReduce job or a Pig UDF.

Read the article
PIG doesn't read my custom InputFormat

- by Simon Guo

I have a custom MyInputFormat that suppose to deal with record boundary problem for multi-lined inputs. But when I put the MyInputFormat into my UDF load function. As follow: public class EccUDFLogLoader extends LoadFunc { @Override public InputFormat getInputFormat() { System.out.println("I am in getInputFormat function"); return new MyInputFormat(); } } public class MyInputFormat extends TextInputFormat { public RecordReader createRecordReader(InputSplit inputSplit, JobConf jobConf) throws IOException { System.out.prinln("I am in createRecordReader"); //MyRecordReader suppose to handle record boundary return new MyRecordReader((FileSplit)inputSplit, jobConf); } } For each mapper, it print out I am in getInputFormat function but not I am in createRecordReader. I am wondering if anyone can provide a hint on how to hoop up my costome MyInputFormat to PIG's UDF loader? Much Thanks. I am using PIG on Amazon EMR.

Read the article
Configuring correct port for Oozie (invoking PIG script) in Cloudera Hue

- by user2985324

I am new to CDH4 Oozie workflow editor. While trying to invoke a pig script from Oozie workflow editor, i am getting the following error. HadoopAccessorException: E0900: Jobtracker [mymachine:8032] not allowed, not in Oozies whitelist It looks like Oozie is submitting the job to Yarn port (8032). I want it to submit to 8021 (MR jobtracker) port. Can someone help me in identify where to set the job tracker URL or port so that oozie picks up the correct one (using Hue or Cloudera manager). Previously I tried the following but none of them helped Modfied workflow.xml file /user/hue/oozie/workspaces/../workflow.xml file. However it gets overwritten when I submit the job from workflow editor. In cloudera Manager -- oozie -- configuration --Oozie Server (advanced) -- Oozie Server Configuration Safety Valve for oozie-site.xml property I set the following- <property> <name>oozie.service.HadoopAccessorService.nameNode.whitelist</name> <value>mymachine:8020</value> oozie.service.HadoopAccessorService.jobTracker.whitelist mymachine:8021 and restarted the oozie service. 3. Tried to override 'jobTracker' property while configuring the pig task. This appears as follows in the workflow file however it doesn't take effect (or doesn't override) and still uses 8032 port. <global> <configuration> <property> <name>jobTracker</name> <value>mymachine:8021</value> </property> </configuration> </global> I am using CDH4 version. Thanks for looking into my question.

Read the article
Hadoop/Pig Cross-join

- by sagie

Hi I am using Pig to cross join two data sets, both with format. This will result as . In my case, if I have both tuples and , it is a duplication. Can I filter those duplications (or not joining them at all)? thanks, Sagie

Read the article
How to solve "Broken Pipe" error when using awk with head

- by Jon

I'm getting broken pipe errors from a command that does something like: ls -tr1 /a/path | awk -F '\n' -vpath=/prepend/path/ '{print path$1}' | head -n 50 Essentially I want to list (with absolute path) the oldest X files in a directory. What seems to happen is that the output is correct (I get 50 file paths output) but that when head has output the 50 files it closes stdin causing awk to throw a broken pipe error as it is still outputting more rows.

Read the article
HTTP HEAD Request and System.Web.Mvc.FileResult

- by mnero0429

I'm using BITS to make requests to a ASP.NET MVC controller method named Source that returns a FileResult. I know the type FilePathResult uses HttpResponse.TransmitFile, but I don't know if HttpResponse.TransmitFile actually writes the file to the response stream regardless of the request type. My question is, does FileResult only include the header information on HEAD requests, or does it transmit the file regardless of the request type? Or, do I have to account for HEAD requests myself?

Read the article
What is the HEAD in git?

- by e-satis

There seems to be a difference between the last commit, the HEAD and the state of the file I can see in my directory. What is HEAD, what can I do with it and what mistake should I avoid?

Read the article
jQuery Can't $(...).load() the head title in Chrome

- by Matrym

I need to get the title of a remote page by URL. The code works in FFX, but not chrome. Anyone have any ideas? $(document).ready(function(){ $("title").remove(); $("head").load("http://www.latentmotion.com title"); });

Read the article
Rails: periodically use HEAD to check for page changes

- by Chris

I have a Rails app implementing a game, so it's expected that a player will leave a browser open at the game page. When player Alan takes an action himself, I use AJAX requests to update the game page Alan is viewing to reflect the new state. However, when another player (Bob) takes an action, I don't have (or want) a mechanism to push the change to Alan's view. I would like Alan's page to periodically poll the Rails server to find if there have been any changes since last reload, and to reload the page (either via a whole-page GET or an AJAX call) if not. In order to play nicely with caches and proxies, I'd like to do this by issuing a periodic HTTP HEAD request, get Rails to work out the timestamp of the last change (trivially available from my DB) and respond with that in the Last-Modified header; then have the client-side act on that timestamp. How can I go about doing this?

Read the article
Using jquery append() in <head> (in a function/class)

- by Anonymous

I want to use the append() function from inside the <head>, in a function to be specific, like so: function custom_img(src_img) { $("div").append("<img src='"+src_img+"'>"); } var myimg = new custom_img("test.jpg"); This is a quick example that I just wrote out. I want my function to make a new image like that every time I create a new object like this. Obviously this doesn't work, since the append() requires to be in the body (I've tried this). How would I do this?

Read the article
Does throwing an exception in an EvalFunc pig UDF skip just that line, or stop completely?

- by Daniel Huckstep

I have a User Defined Function (UDF) written in Java to parse lines in a log file and return information back to pig, so it can do all the processing. It looks something like this: public abstract class Foo extends EvalFunc<Tuple> { public Foo() { super(); } public Tuple exec(Tuple input) throws IOException { try { // do stuff with input } catch (Exception e) { throw WrappedIOException.wrap("Error with line", e); } } } My question is: if it throws the IOException, will it stop completely, or will it return results for the rest of the lines that don't throw an exception? Example: I run this in pig REGISTER myjar.jar DEFINE Extractor com.namespace.Extractor(); logs = LOAD '$IN' USING TextLoader AS (line: chararray); events = FOREACH logs GENERATE FLATTEN(Extractor(line)); With this input: 1.5 7 "Valid Line" 1.3 gghyhtt Inv"alid line"" I throw an exceptioN!! 1.8 10 "Valid Line 2" Will it process the two lines and will 'logs' have 2 tuples, or will it just die in a fire?

Read the article
Inline code within HEAD section of ASP Web Form

- by geekrutherford

Today I needed to include inline code within the HEAD section of an Web Form in order to dynamically include stylesheets based on the theme set for the application. Below is an example: <link href="../../Resources/Themes/<% = Page.Theme %>Grid.Default.css" rel="stylesheet" type="text/css"> Upon saving and viewing the page I noted that the code was not being interpreted and instead was being treated as a string literal. How to get around it? Add a panel control in the HEAD section and place the links to the stylesheets as in the example above within the panel. For whatever reason, ASP.NET does not want to interpret inline code in the HEAD section but will allow you to add .NET controls in the HEAD section.

Read the article
On a queue, which end is the "head"?

- by Aidan Cully

I had always thought that the "head" of a queue as the next element to be read, and never really questioned that usage. So a linked-list library I wrote, which is used for maintaining queues, codified that terminology: we have a list1_head macro that retrieves the first element; when using this library in a queue, this will be the first element to be removed. But a new developer on the team was used to having queues implemented the other way around. He described a queue as behaving like a dog: you insert at the head, and remove at the tail. This is a clever enough description that I feel like his usage must be more widespread, and I don't have a similarly evocative description of my preferred usage. So, I guess, there are two related questions: 1, what does the "head" of a queue mean to you? and 2, why do we use the word "head" to describe that concept?

Read the article
How to use Cassandra's Map Reduce with or w/o Pig?

- by UltimateBrent

Can someone explain how MapReduce works with Cassandra .6? I've read through the word count example, but I don't quite follow what's happening on the Cassandra end vs. the "client" end. https://svn.apache.org/repos/asf/cassandra/trunk/contrib/word_count/ For instance, let's say I'm using Python and Pycassa, how would I load in a new map reduce function, and then call it? Does my map reduce function have to be java that's installed on the cassandra server? If so, how do I call it from Pycassa? There's also mention of Pig making this all easier, but I'm a complete Hadoop noob, so that didn't really help. Your answer can use Thrift or whatever, I just mentioned Pycassa to denote the client side. I'm just trying to understand the difference between what runs in the Cassandra cluster vs. the actual server making the requests.

Read the article
Using PIG with Hadoop, how do I regex match parts of text with an unknown number of groups?

- by lmonson

I'm using Amazon's elastic map reduce. I have log files that look something like this random text foo="1" more random text foo="2" more text noise foo="1" blah blah blah foo="1" blah blah foo="3" blah blah foo="4" ... How can I write a pig expression to pick out all the numbers in the 'foo' expressions? I prefer tuples that look something like this: (1,2) (1) (1,3,4) I've tried the following: TUPLES = foreach LINES generate FLATTEN(EXTRACT(line,'foo="([0-9]+)"')); But this yields only the first match in each line: (1) (1) (1)

Read the article
Lots of http HEAD requests originating from porn sites

- by Don Corley

My access log on my web server has a ton of http HEAD requests coming from porn sites. What are HEAD requests and are they doing something bad with my site? Here is an excerpt from my log: (valid request) 96.251.177.249 - - [02/Jan/2011:23:42:25 -0800] "POST /ajax HTTP/1.1" 200 0 "http://www.mywebsite.com/abc.html" "Mozilla/5.0 (X11; U; Linux x86_64; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/7.0.517.44 Safari/534.7" 80.153.114.208 - - [02/Jan/2011:23:43:11 -0800] "HEAD / HTTP/1.0" 302 185 "http://www.somepornsite.com" "Mozilla/5.0 (X11; U; Linux i686; it-IT; rv:1.9.0.2) Gecko/2008092313 Ubuntu/9.25 (jaunty) Firefox/3.8" 80.153.114.208 - - [02/Jan/2011:23:43:11 -0800] "HEAD /tourappxsl HTTP/1.0" 200 16871 "http://www.somepornsite.com" "Mozilla/5.0 (X11; U; Linux i686; it-IT; rv:1.9.0.2) Gecko/2008092313 Ubuntu/9.25 (jaunty) Firefox/3.8" I changed only the web addresses in this log. Thanks for any ideas, Don

Read the article
Google crawler not found an error inside of the <head> tag

- by inckka

I've found a crawler error in my site and it is listed as a page not found(404) link. Heres the broken link http://mydomain.com/blog/comments/feed/ I'm using Google web master tools and found that broken link coming from my web site pages' head tag. here's actual code where that link situated. <head> <link rel="alternate" type="application/rss+xml" title="My Domain Blog » Feed" href="http://www.my-domain.com/blog/feed/" /> </head> So Google report this link as a not found. Actually this link target is not an exact page or a location. But essential for the blog feeds. Anyway I have to fix this and remove from the Google crawler error's list. But haven't got any idea, because cannot redirect or do a 404 header with this link target. Have anyone got an idea of fixing this?

Read the article
Best triple head display setup

- by dgel

I'm currently running Ubuntu 12.04 with a darn good triple head display setup. I've got a VisionTek 900530 Radeon HD 5450 512MB DDR3 PCI Express video card that has two DVI outputs and one Mini DisplayPort that I have connected to a HDMI adapter. I'm running three identical Asus 1920x1080 monitors that each have a DVI, VGA, and HDMI input. I'm using the xorg-edgers ppa, so I'm using the open source radeon driver version 6.99.99. I tried using the ATI binary fglrx driver, but I wasn't able to get the three monitors working properly- the monitor connected via HDMI / DisplayPort wouldn't run at full resolution. The setup is almost perfect: Compiz runs fine and is quite snappy. I'm not able to use that great compiz feature where you can drag a window to the side of a display and it will half maximize. I occasionally experience display corruption weirdness with Unity and need to restart it. When I use a dropdown menu in LibreOffice it often pops the menu down in another window. For example, if I'm using the center monitor and click the Insert menu, the menu pulls down on the monitor to my right, forcing me to chase it. If I chase down the menu and choose Manual Break, the dialog appears over on my left monitor. This absurdity is mildly entertaining but has lost its novelty. I've decided to build a new system and have spared no expense- latest i7 processor, SSD, etc. I really like the performance of the Nvidia binary drivers, so I put two ZOTAC ZT-40707-10L GeForce GT 440 in the system, figuring I'd have four DVI outputs and an awesome triple (or even eventually quad) head setup. Unfortunately it appears that I didn't do sufficient research before my purchase. It seems that Nvidia TwinView only supports two monitors on one card (I guess that's why they call it TwinView...). I messed around with running two X servers, but I really don't want that- being able to drag windows to any monitor is critical. It doesn't sound like Xinerama is an option because from what I understand it simply doesn't support Compiz. I've seen a BaseMosaic option that can be used with the Nvidia drivers that appears to support an almost unlimited number of heads- unfortunately me cheap little cards don't support it. I'm also not sure whether you'll still have all nice maximizing and snapping that TwinView provides, or whether Ubuntu will only see it as one massive display. I put my old trusty ATI card into my new system and installed 12.10. I'm using the opensource radeon drivers again because even in 12.10 I can't get the fglrx binary drivers to do triple head. Unfortunately, even with an unbelievably powerful system the experience is extremely sluggish (much more so than my experience in 12.04). The menu scattering problem appears to be fixed, but I get a lot of nasty Unity display corruption. So finally, my question is this: What hardware / drivers should I use? I'm willing to buy (almost) any video card(s). I have two PCI-Express 3.0 slots on my motherboard (which has an integrated Intel HD card). I'm willing to use ATI or Nvidia cards and willing to run Ubuntu 12.04.1 or 12.10. I'm not a gamer, but do want beautiful and snappy Compiz effects. Does anyone out there have the perfect triple head setup in 12.04 or 12.10? What hardware / drivers are you using? I have those two Nvidia cards but will probably be returning them unless someone knows a way to use them together for a triple head setup. Since I'm having pretty good luck with a single ATI card providing three displays, should I just buy a beefier one with the hopes that it will fix the horrible sluggishness I'm experiencing in 12.10?

Read the article

Search Results

Search found 6545 results on 262 pages for 'pig head'.

Page 1/262 | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >

- by Pinal Dave

- by asquithea

- by C.c. Huang

- by Le Dude

- by Niels Basjes

- by TMP

- by arianp

- by Donald Miner

- by Simon Guo

- by user2985324

- by sagie

- by Jon

- by mnero0429

- by e-satis

- by Matrym

- by Chris

- by Anonymous

- by Daniel Huckstep

- by geekrutherford

- by Aidan Cully

- by UltimateBrent

- by lmonson

- by Don Corley

- by inckka

- by dgel

1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >