Search Results

Search found 196 results on 8 pages for 'mapreduce'.

Page 4/8 | < Previous Page | 1 2 3 4 5 6 7 8  | Next Page >

  • Reading Data from DDFS ValueError: No JSON object could be decoded

    - by secumind
    I'm running dozens of map reduce jobs for a number of different purposes using disco. My data has grown enormous and I thought I would try using DDFS for a change rather than standard txt files. I've followed the DISCO map/reduce example Counting Words as a map/reduce job, without to much difficulty and with the help of others, Reading JSON specific data into DISCO I've gotten past one of my latest problems. I'm trying to read data in/out of ddfs to better chunk and distribute it but am having a bit of trouble. Here's an example file: file.txt {"favorited": false, "in_reply_to_user_id": null, "contributors": null, "truncated": false, "text": "I'll call him back tomorrow I guess", "created_at": "Mon Feb 13 05:34:27 +0000 2012", "retweeted": false, "in_reply_to_status_id_str": null, "coordinates": null, "in_reply_to_user_id_str": null, "entities": {"user_mentions": [], "hashtags": [], "urls": []}, "in_reply_to_status_id": null, "id_str": "168931016843603968", "place": null, "user": {"follow_request_sent": null, "profile_use_background_image": true, "profile_background_image_url_https": "https://si0.twimg.com/profile_background_images/305726905/FASHION-3.png", "verified": false, "profile_image_url_https": "https://si0.twimg.com/profile_images/1818996723/image_normal.jpg", "profile_sidebar_fill_color": "292727", "is_translator": false, "id": 113532729, "profile_text_color": "000000", "followers_count": 78, "protected": false, "location": "With My Niggas In Paris!", "default_profile_image": false, "listed_count": 0, "utc_offset": -21600, "statuses_count": 6733, "description": "Made in CHINA., Educated && Making My Own $$. Fear GOD && Put Him 1st. #TeamFollowBack #TeamiPhone\n", "friends_count": 74, "profile_link_color": "b03f3f", "profile_image_url": "http://a2.twimg.com/profile_images/1818996723/image_normal.jpg", "notifications": null, "show_all_inline_media": false, "geo_enabled": true, "profile_background_color": "1f9199", "id_str": "113532729", "profile_background_image_url": "http://a3.twimg.com/profile_background_images/305726905/FASHION-3.png", "name": "Bee'Jay", "lang": "en", "profile_background_tile": true, "favourites_count": 19, "screen_name": "OohMyBEEsNice", "url": "http://www.bitchimpaid.org", "created_at": "Fri Feb 12 03:32:54 +0000 2010", "contributors_enabled": false, "time_zone": "Central Time (US & Canada)", "profile_sidebar_border_color": "000000", "default_profile": false, "following": null}, "in_reply_to_screen_name": null, "retweet_count": 0, "geo": null, "id": 168931016843603968, "source": "<a href=\"http://twitter.com/#!/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>"} {"favorited": false, "in_reply_to_user_id": 50940453, "contributors": null, "truncated": false, "text": "@LegaMrvica @MimozaBand makasi om artis :D kadoo kadoo", "created_at": "Mon Feb 13 05:34:27 +0000 2012", "retweeted": false, "in_reply_to_status_id_str": "168653037894770688", "coordinates": null, "in_reply_to_user_id_str": "50940453", "entities": {"user_mentions": [{"indices": [0, 11], "screen_name": "LegaMrvica", "id": 50940453, "name": "Lega_thePianis", "id_str": "50940453"}, {"indices": [12, 23], "screen_name": "MimozaBand", "id": 375128905, "name": "Mimoza", "id_str": "375128905"}], "hashtags": [], "urls": []}, "in_reply_to_status_id": 168653037894770688, "id_str": "168931016868761600", "place": null, "user": {"follow_request_sent": null, "profile_use_background_image": true, "profile_background_image_url_https": "https://si0.twimg.com/profile_background_images/347686061/Galungan_dan_Kuningan.jpg", "verified": false, "profile_image_url_https": "https://si0.twimg.com/profile_images/1803845596/Picture_20124_normal.jpg", "profile_sidebar_fill_color": "DDFFCC", "is_translator": false, "id": 48293450, "profile_text_color": "333333", "followers_count": 182, "protected": false, "location": "\u00dcT: -6.906799,107.622383", "default_profile_image": false, "listed_count": 0, "utc_offset": -28800, "statuses_count": 3052, "description": "Fashion design maranatha '11 // traditional dancer (bali) at sanggar tampak siring & Natya Nataraja", "friends_count": 206, "profile_link_color": "0084B4", "profile_image_url": "http://a3.twimg.com/profile_images/1803845596/Picture_20124_normal.jpg", "notifications": null, "show_all_inline_media": false, "geo_enabled": true, "profile_background_color": "9AE4E8", "id_str": "48293450", "profile_background_image_url": "http://a0.twimg.com/profile_background_images/347686061/Galungan_dan_Kuningan.jpg", "name": "nana afiff", "lang": "en", "profile_background_tile": true, "favourites_count": 2, "screen_name": "hasnfebria", "url": null, "created_at": "Thu Jun 18 08:50:29 +0000 2009", "contributors_enabled": false, "time_zone": "Pacific Time (US & Canada)", "profile_sidebar_border_color": "BDDCAD", "default_profile": false, "following": null}, "in_reply_to_screen_name": "LegaMrvica", "retweet_count": 0, "geo": null, "id": 168931016868761600, "source": "<a href=\"http://blackberry.com/twitter\" rel=\"nofollow\">Twitter for BlackBerry\u00ae</a>"} {"favorited": false, "in_reply_to_user_id": 27260086, "contributors": null, "truncated": false, "text": "@justinbieber u were born to be somebody, and u're super important in beliebers' life. thanks for all biebs. I love u. follow me? 84", "created_at": "Mon Feb 13 05:34:27 +0000 2012", "retweeted": false, "in_reply_to_status_id_str": null, "coordinates": null, "in_reply_to_user_id_str": "27260086", "entities": {"user_mentions": [{"indices": [0, 13], "screen_name": "justinbieber", "id": 27260086, "name": "Justin Bieber", "id_str": "27260086"}], "hashtags": [], "urls": []}, "in_reply_to_status_id": null, "id_str": "168931016856178688", "place": null, "user": {"follow_request_sent": null, "profile_use_background_image": true, "profile_background_image_url_https": "https://si0.twimg.com/profile_background_images/416005864/Captura.JPG", "verified": false, "profile_image_url_https": "https://si0.twimg.com/profile_images/1808883280/Captura6_normal.JPG", "profile_sidebar_fill_color": "f5e7f3", "is_translator": false, "id": 406750700, "profile_text_color": "333333", "followers_count": 1122, "protected": false, "location": "Adentro de una supra.", "default_profile_image": false, "listed_count": 0, "utc_offset": -14400, "statuses_count": 20966, "description": "Mi \u00eddolo es @justinbieber , si te gusta \u00a1genial!, si no, solo respetalo. El cambi\u00f3 mi vida completamente y mi sue\u00f1o es conocerlo #TrueBelieber . ", "friends_count": 1015, "profile_link_color": "9404b8", "profile_image_url": "http://a1.twimg.com/profile_images/1808883280/Captura6_normal.JPG", "notifications": null, "show_all_inline_media": false, "geo_enabled": false, "profile_background_color": "f9fcfa", "id_str": "406750700", "profile_background_image_url": "http://a3.twimg.com/profile_background_images/416005864/Captura.JPG", "name": "neversaynever,right?", "lang": "es", "profile_background_tile": false, "favourites_count": 22, "screen_name": "True_Belieebers", "url": "http://www.wehavebieber-fever.tumblr.com", "created_at": "Mon Nov 07 04:17:40 +0000 2011", "contributors_enabled": false, "time_zone": "Santiago", "profile_sidebar_border_color": "C0DEED", "default_profile": false, "following": null}, "in_reply_to_screen_name": "justinbieber", "retweet_count": 0, "geo": null, "id": 168931016856178688, "source": "<a href=\"http://yfrog.com\" rel=\"nofollow\">Yfrog</a>"} I load it into DDFS with: # ddfs chunk data:test1 ./file.txt created: disco://localhost/ddfs/vol0/blob/44/file_txt-0$549-db27b-125e1 I test that the file is indeed loaded into ddfs with: # ddfs xcat data:test1 {"favorited": false, "in_reply_to_user_id": null, "contributors": null, "truncated": false, "text": "I'll call him back tomorrow I guess", "created_at": "Mon Feb 13 05:34:27 +0000 2012", "retweeted": false, "in_reply_to_status_id_str": null, "coordinates": null, "in_reply_to_user_id_str": null, "entities": {"user_mentions": [], "hashtags": [], "urls": []}, "in_reply_to_status_id": null, "id_str": "168931016843603968", "place": null, "user": {"follow_request_sent": null, "profile_use_background_image": true, "profile_background_image_url_https": "https://si0.twimg.com/profile_background_images/305726905/FASHION-3.png", "verified": false, "profile_image_url_https": "https://si0.twimg.com/profile_images/1818996723/image_normal.jpg", "profile_sidebar_fill_color": "292727", "is_translator": false, "id": 113532729, "profile_text_color": "000000", "followers_count": 78, "protected": false, "location": "With My Niggas In Paris!", "default_profile_image": false, "listed_count": 0, "utc_offset": -21600, "statuses_count": 6733, "description": "Made in CHINA., Educated && Making My Own $$. Fear GOD && Put Him 1st. #TeamFollowBack #TeamiPhone\n", "friends_count": 74, "profile_link_color": "b03f3f", "profile_image_url": "http://a2.twimg.com/profile_images/1818996723/image_normal.jpg", "notifications": null, "show_all_inline_media": false, "geo_enabled": true, "profile_background_color": "1f9199", "id_str": "113532729", "profile_background_image_url": "http://a3.twimg.com/profile_background_images/305726905/FASHION-3.png", "name": "Bee'Jay", "lang": "en", "profile_background_tile": true, "favourites_count": 19, "screen_name": "OohMyBEEsNice", "url": "http://www.bitchimpaid.org", "created_at": "Fri Feb 12 03:32:54 +0000 2010", "contributors_enabled": false, "time_zone": "Central Time (US & Canada)", "profile_sidebar_border_color": "000000", "default_profile": false, "following": null}, "in_reply_to_screen_name": null, "retweet_count": 0, "geo": null, "id": 168931016843603968, "source": "<a href=\"http://twitter.com/#!/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>"} {"favorited": false, "in_reply_to_user_id": 50940453, "contributors": null, "truncated": false, "text": "@LegaMrvica @MimozaBand makasi om artis :D kadoo kadoo", "created_at": "Mon Feb 13 05:34:27 +0000 2012", "retweeted": false, "in_reply_to_status_id_str": "168653037894770688", "coordinates": null, "in_reply_to_user_id_str": "50940453", "entities": {"user_mentions": [{"indices": [0, 11], "screen_name": "LegaMrvica", "id": 50940453, "name": "Lega_thePianis", "id_str": "50940453"}, {"indices": [12, 23], "screen_name": "MimozaBand", "id": 375128905, "name": "Mimoza", "id_str": "375128905"}], "hashtags": [], "urls": []}, "in_reply_to_status_id": 168653037894770688, "id_str": "168931016868761600", "place": null, "user": {"follow_request_sent": null, "profile_use_background_image": true, "profile_background_image_url_https": "https://si0.twimg.com/profile_background_images/347686061/Galungan_dan_Kuningan.jpg", "verified": false, "profile_image_url_https": "https://si0.twimg.com/profile_images/1803845596/Picture_20124_normal.jpg", "profile_sidebar_fill_color": "DDFFCC", "is_translator": false, "id": 48293450, "profile_text_color": "333333", "followers_count": 182, "protected": false, "location": "\u00dcT: -6.906799,107.622383", "default_profile_image": false, "listed_count": 0, "utc_offset": -28800, "statuses_count": 3052, "description": "Fashion design maranatha '11 // traditional dancer (bali) at sanggar tampak siring & Natya Nataraja", "friends_count": 206, "profile_link_color": "0084B4", "profile_image_url": "http://a3.twimg.com/profile_images/1803845596/Picture_20124_normal.jpg", "notifications": null, "show_all_inline_media": false, "geo_enabled": true, "profile_background_color": "9AE4E8", "id_str": "48293450", "profile_background_image_url": "http://a0.twimg.com/profile_background_images/347686061/Galungan_dan_Kuningan.jpg", "name": "nana afiff", "lang": "en", "profile_background_tile": true, "favourites_count": 2, "screen_name": "hasnfebria", "url": null, "created_at": "Thu Jun 18 08:50:29 +0000 2009", "contributors_enabled": false, "time_zone": "Pacific Time (US & Canada)", "profile_sidebar_border_color": "BDDCAD", "default_profile": false, "following": null}, "in_reply_to_screen_name": "LegaMrvica", "retweet_count": 0, "geo": null, "id": 168931016868761600, "source": "<a href=\"http://blackberry.com/twitter\" rel=\"nofollow\">Twitter for BlackBerry\u00ae</a>"} {"favorited": false, "in_reply_to_user_id": 27260086, "contributors": null, "truncated": false, "text": "@justinbieber u were born to be somebody, and u're super important in beliebers' life. thanks for all biebs. I love u. follow me? 84", "created_at": "Mon Feb 13 05:34:27 +0000 2012", "retweeted": false, "in_reply_to_status_id_str": null, "coordinates": null, "in_reply_to_user_id_str": "27260086", "entities": {"user_mentions": [{"indices": [0, 13], "screen_name": "justinbieber", "id": 27260086, "name": "Justin Bieber", "id_str": "27260086"}], "hashtags": [], "urls": []}, "in_reply_to_status_id": null, "id_str": "168931016856178688", "place": null, "user": {"follow_request_sent": null, "profile_use_background_image": true, "profile_background_image_url_https": "https://si0.twimg.com/profile_background_images/416005864/Captura.JPG", "verified": false, "profile_image_url_https": "https://si0.twimg.com/profile_images/1808883280/Captura6_normal.JPG", "profile_sidebar_fill_color": "f5e7f3", "is_translator": false, "id": 406750700, "profile_text_color": "333333", "followers_count": 1122, "protected": false, "location": "Adentro de una supra.", "default_profile_image": false, "listed_count": 0, "utc_offset": -14400, "statuses_count": 20966, "description": "Mi \u00eddolo es @justinbieber , si te gusta \u00a1genial!, si no, solo respetalo. El cambi\u00f3 mi vida completamente y mi sue\u00f1o es conocerlo #TrueBelieber . ", "friends_count": 1015, "profile_link_color": "9404b8", "profile_image_url": "http://a1.twimg.com/profile_images/1808883280/Captura6_normal.JPG", "notifications": null, "show_all_inline_media": false, "geo_enabled": false, "profile_background_color": "f9fcfa", "id_str": "406750700", "profile_background_image_url": "http://a3.twimg.com/profile_background_images/416005864/Captura.JPG", "name": "neversaynever,right?", "lang": "es", "profile_background_tile": false, "favourites_count": 22, "screen_name": "True_Belieebers", "url": "http://www.wehavebieber-fever.tumblr.com", "created_at": "Mon Nov 07 04:17:40 +0000 2011", "contributors_enabled": false, "time_zone": "Santiago", "profile_sidebar_border_color": "C0DEED", "default_profile": false, "following": null}, "in_reply_to_screen_name": "justinbieber", "retweet_count": 0, "geo": null, "id": 168931016856178688, "source": "<a href=\"http://yfrog.com\" rel=\"nofollow\">Yfrog</a> At this point everything is great, I load up the script that resulted from a previous Stack Post: from disco.core import Job, result_iterator import gzip def map(line, params): import unicodedata import json r = json.loads(line).get('text') s = unicodedata.normalize('NFD', r).encode('ascii', 'ignore') for word in s.split(): yield word, 1 def reduce(iter, params): from disco.util import kvgroup for word, counts in kvgroup(sorted(iter)): yield word, sum(counts) if __name__ == '__main__': job = Job().run(input=["tag://data:test1"], map=map, reduce=reduce) for word, count in result_iterator(job.wait(show=True)): print word, count NOTE: That this script runs file if the input=["file.txt"], however when I run it with "tag://data:test1" I get the following error: # DISCO_EVENTS=1 python count_normal_words.py Job@549:db30e:25bd8: Status: [map] 0 waiting, 1 running, 0 done, 0 failed 2012/11/25 21:43:26 master New job initialized! 2012/11/25 21:43:26 master Starting job 2012/11/25 21:43:26 master Starting map phase 2012/11/25 21:43:26 master map:0 assigned to solice 2012/11/25 21:43:26 master ERROR: Job failed: Worker at 'solice' died: Traceback (most recent call last): File "/home/DISCO/data/solice/01/Job@549:db30e:25bd8/usr/local/lib/python2.7/site-packages/disco/worker/__init__.py", line 329, in main job.worker.start(task, job, **jobargs) File "/home/DISCO/data/solice/01/Job@549:db30e:25bd8/usr/local/lib/python2.7/site-packages/disco/worker/__init__.py", line 290, in start self.run(task, job, **jobargs) File "/home/DISCO/data/solice/01/Job@549:db30e:25bd8/usr/local/lib/python2.7/site-packages/disco/worker/classic/worker.py", line 286, in run getattr(self, task.mode)(task, params) File "/home/DISCO/data/solice/01/Job@549:db30e:25bd8/usr/local/lib/python2.7/site-packages/disco/worker/classic/worker.py", line 299, in map for key, val in self['map'](entry, params): File "count_normal_words.py", line 12, in map File "/usr/lib64/python2.7/json/__init__.py", line 326, in loads return _default_decoder.decode(s) File "/usr/lib64/python2.7/json/decoder.py", line 366, in decode obj, end = self.raw_decode(s, idx=_w(s, 0).end()) File "/usr/lib64/python2.7/json/decoder.py", line 384, in raw_decode raise ValueError("No JSON object could be decoded") ValueError: No JSON object could be decoded 2012/11/25 21:43:26 master WARN: Job killed Status: [map] 1 waiting, 0 running, 0 done, 1 failed Traceback (most recent call last): File "count_normal_words.py", line 28, in <module> for word, count in result_iterator(job.wait(show=True)): File "/usr/local/lib/python2.7/site-packages/disco/core.py", line 348, in wait timeout, poll_interval * 1000) File "/usr/local/lib/python2.7/site-packages/disco/core.py", line 309, in check_results raise JobError(Job(name=jobname, master=self), "Status %s" % status) disco.error.JobError: Job Job@549:db30e:25bd8 failed: Status dead The Error states: ValueError: No JSON object could be decoded. Again, this works fine using the text file as input but now DDFS. Any ideas, I'm open to suggestions?

    Read the article

  • How does Hadoop perform input splits?

    - by Deepak Konidena
    Hi, This is a conceptual question involving Hadoop/HDFS. Lets say you have a file containing 1 billion lines. And for the sake of simplicity, lets consider that each line is of the form <k,v> where k is the offset of the line from the beginning and value is the content of the line. Now, when we say that we want to run N map tasks, does the framework split the input file into N splits and run each map task on that split? or do we have to write a partitioning function that does the N splits and run each map task on the split generated? All i want to know is, whether the splits are done internally or do we have to split the data manually? More specifically, each time the map() function is called what are its Key key and Value val parameters? Thanks, Deepak

    Read the article

  • How can I get a view of favorite user documents by user in Couchdb map/reduce?

    - by Jeremy Raymond
    My Couchdb database as a main document type that looks something like: { "_id" : "doc1", "type" : "main_doc", "title" : "the first doc" ... } There is another type of document that stores user information. I want users to be able to tag documents as favorites. Different users can save the same or different documents as favorites. My idea was to introduce a favorite document to track this something like: { "_id" : "fav1", "type" : "favorite", "user_id" : "user1", "doc_id" : "doc1" } It's easy enough to create a view with user_id as the key to get a list of their favorite doc IDs. E.g: function(doc) { if (doc.type == "favorite") { emit(doc.user_id, doc.doc_id); } } However I want to list of favorites to display the user_id, doc_id and title from the document. So output something like: { "key" : "user1", "value" : ["doc1", "the first doc"] }

    Read the article

  • Hadoop on windows server

    - by Luca Martinetti
    Hello, I'm thinking about using hadoop to process large text files on my existing windows 2003 servers (about 10 quad core machines with 16gb of RAM) The questions are: Is there any good tutorial on how to configure an hadoop cluster on windows? What are the requirements? java + cygwin + sshd ? Anything else? HDFS, does it play nice on windows? I'd like to use hadoop in streaming mode. Any advice, tool or trick to develop my own mapper / reducers in c#? What do you use for submitting and monitoring the jobs? Thanks

    Read the article

  • Hadoop Map/Reduce - simple use example to do the following...

    - by alexeypro
    I have MySQL database, where I store the following BLOB (which contains JSON object) and ID (for this JSON object). JSON object contains a lot of different information. Say, "city:Los Angeles" and "state:California". There are about 500k of such records for now, but they are growing. And each JSON object is quite big. My goal is to do searches (real-time) in MySQL database. Say, I want to search for all JSON objects which have "state" to "California" and "city" to "San Francisco". I want to utilize Hadoop for the task. My idea is that there will be "job", which takes chunks of, say, 100 records (rows) from MySQL, verifies them according to the given search criteria, returns those (ID's) which qualify. Pros/cons? I understand that one might think that I should utilize simple SQL power for that, but the thing is that JSON object structure is pretty "heavy", if I put it as SQL schemas, there will be at least 3-5 tables joins, which (I tried, really) creates quite a headache, and building all the right indexes eats RAM faster than I one can think. ;-) And even then, every SQL query has to be analyzed to be utilizing the indexes, otherwise with full scan it literally is a pain. And with such structure we have the only way "up" is just with vertical scaling. But I am not sure it's the best option for me, as I see how JSON objects will grow (the data structure), and I see that the number of them will grow too. :-) Help? Can somebody point me to simple examples of how this can be done? Does it make sense at all? Am I missing something important? Thank you.

    Read the article

  • Map Reduce Frameworks/Infrastructure

    - by Johannes Rudolph
    Map Reduce is a pattern that seems to get a lot of traction lately and I start to see it manifest in one of my projects that is focused on an event processing pipeline (iPhone Accelerometer and GPS data). I needed to built a lot of infrastructure for this project, in fact it overweighs the logic code interacting with it by 2x. Some of the components I built where EventProcessors (with in- and output plus buffering, timing etc.), multiplexers and aggregators. This leads me to my question what the "common" required infrastrucutre for map reduce is. Since I am working with .Net a lot I can see map reduce infrastructure built into the Framework and language constructs. Functional languages support this paradigm per se. It seems every language can be used with map reduce, some have better support than others, others again are built around that concept (e.g. Go). And there are Frameworks like Apache Hadoop to support map reduce.

    Read the article

  • Splitting input into substrings in PIG (Hadoop)

    - by Niels Basjes
    Assume I have the following input in Pig: some And I would like to convert that into: s so som some I've not (yet) found a way to iterate over a chararray in pig latin. I have found the TOKENIZE function but that splits on word boundries. So can "pig latin" do this or is this something that requires a Java class to do that?

    Read the article

  • Using PIG with Hadoop, how do I regex match parts of text with an unknown number of groups?

    - by lmonson
    I'm using Amazon's elastic map reduce. I have log files that look something like this random text foo="1" more random text foo="2" more text noise foo="1" blah blah blah foo="1" blah blah foo="3" blah blah foo="4" ... How can I write a pig expression to pick out all the numbers in the 'foo' expressions? I prefer tuples that look something like this: (1,2) (1) (1,3,4) I've tried the following: TUPLES = foreach LINES generate FLATTEN(EXTRACT(line,'foo="([0-9]+)"')); But this yields only the first match in each line: (1) (1) (1)

    Read the article

  • C#: Farm out jobs to worker processes on a multi-processor machine

    - by Andrew White
    Hi there, I have a generic check that needs to be run on ca. 1000 objects. The check takes about 3 seconds. We have a server with 4 processors (and we also have other multi-processor servers in our network) so we would like to create an exe / dll to do the checking and return the results to the "master". Does anyone know of a framework for this, or how would one go about it in C#? Specifically: * What's the best way to transfer data between the master and the worker process? * How would the master ensure that always 4 processes are running at any one time and as soon as a worker process is finished start a new one. * How to register that the worker is finished and append it's results to a list? Hope it's clear enough but happy to clarify. A.

    Read the article

  • Hadoop reduce task gets hung

    - by user806098
    I set up a hadoop cluster with 4 nodes, When running a map-reduce task, the map task finishes quickly, while the reduce task hangs at 27% percent. I checked the log, it's that the reduce task fails to fetch map output from map nodes. The job tracker log of master shows messages like this: 2011-06-27 19:55:14,748 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201106271953_0001_r_000000_0' to tip task_201106271953_0001_r_000000, for tracker 'tracker_web30.bbn.com.cn:localhost/127.0.0.1:56476' And the name node log of master shows messages like this: 2011-06-27 14:00:52,898 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 54310, call register(DatanodeRegistration(202.106.199.39:50010, storageID=DS-1989397900-202.106.199.39-50010-1308723051262, infoPort=50075, ipcPort=50020)) from 192.168.225.19:16129: error: java.io.IOException: verifyNodeRegistration: unknown datanode 202.106.199.3 9:50010 However, neither the "web30.bbn.com.cn" or 202.106.199.39, 202.106.199.3 is the slave node. I think such ip/domains appear because hadoop fails to resolve a node(first in the Intranet DNS server), then it goes to a higher-level DNS server, later to the top, still fails, then the "junk" ip/domains are returned. But I checked my config, it goes like this: /etc/hosts: 127.0.0.1 localhost.localdomain localhost ::1 localhost6.localdomain6 localhost6 192.168.225.16 master 192.168.225.66 slave1 192.168.225.20 slave5 192.168.225.17 slave17 conf/core-site.xml: hadoop.tmp.dir /root/hadoop_tmp/hadoop_${user.name} fs.default.name hdfs://master:54310 io.sort.mb 1024 hdfs-site.xml: dfs.replication 3 masters: master slaves: master slave1 slave5 slave17 Also, all firewalls(iptables) are turned off, and ssh between each 2 nodes is ok. so I don't know where exact the error comes from. Please help. Thanks a lot.

    Read the article

  • Using Hadoop, are my reducers guaranteed to get all the records with the same key?

    - by samg
    I'm running a hadoop job (using hive actually) which is supposed to uniq lines in a lot of text file. More specifically it chooses the most recently timestamped record for each key in the reduce step. Does hadoop guarantee that every record with the same key, output by the map step, will go to a single reducer, even if there are many reducers running across a cluster? I'm worried that the mapper output might be split after the shuffle happens, in the middle of a set of records with the same key.

    Read the article

  • Is there an implementation of rapid concurrent syntactical sugar in scala? eg. map-reduce

    - by TiansHUo
    Passing messages around with actors is great. But I would like to have even easier code. Examples (Pseudo-code) val splicedList:List[List[Int]]=biglist.partition(100) val sum:Int=ActorPool.numberOfActors(5).getAllResults(splicedList,foldLeft(_+_)) where spliceIntoParts turns one big list into 100 small lists the numberofactors part, creates a pool which uses 5 actors and receives new jobs after a job is finished and getallresults uses a method on a list. all this done with messages passing in the background. where maybe getFirstResult, calculates the first result, and stops all other threads (like cracking a password)

    Read the article

  • CouchDB Map/Reduce raises execption in reduce function?

    - by fuzzy lollipop
    my view generates keys in this format ["job_id:1234567890", 1271430291000] where the first key element is a unique key and the second is a timestamp in milliseconds. I run my view with this elapsed_time?startkey=["123"]&endkey=["123",{}]&group=true&group_level=1 and here is my reduce function, the intention is to reduce the output to get the earliest and latest timestamps and return the difference between them and now function(keys,values,rereduce) { var now = new Date().valueOf(); var first = Number.MIN_VALUE; var last = Number.MAX_VALUE; if (rereduce) { first = Math.max(first,values[0].first); last = Math.min(last,values[0].last); } else { first = keys[0][0][1]; last = keys[keys.length-1][0][1]; } return {first:now - first, last:now - last}; } and when processing a query it constantly raises the following execption: function raised exception (new TypeError("keys has no properties", "", 1)) I am making sure not to reference keys inside my rereduce block. Why does this function constantly raise this exception?

    Read the article

  • Reducer getting fewer records than expected

    - by sathishs
    We have a scenario of generating unique key for every single row in a file. we have a timestamp column but the are multiple rows available for a same timestamp in few scenarios. We decided unique values to be timestamp appended with their respective count as mentioned in the below program. Mapper will just emit the timestamp as key and the entire row as its value, and in reducer the key is generated. Problem is Map outputs about 236 rows, of which only 230 records are fed as an input for reducer which outputs the same 230 records. public class UniqueKeyGenerator extends Configured implements Tool { private static final String SEPERATOR = "\t"; private static final int TIME_INDEX = 10; private static final String COUNT_FORMAT_DIGITS = "%010d"; public static class Map extends Mapper<LongWritable, Text, Text, Text> { @Override protected void map(LongWritable key, Text row, Context context) throws IOException, InterruptedException { String input = row.toString(); String[] vals = input.split(SEPERATOR); if (vals != null && vals.length >= TIME_INDEX) { context.write(new Text(vals[TIME_INDEX - 1]), row); } } } public static class Reduce extends Reducer<Text, Text, NullWritable, Text> { @Override protected void reduce(Text eventTimeKey, Iterable<Text> timeGroupedRows, Context context) throws IOException, InterruptedException { int cnt = 1; final String eventTime = eventTimeKey.toString(); for (Text val : timeGroupedRows) { final String res = SEPERATOR.concat(getDate( Long.valueOf(eventTime)).concat( String.format(COUNT_FORMAT_DIGITS, cnt))); val.append(res.getBytes(), 0, res.length()); cnt++; context.write(NullWritable.get(), val); } } } public static String getDate(long time) { SimpleDateFormat utcSdf = new SimpleDateFormat("yyyyMMddhhmmss"); utcSdf.setTimeZone(TimeZone.getTimeZone("America/Los_Angeles")); return utcSdf.format(new Date(time)); } public int run(String[] args) throws Exception { conf(args); return 0; } public static void main(String[] args) throws Exception { conf(args); } private static void conf(String[] args) throws IOException, InterruptedException, ClassNotFoundException { Configuration conf = new Configuration(); Job job = new Job(conf, "uniquekeygen"); job.setJarByClass(UniqueKeyGenerator.class); job.setOutputKeyClass(Text.class); job.setOutputValueClass(Text.class); job.setMapperClass(Map.class); job.setReducerClass(Reduce.class); job.setInputFormatClass(TextInputFormat.class); job.setOutputFormatClass(TextOutputFormat.class); // job.setNumReduceTasks(400); FileInputFormat.addInputPath(job, new Path(args[0])); FileOutputFormat.setOutputPath(job, new Path(args[1])); job.waitForCompletion(true); } } It is consistent for higher no of lines and the difference is as huge as 208969 records for an input of 20855982 lines. what might be the reason for reduced inputs to reducer?

    Read the article

  • Is there a way to configure timeout for speculative execution in Hadoop?

    - by S.O.
    I have hadoop job with tasks that are expected to run for significant length of fime (few minues). However hadoop starts speculative execution too soon. I do not want to turn speculative execution completely off but I want to increase duration of time hadoop waits before considering job for speculative execution. Is there a config option to control this timeout? Thanks

    Read the article

  • javax.security.auth.login.LoginException: Login failed

    - by abdeslam
    I'm trying to run a hadoop job (version 18.3) on my windows machine but I get the following error: Caused by: javax.security.auth.login.LoginException: Login failed: CreateProcess: bash -c groups error=2 at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:250) at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:275) at org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:557) ... 3 more The same job works fine in an another windows machine. Do I have may be something wrong in the settings variabls? How can I fix this problem?

    Read the article

  • In MongoDB, how can I replicate this simple query using map/reduce in ruby?

    - by Matthew Rathbone
    Hi, So using the regular MongoDB library in Ruby I have the following query to find average filesize across a set of 5001 documents: avg = 0 total = collection.count() Rails.logger.info "#{total} asset creation stats in the system" collection.find().each {|row| avg += (row["filesize"] * (1/total.to_f)) if row["filesize"]} Its pretty simple, so I'm trying to do the same using map/reduce as a learning exercise. This is what I came up with: map = 'function(){emit("filesizes", {size: this.filesize, num: 1});}' reduce = 'function(k, vals){ var result = {size: 0, num: 0}; for(var x in vals) { var new_total = result.num + vals[x].num; result.num = new_total result.size = result.size + (vals[x].size * (vals[x].num / new_total)); } return result; }' @results = collection.map_reduce(map, reduce) However the two queries come back with two different results! What am I doing wrong?

    Read the article

  • Efficient algorithm to distribute work?

    - by Zwei Steinen
    It's a bit complicated to explain but here we go. We have problems like this (code is pseudo-code, and is only for illustrating the problem. Sorry it's in java. If you don't understand, I'd be glad to explain.). class Problem { final Set<Integer> allSectionIds = { 1,2,4,6,7,8,10 }; final Data data = //Some data } And a subproblem is: class SubProblem { final Set<Integer> targetedSectionIds; final Data data; SubProblem(Set<Integer> targetedSectionsIds, Data data){ this.targetedSectionIds = targetedSectionIds; this.data = data; } } Work will look like this, then. class Work implements Runnable { final Set<Section> subSections; final Data data; final Result result; Work(Set<Section> subSections, Data data) { this.sections = SubSections; this.data = data; } @Override public void run(){ for(Section section : subSections){ result.addUp(compute(data, section)); } } } Now we have instances of 'Worker', that have their own state sections I have. class Worker implements ExecutorService { final Map<Integer,Section> sectionsIHave; { sectionsIHave = {1:section1, 5:section5, 8:section8 }; } final ExecutorService executor = //some executor. @Override public void execute(SubProblem problem){ Set<Section> sectionsNeeded = fetchSections(problem.targetedSectionIds); super.execute(new Work(sectionsNeeded, problem.data); } } phew. So, we have a lot of Problems and Workers are constantly asking for more SubProblems. My task is to break up Problems into SubProblem and give it to them. The difficulty is however, that I have to later collect all the results for the SubProblems and merge (reduce) them into a Result for the whole Problem. This is however, costly, so I want to give the workers "chunks" that are as big as possible (has as many targetedSections as possible). It doesn't have to be perfect (mathematically as efficient as possible or something). I mean, I guess that it is impossible to have a perfect solution, because you can't predict how long each computation will take, etc.. But is there a good heuristic solution for this? Or maybe some resources I can read up before I go into designing? Any advice is highly appreciated!

    Read the article

  • running Hadoop software on office computers (when they are idle)

    - by Shahbaz
    Is there a project which helps setup a Hadoop cluster on office desktops, when they are idle? I'd like to experiment with Hadoop/MR/hbase but don't have acces to 5-10 computers. The computers at work are idle after hours and are connected to each other through a very high speed connection. What's more, data on these computers stays within our network so there is no privacy issue. In order for this to work I need a fairly light weight monitor running on each machine. When the computer has been idle for X hours, it will join the cluster. If the user logs on, it has to drop out of the cluster and return all CPU/memory back. Does something like this exist?

    Read the article

  • hadoop mapper static initialisation

    - by rakeshr
    Hi, I have a code fragment in which I am using a static code block to initialize a variable. public static class JoinMap extends Mapper<IntWritable, MbrWritable, LongWritable, IntWritable> { ....... public static RTree rt = null; static { String rtreeFileName = "R.rtree"; rt = new RTree(rtreeFileName); } public void map(IntWritable key, MbrWritable mbr,Context context) throws IOException, InterruptedException { ......... List elements = rt.overlaps(mbr.getRect()); ....... } } My problem is that the variable rt in the above code fragment is not getting initialised. Can anybody suggest a fix or an alternate way to initialise the variable. I don't want to initialise it inside my map function since that slows down the entire process.

    Read the article

  • Map Reduce job on Amazon: argument for custom jar

    - by zero51
    Hi all, This is one of my first try with Map Reduce on AWS in its Management Console. Hi have uploaded on AWS S3 my runnable jar developed on Hadoop 0.18, and it works on my local machine. As described on documentation, I have passed the S3 paths for input and output as argument of the jar: all right, but the problem is the third argument that is another path (as string) to a file that I need to load while the job is in execution. That file resides on S3 bucket too, but it seems that my jar doesn't recognize the path and I got a FileNotFound Exception while it tries to load it. That is strange because this is a path exactly like the other two... Anyone have any idea? Thank you Luca

    Read the article

  • can i use hadoop cloudera without root access?

    - by in_the_cloud
    a bit of a binary question (okay, not excatly) - but was wondering if one is able to configure cloudera / hadoop to run at the nodes without root shell access to the node computers (although i can setup ssh passwordless login)? appears from their instructions that root access is needed, at yet i found a hadoop wiki which suggest root access might not be needed ? http://wiki.apache.org/nutch/NutchHadoopTutorial

    Read the article

< Previous Page | 1 2 3 4 5 6 7 8  | Next Page >