Search Results

Search found 2017 results on 81 pages for 'hadoop streaming'.

Page 4/81 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >

  • Big Data – Interacting with Hadoop – What is Sqoop? – What is Zookeeper? – Day 17 of 21

    - by Pinal Dave
    In yesterday’s blog post we learned the importance of the Pig and Pig Latin in Big Data Story. In this article we will understand what is Sqoop and Zookeeper in Big Data Story. There are two most important components one should learn when learning about interacting with Hadoop – Sqoop and Zookper. What is Sqoop? Most of the business stores their data in RDBMS as well as other data warehouse solutions. They need a way to move data to the Hadoop system to do various processing and return it back to RDBMS from Hadoop system. The data movement can happen in real time or at various intervals in bulk. We need a tool which can help us move this data from SQL to Hadoop and from Hadoop to SQL. Sqoop (SQL to Hadoop) is such a tool which extract data from non-Hadoop data sources and transform them into the format which Hadoop can use it and later it loads them into HDFS. Essentially it is ETL tool where it Extracts, Transform and Load from SQL to Hadoop. The best part is that it also does extract data from Hadoop and loads them to Non-SQL (or RDBMS) data stores. Essentially, Sqoop is a command line tool which does SQL to Hadoop and Hadoop to SQL. It is a command line interpreter. It creates MapReduce job behinds the scene to import data from an external database to HDFS. It is very effective and easy to learn tool for nonprogrammers. What is Zookeeper? ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. In other words Zookeeper is a replicated synchronization service with eventual consistency. In simpler words – in Hadoop cluster there are many different nodes and one node is master. Let us assume that master node fails due to any reason. In this case, the role of the master node has to be transferred to a different node. The main role of the master node is managing the writers as that task requires persistence in order of writing. In this kind of scenario Zookeeper will assign new master node and make sure that Hadoop cluster performs without any glitch. Zookeeper is the Hadoop’s method of coordinating all the elements of these distributed systems. Here are few of the tasks which Zookeepr is responsible for. Zookeeper manages the entire workflow of starting and stopping various nodes in the Hadoop’s cluster. In Hadoop cluster when any processes need certain configuration to complete the task. Zookeeper makes sure that certain node gets necessary configuration consistently. In case of the master node fails, Zookeepr can assign new master node and make sure cluster works as expected. There many other tasks Zookeeper performance when it is about Hadoop cluster and communication. Basically without the help of Zookeeper it is not possible to design any new fault tolerant distributed application. Tomorrow In tomorrow’s blog post we will discuss about very important components of the Big Data Ecosystem – Big Data Analytics. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • Streaming desktop with avconv - severe sound issues

    - by Tommy Brunn
    I'm trying to do some live streaming in Ubuntu 12.10, but I'm having some problems with audio. More specifically, the quality is complete garbage and it's at least 10 seconds out of sync with the video. I'm using an excellent guide found here to set up my loopback devices so that I can combine the desktop audio with the microphone input. It seems to work, as I'm able to stream both audio and video to Twitch.tv. But, as I said, the audio quality is terrible. The microphone audio is very, very low, but if I increase it, I get a horrible garbled sound that is absolutely unbearable. Nothing like that is present during VoIP calls or when recording sound alone with the sound recorder, so it's not an issue with the microphone itself. The entire audio stream is also delayed about 10-15 seconds compared to the video stream. I put together an imgur album of my settings. Here is some example output from when I'm streaming: avconv version 0.8.4-6:0.8.4-0ubuntu0.12.10.1, Copyright (c) 2000-2012 the Libav developers built on Nov 6 2012 16:51:11 with gcc 4.7.2 [x11grab @ 0x162fd80] device: :0.0+570,262 -> display: :0.0 x: 570 y: 262 width: 1280 height: 720 [x11grab @ 0x162fd80] shared memory extension found [x11grab @ 0x162fd80] Estimating duration from bitrate, this may be inaccurate Input #0, x11grab, from ':0.0+570,262': Duration: N/A, start: 1353181686.735113, bitrate: 884736 kb/s Stream #0.0: Video: rawvideo, bgra, 1280x720, 884736 kb/s, 30 tbr, 1000k tbn, 30 tbc [alsa @ 0x163fce0] capture with some ALSA plugins, especially dsnoop, may hang. [alsa @ 0x163fce0] Estimating duration from bitrate, this may be inaccurate Input #1, alsa, from 'pulse': Duration: N/A, start: 1353181686.773841, bitrate: N/A Stream #1.0: Audio: pcm_s16le, 48000 Hz, 2 channels, s16, 1536 kb/s Incompatible pixel format 'bgra' for codec 'libx264', auto-selecting format 'yuv420p' [buffer @ 0x1641ec0] w:1280 h:720 pixfmt:bgra [scale @ 0x1642480] w:1280 h:720 fmt:bgra -> w:852 h:480 fmt:yuv420p flags:0x4 [libx264 @ 0x165ae80] VBV maxrate unspecified, assuming CBR [libx264 @ 0x165ae80] using cpu capabilities: MMX2 SSE2Fast SSSE3 FastShuffle SSE4.2 [libx264 @ 0x165ae80] profile Main, level 3.1 [libx264 @ 0x165ae80] 264 - core 123 r2189 35cf912 - H.264/MPEG-4 AVC codec - Copyleft 2003-2012 - http://www.videolan.org/x264.html - options: cabac=1 ref=2 deblock=1:0:0 analyse=0x1:0x111 me=hex subme=6 psy=1 psy_rd=1.00:0.00 mixed_ref=0 me_range=16 chroma_me=1 trellis=1 8x8dct=0 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=4 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=0 b_adapt=1 b_bias=0 direct=1 weightb=0 open_gop=1 weightp=1 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=30 rc=cbr mbtree=1 bitrate=712 ratetol=1.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 vbv_maxrate=712 vbv_bufsize=512 nal_hrd=none ip_ratio=1.25 aq=1:1.00 Output #0, flv, to 'rtmp://live.justin.tv/app/live_23011330_Pt1plSRM0z5WVNJ0QmCHvTPmpUnfC4': Metadata: encoder : Lavf53.21.0 Stream #0.0: Video: libx264, yuv420p, 852x480, q=-1--1, 712 kb/s, 1k tbn, 30 tbc Stream #0.1: Audio: libmp3lame, 44100 Hz, 2 channels, s16, 712 kb/s Stream mapping: Stream #0:0 -> #0:0 (rawvideo -> libx264) Stream #1:0 -> #0:1 (pcm_s16le -> libmp3lame) Press ctrl-c to stop encoding frame= 17 fps= 0 q=0.0 size= 0kB time=10000000000.00 bitrate= 0.0kbitframe= 32 fps= 31 q=0.0 size= 0kB time=10000000000.00 bitrate= 0.0kbitframe= 40 fps= 23 q=29.0 size= 44kB time=0.03 bitrate=13786.2kbits/s dup=frame= 47 fps= 21 q=31.0 size= 93kB time=2.73 bitrate= 277.7kbits/s dup=0frame= 62 fps= 23 q=29.0 size= 160kB time=3.23 bitrate= 406.2kbits/s dup=0frame= 77 fps= 24 q=23.0 size= 209kB time=3.71 bitrate= 462.5kbits/s dup=0frame= 92 fps= 25 q=20.0 size= 267kB time=4.91 bitrate= 445.2kbits/s dup=0frame= 107 fps= 25 q=20.0 size= 318kB time=5.41 bitrate= 482.1kbits/s dup=0frame= 123 fps= 26 q=18.0 size= 368kB time=5.96 bitrate= 505.7kbits/s dup=0frame= 139 fps= 26 q=16.0 size= 419kB time=6.48 bitrate= 529.7kbits/s dup=0frame= 155 fps= 27 q=15.0 size= 473kB time=7.00 bitrate= 553.6kbits/s dup=0frame= 170 fps= 27 q=14.0 size= 525kB time=7.52 bitrate= 571.7kbits/s dup=0 frame= 180 fps= 25 q=-1.0 Lsize= 652kB time=7.97 bitrate= 670.0kbits/s dup=0 drop=32 //Here I stop the streaming video:531kB audio:112kB global headers:0kB muxing overhead 1.345945% [libx264 @ 0x165ae80] frame I:1 Avg QP:30.43 size: 39748 [libx264 @ 0x165ae80] frame P:45 Avg QP:11.37 size: 11110 [libx264 @ 0x165ae80] frame B:134 Avg QP:15.93 size: 27 [libx264 @ 0x165ae80] consecutive B-frames: 0.6% 0.0% 1.7% 97.8% [libx264 @ 0x165ae80] mb I I16..4: 7.3% 0.0% 92.7% [libx264 @ 0x165ae80] mb P I16..4: 0.1% 0.0% 0.1% P16..4: 49.1% 1.2% 2.1% 0.0% 0.0% skip:47.4% [libx264 @ 0x165ae80] mb B I16..4: 0.0% 0.0% 0.0% B16..8: 0.1% 0.0% 0.0% direct: 0.0% skip:99.9% L0:42.5% L1:56.9% BI: 0.6% [libx264 @ 0x165ae80] coded y,uvDC,uvAC intra: 82.3% 87.4% 71.9% inter: 7.1% 8.4% 7.0% [libx264 @ 0x165ae80] i16 v,h,dc,p: 27% 29% 16% 28% [libx264 @ 0x165ae80] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 22% 21% 14% 8% 8% 8% 7% 5% 7% [libx264 @ 0x165ae80] i8c dc,h,v,p: 47% 22% 20% 11% [libx264 @ 0x165ae80] Weighted P-Frames: Y:0.0% UV:0.0% [libx264 @ 0x165ae80] ref P L0: 96.4% 3.6% [libx264 @ 0x165ae80] kb/s:474.19 Received signal 2: terminating. Any ideas on how I can resolve this? The video delay is perfectly acceptable, so I wouldn't think that it's a network issue that's causing the delay in the audio. Any help would be appreciated.

    Read the article

  • What is meant by "streaming data access" in HDFS?

    - by Van Gale
    According to the HDFS Architecture page HDFS was designed for "streaming data access". I'm not sure what that means exactly, but would guess it means an operation like seek is either disabled or has sub-optimal performance. Would this be correct? I'm interested in using HDFS for storing audio/video files that need to be streamed to browser clients. Most of the streams will be start to finish, but some could have a high number of seeks. Maybe there is another file system that could do this better?

    Read the article

  • Our Flash Streaming Player Occasionally Stutters like a Skipping CD after a Period of Time

    - by Jonathan Fritz
    We offer a streaming player for a number of our clients, who are responsible for their providing us with their own audio streams. We have written a very simple flash player that can play all of the streams that we support (icecast/shoutcast/live365/mp3 over http/etc). Unfortunately, we have found that when listening, our player sometimes begins to stutter (like a skipping cd), sometimes after only 10 minutes, and sometimes after an hour of listening. We have noticed this behaviour in firefox on both linux and windows. Does anybody know anything about this problem? We know that flash isn't ideal for infinite streams of audio, but it's about all that we can find that's on every platform out there. If anybody can suggest a solution to our problem, I'll be your friend forever. Here is a link to the live player: http://cr-jf.jfritz.02.dev.wecreate.com/streaming/player_v5/ Note that you'll need to test in a browser that isn't IE, because we use WMP in IE, and that the JavaScript on the page will cause the player to unload and re-load once an hour because of memory issues. Because I can only put one hyperlink in a post, I'll add a link to the player source code as a comment. Thanks all!

    Read the article

  • hadoop beginners question

    - by Omnipresent
    I've read some documentation about hadoop and seen the impressive results. I get the bigger picture but am finding it hard whether it would fit our setup. Question isnt programming related but I'm eager to get opinion of people who currently work with hadoop and how it would fit our setup: We use Oracle for backend Java (Struts2/Servlets/iBatis) for frontend Nightly we get data which needs to be summarized. this runs as a batch process (takes 5 hours) We are looking for a way to cut those 5 hours to a shorter time. Where would hadoop fit into this picture? Can we still continue to use Oracle even after hadoop?

    Read the article

  • Project Idea with Hadoop MapReduce

    - by Aditya Andhalikar
    Hello, I learnt Hadoop a few months back and managed to do a very introductory programming project on it. I want to do a small - medium sized project or series of small programming assignments with Hadoop. I have seen lot of ideas around but I dont see anything that can be finished in about 60-70 hours of work so a pretty small scale project as I want to do that in my spare time along with other studies. Most project ideas I have seen sort of large to go on for 2-3 months. My main objective out of this exercise to develop good expertise in programming with Hadoop environment not to do any research or solve specific problems. I see Hadoop being used lot of with webservices maybe that would be an interesting track for small projects. Thank you in advance. Regards, Aditya

    Read the article

  • PHP Video Editing and Streaming

    - by Gaurav Padia
    Hi, I am developing online video streaming website on PHP. I need two functionalities: Need to add title/text at bottom of the video dynamically. Need to add background music to video dynamically. Is it possible with PHP or any available open source library? Can anyone guide me or provide links to this type of library ? Thanks.

    Read the article

  • Hadoop reduce task gets hung

    - by user806098
    I set up a hadoop cluster with 4 nodes, When running a map-reduce task, the map task finishes quickly, while the reduce task hangs at 27% percent. I checked the log, it's that the reduce task fails to fetch map output from map nodes. The job tracker log of master shows messages like this: 2011-06-27 19:55:14,748 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201106271953_0001_r_000000_0' to tip task_201106271953_0001_r_000000, for tracker 'tracker_web30.bbn.com.cn:localhost/127.0.0.1:56476' And the name node log of master shows messages like this: 2011-06-27 14:00:52,898 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 54310, call register(DatanodeRegistration(202.106.199.39:50010, storageID=DS-1989397900-202.106.199.39-50010-1308723051262, infoPort=50075, ipcPort=50020)) from 192.168.225.19:16129: error: java.io.IOException: verifyNodeRegistration: unknown datanode 202.106.199.3 9:50010 However, neither the "web30.bbn.com.cn" or 202.106.199.39, 202.106.199.3 is the slave node. I think such ip/domains appear because hadoop fails to resolve a node(first in the Intranet DNS server), then it goes to a higher-level DNS server, later to the top, still fails, then the "junk" ip/domains are returned. But I checked my config, it goes like this: /etc/hosts: 127.0.0.1 localhost.localdomain localhost ::1 localhost6.localdomain6 localhost6 192.168.225.16 master 192.168.225.66 slave1 192.168.225.20 slave5 192.168.225.17 slave17 conf/core-site.xml: hadoop.tmp.dir /root/hadoop_tmp/hadoop_${user.name} fs.default.name hdfs://master:54310 io.sort.mb 1024 hdfs-site.xml: dfs.replication 3 masters: master slaves: master slave1 slave5 slave17 Also, all firewalls(iptables) are turned off, and ssh between each 2 nodes is ok. so I don't know where exact the error comes from. Please help. Thanks a lot.

    Read the article

  • How to learn using Hadoop

    - by ajay
    hi, I want to learn hadoop. However, I don't have access to a cluster now. Is it possible for me to learn it and use it for writing programs and learn it properly. Would it be helpful to run multiple linux VMs and then use them as boxes to run hadoop? Or you think that is more of a stretch and running it on a multiple hosts is the same as running in on single host (in terms of setup, Hadoop API used, the architecture of the map-reduce programs etc) Thanks,

    Read the article

  • Configuring Hadoop logging to avoid too many log files

    - by Eric Wendelin
    I'm having a problem with Hadoop producing too many log files in $HADOOP_LOG_DIR/userlogs (the Ext3 filesystem allows only 32000 subdirectories) which looks like the same problem in this question: http://stackoverflow.com/questions/2091287/error-in-hadoop-mapreduce My question is: does anyone know how to configure Hadoop to roll the log dir or otherwise prevent this? I'm trying to avoid just setting the "mapred.userlog.retain.hours" and/or "mapred.userlog.limit.kb" properties because I want to actually keep the log files. I was also hoping to configure this in log4j.properties, but looking at the Hadoop 0.20.2 source, it writes directly to logfiles instead of actually using log4j. Perhaps I don't understand how it's using log4j fully. Any suggestions or clarifications would be greatly appreciated.

    Read the article

  • Typical Hadoop setup for remote job submission

    - by Artii
    So I am still a bit new to hadoop and am currently in the process of setting up a small test cluster on Amazonaws. So my question relates to some tips on the structuring of the cluster so it is possible to work submit jobs from remote machines. Currently I have 5 machines. 4 are basically the Hadoop cluster with the NameNodes, Yarn etc. One machine is used as a manager machine( Cloudera Manager). I am gonna describe my thinking process on the setup and if anyone can chime in the points I am not clear with, that would be great. I was thinking what was the best setup for a small cluster. So I decided to expose only one manager machine and probably use that to submit all the jobs through it. The other machines will see each other etc, but not be accessible from the outside world. I am have conceptual idea on how to do this,but I am not sure how to properly go about doing this though, if anyone could point me in the right direction that would great. Also another big point is, I want to be able to submit jobs to the cluster through exposed machine from a client machine (might be Windows). I am not so clear on this setup as well. Do I need to have Hadoop installed on the machine in order to use the normal hadoop commands, and to write/submit jobs say from Eclipse or something similar. So to sum it up my questions are, Is this an ok setup for a small test cluster How can I go about using one exposed machine to submit/route jobs to the cluster, without having any of the Hadoop nodes on it. How do I setup a client machine to submit jobs to a remote cluster, and an example on how to do it on Windows. Also if there are any reason not to use Windows as a client machine in this setup. Thanks I would greatly appreciate any advice or help on this.

    Read the article

  • Any tested Frameworks/Solutions similar to Apache Hadoop?

    - by andreas
    Hello, I am interested in the Apache Hadoop project, but i would like to know if any other tested (please mind the 'tested') projects/frameworks are out there. Appreciate any information/links to projects similar to Apache Hadoop and any comments on the Apache Hadoop project from anyone that has used it. Regards,

    Read the article

  • Hadoop Map Reduce job never finishes

    - by rohanbk
    I am running a Hadoop Map Reduce job using a Python Mapper and Reducer script, and Hadoop Streaming. Both my Map and Reduce jobs run till they are both 100%, but the job doesn't end. I know that when things go sour, Hadoop will terminate the job, but in this case, both stages reach a 100% and just never end. Has anyone else encountered anything similar? Also, how do I debug my program to figure out where things are going wrong? If I use a smaller input file, and I just run something like: $> cat input_file | mapper.py | sort | reduce.py >> output_file everything works perfectly fine. However, when I use Hadoop, things don't work out.

    Read the article

  • How to Use Steam In-Home Streaming

    - by Chris Hoffman
    Steam’s In-Home Streaming is now available to everyone, allowing you to stream PC games from one PC to another PC on the same local network. Use your gaming PC to power your laptops and home theater system. This feature doesn’t allow you to stream games over the Internet, only the same local network. Even if you tricked Steam, you probably wouldn’t get good streaming performance over the Internet. Why Stream? When you use Steam In-Home streaming, one PC sends its video and audio to another PC. The other PC views the video and audio like it’s watching a movie, sending back mouse, keyboard, and controller input to the other PC. This allows you to have a fast gaming PC power your gaming experience on slower PCs. For example, you could play graphically demanding games on a laptop in another room of your house, even if that laptop has slower integrated graphics. You could connect a slower PC to your television and use your gaming PC without hauling it into a different room in your house. Streaming also enables cross-platform compatibility. You could have a Windows gaming PC and stream games to a Mac or Linux system. This will be Valve’s official solution for compatibility with old Windows-only games on the Linux (Steam OS) Steam Machines arriving later this year. NVIDIA offers their own game streaming solution, but it requires certain NVIDIA graphics hardware and can only stream to an NVIDIA Shield device. How to Get Started In-Home Streaming is simple to use and doesn’t require any complex configuration — or any configuration, really. First, log into the Steam program on a Windows PC. This should ideally be a powerful gaming PC with a powerful CPU and fast graphics hardware. Install the games you want to stream if you haven’t already — you’ll be streaming from your PC, not from Valve’s servers. (Valve will eventually allow you to stream games from Mac OS X, Linux, and Steam OS systems, but that feature isn’t yet available. You can still stream games to these other operating systems.) Next, log into Steam on another computer on the same network with the same Steam username. Both computers have to be on the same subnet of the same local network. You’ll see the games installed on your other PC in the Steam client’s library. Click the Stream button to start streaming a game from your other PC. The game will launch on your host PC, and it will send its audio and video to the PC in front of you. Your input on the client will be sent back to the server. Be sure to update Steam on both computers if you don’t see this feature. Use the Steam > Check for Updates option within Steam and install the latest update. Updating to the latest graphics drivers for your computer’s hardware is always a good idea, too. Improving Performance Here’s what Valve recommends for good streaming performance: Host PC: A quad-core CPU for the computer running the game, minimum. The computer needs enough processor power to run the game, compress the video and audio, and send it over the network with low latency. Streaming Client: A GPU that supports hardware-accelerated H.264 decoding on the client PC. This hardware is included on all recent laptops and PCs. Ifyou have an older PC or netbook, it may not be able to decode the video stream quickly enough. Network Hardware: A wired network connection is ideal. You may have success with wireless N or AC networks with good signals, but this isn’t guaranteed. Game Settings: While streaming a game, visit the game’s setting screen and lower the resolution or turn off VSync to speed things up. In-Home Steaming Settings: On the host PC, click Steam > Settings and select In-Home Streaming to view the In-Home Streaming settings. You can modify your streaming settings to improve performance and reduce latency. Feel free to experiment with the options here and see how they affect performance — they should be self-explanatory. Check Valve’s In-Home Streaming documentation for troubleshooting information. You can also try streaming non-Steam games. Click Games > Add a Non-Steam Game to My Library on your host PC and add a PC game you have installed elsewhere on your system. You can then try streaming it from your client PC. Valve says this “may work but is not officially supported.” Image Credit: Robert Couse-Baker on Flickr, Milestoned on Flickr

    Read the article

  • choppy streaming audio

    - by user88503
    I could use some help troubleshooting choppy streaming audio. The problem is jerky playback regardless of audio or video with audio. Both Chromium and Firefox have the problem, however files played directly on the machine with Rhythmbox sound just fine. I'm running 12.04 LTS on a C2D T9300. Most of the audio problems others ask about seem to be hardware related, so the following information might be relevant. sudo lshw -c multimedia *-multimedia description: Audio device product: 82801H (ICH8 Family) HD Audio Controller vendor: Intel Corporation physical id: 1b bus info: pci@0000:00:1b.0 version: 03 width: 64 bits clock: 33MHz capabilities: pm msi pciexpress bus_master cap_list configuration: driver=snd_hda_intel latency=0 resources: irq:48 memory:f8400000-f8403fff

    Read the article

  • MySQL and Hadoop Integration - Unlocking New Insight

    - by Mat Keep
    “Big Data” offers the potential for organizations to revolutionize their operations. With the volume of business data doubling every 1.2 years, analysts and business users are discovering very real benefits when integrating and analyzing data from multiple sources, enabling deeper insight into their customers, partners, and business processes. As the world’s most popular open source database, and the most deployed database in the web and cloud, MySQL is a key component of many big data platforms, with Hadoop vendors estimating 80% of deployments are integrated with MySQL. The new Guide to MySQL and Hadoop presents the tools enabling integration between the two data platforms, supporting the data lifecycle from acquisition and organisation to analysis and visualisation / decision, as shown in the figure below The Guide details each of these stages and the technologies supporting them: Acquire: Through new NoSQL APIs, MySQL is able to ingest high volume, high velocity data, without sacrificing ACID guarantees, thereby ensuring data quality. Real-time analytics can also be run against newly acquired data, enabling immediate business insight, before data is loaded into Hadoop. In addition, sensitive data can be pre-processed, for example healthcare or financial services records can be anonymized, before transfer to Hadoop. Organize: Data is transferred from MySQL tables to Hadoop using Apache Sqoop. With the MySQL Binlog (Binary Log) API, users can also invoke real-time change data capture processes to stream updates to HDFS. Analyze: Multi-structured data ingested from multiple sources is consolidated and processed within the Hadoop platform. Decide: The results of the analysis are loaded back to MySQL via Apache Sqoop where they inform real-time operational processes or provide source data for BI analytics tools. So how are companies taking advantage of this today? As an example, on-line retailers can use big data from their web properties to better understand site visitors’ activities, such as paths through the site, pages viewed, and comments posted. This knowledge can be combined with user profiles and purchasing history to gain a better understanding of customers, and the delivery of highly targeted offers. Of course, it is not just in the web that big data can make a difference. Every business activity can benefit, with other common use cases including: - Sentiment analysis; - Marketing campaign analysis; - Customer churn modeling; - Fraud detection; - Research and Development; - Risk Modeling; - And more. As the guide discusses, Big Data is promising a significant transformation of the way organizations leverage data to run their businesses. MySQL can be seamlessly integrated within a Big Data lifecycle, enabling the unification of multi-structured data into common data platforms, taking advantage of all new data sources and yielding more insight than was ever previously imaginable. Download the guide to MySQL and Hadoop integration to learn more. I'd also be interested in hearing about how you are integrating MySQL with Hadoop today, and your requirements for the future, so please use the comments on this blog to share your insights.

    Read the article

  • Amazon Elastic MapReduce: Exception from FileSystem

    - by S.N
    Hi, I run my application using ruby client: ruby elastic-mapreduce -j j-20PEKMT9BRSUC --jar s3n://sakae55/lib/edu.cit.som.jar --main-class edu.cit.som.hadoop.SOMDriver --arg s3n://sakae55/repository/input/ecoli/ --arg s3n://sakae55/repository/output/ecoli/pl/ --arg s3n://sakae55/repository/data/ecoli/som.txt Then, I am seeing the following error: java.lang.IllegalArgumentException: This file system object (file:///) does not support access to the request path 'hdfs://i -10-195-207-230.ec2.internal:9000/mnt/var/lib/hadoop/tmp/mapred/system/job_201004221221_0017/job.jar' You possibly called Fi eSystem.get(conf) when you should of called FileSystem.get(uri, conf) to obtain a file system supporting your path. at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:320) at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:52) at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:416) at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:259) at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:676) at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:200) at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1184) at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1160) at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1132) at org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:662) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:729) at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1026) at edu.cit.som.hadoop.SOMDriver.runIteration(SOMDriver.java:106) at edu.cit.som.hadoop.SOMDriver.train(SOMDriver.java:69) at edu.cit.som.hadoop.SOMDriver.run(SOMDriver.java:52) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79) at edu.cit.som.hadoop.SOMDriver.main(SOMDriver.java:36) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:155) at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79) at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68) I am not sure why the error references to "file:///" even though all the arguments I pass do not use the schema.

    Read the article

  • How a .NET Programmer learn Big Data/Hadoop? [on hold]

    - by Smith Pascal Jr.
    I have been ASP.NET developer for sometime now and I have been reading a lot about Big Data- Hadoop and its future as to how it is the next technology in IT and how it would be useful to create million of jobs in US and elsewhere in the world. Now since Hadoop is an open source big data tool which is managed by Apache Server Foundation Group, I'm assuming I have to be well aware of JAVA - Correct me if I'm wrong. Moreover, How a .NET programmer can learn Big Data and its related technologies and can work professionally full time into this technology? What challenges and opportunities does a .NET professional face while changing the technology platform? Please advice. Thanks

    Read the article

  • Ubuntu one, music streaming

    - by iknowshall
    I'm running Ubuntu 11.10, and just signed up for Ubuntu One and Ubuntu One Music. Generally file sync is working fine, but music has been a complete failure so far. Here's what I'm looking at: After 24 hours, not a single mp3 or ogg file from my laptop has sync'd to the Cloud Music folder. I have 4 gigs of data used, which is definitely not enough to include music files. That's about the right amount for my docs and photos. With music, it should be more like 13 gigs. No music shows up on the Ubuntu One Music app on my phone, nor on the web view of my Ubuntu One Cloud folder I followed all the online instructions I could find. Basically, on my laptop, right click on the Music folder and select "Ubuntu One Synchronize this folder" The Music folder now has a green check mark on it indicating its sync'd The Ubuntu One app on my laptop says "File Sync is up to date" I did provide credit card information, and on the Ubuntu One website "Music Streaming" is listed as one of my services. So what am I doing wrong? Just read this post as I'm experiencing the same problem, the answer for this post says that due to the large number of files trying to be uploaded, that it could take up to a week. However I am, as above also being told that "File sync is up to date" Can somebody confirm that this is a bug and that I just need to wait for File sync to actually be up to date? Cheers

    Read the article

  • Apache DVB http video Streaming bandwidth or priority problem

    - by igino manfre'
    I'm streaming few precompressed DVB videos from cloud. The streams are generated from VLC on "impossible" ports (such as 64085, 64086 etc) reverse proxed by Apache on port 80 and 8080. All the generated streams are listed in "http://95.110.164.61/indexv.html". From an ADSL connection with enough downlink bandwidth, recalling the stream generated by VLC (such as "http://95.110.164.61:64087/mpg2_6.4") it flows fluently. Recalling the same stream proxed by Apache ("http://95.110.164.61/mpg2_6.4") the stream stops and goes. The only situation in which the Apache proxed streams flow regularly is from a site connected through 64 Mbps warranted bandwith with RTT to the server less than 10 mseconds. Please note that streams below 2 Mbps are fluently proxed. The system is a single core xeon with windows 2008 R2 on 4 GB of RAM with 1 Gbps of network bandwidth. The drain of computational and bandwidth resources is negligeable, the RAM usage always lower than 50%. On the system I run many VLC streamers. Any of them drains a variable amount of RAM (from about 25 to 70 MB). On the contrary the couple of httpd.exe processes drain no more than 7 MB. Using Wireshark (on the server) I see that VLC directy send to the client much more packets than Apache, and the stream is framgmented on many frames. I'm not a programmer, a newby of Apache. Can anyone please address me to a specific portion of the Apache's huge documentation? Thank you. igino

    Read the article

  • Hive metadata permission issue

    - by Chandramohan
    We are getting this error on Hive, while creating a DB / table hive> CREATE TABLE pokes (foo INT, bar STRING); FAILED: Error in metadata: javax.jdo.JDOFatalDataStoreException: Cannot get a connection, pool error Could not create a validated object, cause: A read-only user or a user in a read-only database is not permitted to disable read-only mode on a connection. NestedThrowables: org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Could not create a validated object, cause: A read-only user or a user in a read-only database is not permitted to disable read-only mode on a connection. FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask Hive log : org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Could not create a validated object, cause: A read-only user or a user in a read-only database is not permitted to disable read-only mode on a connection. at org.datanucleus.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:298) at org.datanucleus.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:601) at org.datanucleus.jdo.JDOPersistenceManagerFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:286) at org.datanucleus.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:182) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at javax.jdo.JDOHelper$16.run(JDOHelper.java:1958) at java.security.AccessController.doPrivileged(Native Method) at javax.jdo.JDOHelper.invoke(JDOHelper.java:1953) at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1159) at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:803) at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:698) at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:234) at org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:261) at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:196) at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:171) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:354) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.executeWithRetry(HiveMetaStore.java:306) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:451) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:232) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:197) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:108) at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:1868) at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:1878) at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:470) ... 15 more Caused by: org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Could not create a validated object, cause: A read-only user or a user in a read-only database is not permitted to disable read-only mode on a connection. at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:114) at org.datanucleus.store.rdbms.ConnectionFactoryImpl$ManagedConnectionImpl.getConnection(ConnectionFactoryImpl.java:521) at org.datanucleus.store.rdbms.RDBMSStoreManager.<init>(RDBMSStoreManager.java:290) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:588) at org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:300) at org.datanucleus.ObjectManagerFactoryImpl.initialiseStoreManager(ObjectManagerFactoryImpl.java:161) at org.datanucleus.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:583) ... 42 more Caused by: java.util.NoSuchElementException: Could not create a validated object, cause: A read-only user or a user in a read-only database is not permitted to disable read-only mode on a connection. at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1191) at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:106) ... 52 more 2011-08-11 18:02:36,964 ERROR ql.Driver (SessionState.java:printError(343)) - FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

    Read the article

  • Audio/video streaming on Windows platform

    - by bushtucker
    I'm building an interactive language learning application to be used in a classroom environment. The idea is that a teacher should be able to talk to the students (=audio stream to all students), let students talk to each other (= audio P2P) in groups of two or more, let students watch a video coming from a the DVD player or coming from a media server. It should be possible to save the audio/video streams. The teacher should also be able to monitor, take-over or block the desktop of the students. The platform is Windows and it's a desktop application, no web application. The audio delay should be as minimal as poosible. Optionally a student sitting at home should be supported, but it's not a high priority. I am now finished with the classroom control part of the application (login, monitor, block, ...) and want to start the audio and video part. I've been evaluating several options like DirectX, GStreamer and SIP but now I have to make a decision. DirectX seems an obvious choice for the Windows platform, but it only lets me capture and playback audio and video. The encoding/decoding/network part I should do myself. GStreamer contains all kinds of options to capture/encode/stream/save audio and video streams. I've experimented a bit with it (ossbuild) and it does seem to involve a lot of trial and error to make something work: - microphone capture (via directsoundsrc) produces cracking noises on some computers - rtpL16 payloader didn't work well - streaming raw audio over the network only working at a sampling rate of 8000, no higher - there are a lot of errors when receiving mpeg4 video (bad I-frame), on some computers worse than others It is my impression that gstreamer is primary targetted at linux platforms. Development and support for the Windows platform seems to be a little behind. Nevertheless it's a powerful framework that could save me months and years of work. SIP seems to be able to do everything I want, but it is targeted towards telephony and IM. I don't know how flexible SIP is. It seems to me that the SIP layer would just be overhead as I already have a central (teacher) application that can control and setup all the streams. The interesting parts of frameworks like opalvoip and freeswitch are the actual audio/video capture, the encoding and transmission. Does anyone know how these interesting parts relate a framework like gstreamer? Are they easy to integrate into a custom application? Are they flexible enough? Does anyone have experience with all or one of these technologies? Maybe there are even other options I can look at? Many thanks for your advice

    Read the article

  • ClassNotFoundException error in implementing Bayesian algorithm in Apache Mahout on Hadoop

    - by Shweta
    Hi, I have a problem in executing the Bayesian algorithm in Mahout. I built it with Maven and the job file is in target directory. When run from terminal using hadoop, I'm getting the ClassNotFoundException error. What should be done? $HADOOP_HOME/bin/hadoop jar mahout-core-0.3-SNAPSHOT.job org.apache.mahout.classifier.bayes.mapreduce.bayes.bayesdriver -i test -o output Exception in thread "main" java.lang.ClassNotFoundException: org.apache.mahout.classifier.bayes.mapreduce.bayes.bayesdriver at java.net.URLClassLoader$1.run(URLClassLoader.java:200) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:188) at java.lang.ClassLoader.loadClass(ClassLoader.java:307) at java.lang.ClassLoader.loadClass(ClassLoader.java:252) at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:247) at org.apache.hadoop.util.RunJar.main(RunJar.java:149)

    Read the article

  • Hadoop on windows server

    - by Luca Martinetti
    Hello, I'm thinking about using hadoop to process large text files on my existing windows 2003 servers (about 10 quad core machines with 16gb of RAM) The questions are: Is there any good tutorial on how to configure an hadoop cluster on windows? What are the requirements? java + cygwin + sshd ? Anything else? HDFS, does it play nice on windows? I'd like to use hadoop in streaming mode. Any advice, tool or trick to develop my own mapper / reducers in c#? What do you use for submitting and monitoring the jobs? Thanks

    Read the article

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >