hadoop streaming - Page 22

Amazon EC2 - network issues

- by Algorist

Hi, we are launching a Hadoop cluster on Amazon EC2, and recently we have been having network issues, such as the master being unable to connect to a slave. We thought the cause was Amazon throttling network connections over some limit, so we tried establishing the connection after a random delay from each slave node, but that didn't help. Are there any other suggestions? Thank you, Bala

Read the article

Any Open Source Pregel like framework for distributed processing of large Graphs?

- by Akshay Bhat

Google has described a novel framework for distributed processing of massive graphs: http://portal.acm.org/citation.cfm?id=1582716.1582723 I wanted to know whether, similar to Hadoop (MapReduce), there are any open source implementations of this framework. I am actually in the process of writing a pseudo-distributed one using Python and the multiprocessing module, and wanted to know if someone else has also tried implementing it, since public information about this framework is extremely scarce (the link above and a blog post at Google Research).

Read the article

Nearmap architecture

- by portoalet

Looking at http://www.nearmap.com/, I am wondering whether you can approximate how much storage is needed to store the images. (NearMap's monthly city PhotoMaps are captured at 3cm, 5cm, 7.5cm, or 10cm resolution.) And what kind of system or architecture is suitable for delivering that data? Say you are not Google and want to implement this from scratch: what would you do? For example, would you store the images in Hadoop and use memcache to deliver them?
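A rough way to approach the storage part of the question is to count pixels: the resolution gives the ground distance covered by one pixel, so halving it quadruples the pixel count. The sketch below is a back-of-the-envelope estimate only. The 1,000 km² coverage area, the 3 bytes per pixel (uncompressed RGB), and the function name are assumptions for illustration, not NearMap figures; real imagery would normally be compressed and tiled at several zoom levels.

```python
def raw_image_bytes(area_km2: float, resolution_cm: float, bytes_per_pixel: int = 3) -> float:
    """Estimate uncompressed storage for one capture of an area.

    resolution_cm is the ground distance covered by one pixel edge.
    """
    pixels_per_km = 100_000 / resolution_cm      # 1 km = 100,000 cm
    pixels = area_km2 * pixels_per_km ** 2
    return pixels * bytes_per_pixel


if __name__ == "__main__":
    area_km2 = 1_000  # hypothetical city footprint, not a NearMap figure
    for res in (3, 5, 7.5, 10):
        gb = raw_image_bytes(area_km2, res) / 1e9
        print(f"{res:>4} cm: about {gb:,.0f} GB uncompressed per capture")
```

Multiplying the per-capture figure by the number of cities and by twelve monthly captures gives a first-order yearly total under the same assumptions.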

Read the article

How does Hive compare to HBase?

- by mrhahn

I'm interested in finding out how the recently-released (http://mirror.facebook.com/facebook/hive/hadoop-0.17/) Hive compares to HBase in terms of performance. The SQL-like interface used by Hive is very much preferable to the HBase API we have implemented.

Read the article

How can I load a file into a DataBag from within a Yahoo PigLatin UDF?

- by Cervo

I have a Pig program where I am trying to compute the minimum center between two bags. In order for it to work, I found I need to COGROUP the bags into a single dataset. The entire operation takes a long time. I want to either open one of the bags from disk within the UDF, or be able to pass another relation into the UDF without needing to COGROUP.

Code:

    -- **** Load files for iteration ****
    register myudfs.jar;
    wordcounts = LOAD 'input/wordcounts.txt' USING PigStorage('\t') AS (PatentNumber:chararray, word:chararray, frequency:double);
    centerassignments = load 'input/centerassignments/part-*' USING PigStorage('\t') AS (PatentNumber: chararray, oldCenter: chararray, newCenter: chararray);
    kcenters = LOAD 'input/kcenters/part-*' USING PigStorage('\t') AS (CenterID:chararray, word:chararray, frequency:double);
    kcentersa1 = CROSS centerassignments, kcenters;
    kcentersa = FOREACH kcentersa1 GENERATE centerassignments::PatentNumber as PatentNumber, kcenters::CenterID as CenterID, kcenters::word as word, kcenters::frequency as frequency;

    -- ***** Assign to nearest k-mean *******
    assignpre1 = COGROUP wordcounts by PatentNumber, kcentersa by PatentNumber;
    assignwork2 = FOREACH assignpre1 GENERATE group as PatentNumber, myudfs.kmeans(wordcounts, kcentersa) as CenterID;

Basically, my issue is that for each patent I need to pass the sub-relations (wordcounts, kcenters). In order to do this, I do a CROSS and then a COGROUP by PatentNumber to get the set PatentNumber, {wordcounts}, {kcenters}. If I could figure out a way to pass a relation or open up the centers from within the UDF, then I could just GROUP wordcounts by PatentNumber and run myudfs.kmeans(wordcount), which is hopefully much faster without the CROSS/COGROUP. This is an expensive operation. Currently it takes about 20 minutes and appears to tax the CPU/RAM. I was thinking it might be more efficient without the CROSS. I'm not sure it will be faster, so I'd like to experiment.

Anyway, it looks like calling the loading functions from within Pig needs a PigContext object, which I don't get from an EvalFunc. And to use the Hadoop file system, I need some initial objects as well, which I don't see how to get. So my question is: how can I open a file from the Hadoop file system from within a Pig UDF? I also run the UDF via main for debugging, so I need to load from the normal filesystem when in debug mode. Another, better idea would be a way to pass a relation into a UDF without needing to CROSS/COGROUP. This would be ideal, particularly if the relation resides in memory, i.e. being able to do myudfs.kmeans(wordcounts, kcenters) without needing the CROSS/COGROUP with kcenters. The basic idea is to trade IO for RAM/CPU cycles. Any help will be much appreciated; the Pig UDFs aren't very well documented beyond the simplest ones, even in the UDF manual.
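The idea of trading IO for RAM, loading the centers once and keeping them in memory instead of CROSSing them against every patent, can be illustrated outside Pig. This is a minimal Python sketch of that idea only, not a Pig UDF. The tab-separated layout follows the kcenters schema in the post, while the function names and the choice of squared Euclidean distance over sparse word vectors are assumptions.

```python
import csv
from collections import defaultdict


def load_centers(path):
    """Read (CenterID, word, frequency) rows into {center_id: {word: freq}}."""
    centers = defaultdict(dict)
    with open(path, newline="") as f:
        for center_id, word, freq in csv.reader(f, delimiter="\t"):
            centers[center_id][word] = float(freq)
    return dict(centers)


def squared_distance(a, b):
    """Squared Euclidean distance between two sparse word->frequency dicts."""
    words = a.keys() | b.keys()
    return sum((a.get(w, 0.0) - b.get(w, 0.0)) ** 2 for w in words)


def nearest_center(word_freqs, centers):
    """Return the id of the center closest to one patent's word vector."""
    return min(centers, key=lambda cid: squared_distance(word_freqs, centers[cid]))
```

With the centers held in a dictionary, each patent's grouped word counts can be assigned with a single call to nearest_center. How to obtain the centers file from the Hadoop file system inside a UDF is the open question in the post and is not addressed by this sketch.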

Read the article

Stateful Iterators Java

- by Gitmo

What is a Stateful Iterator? This question relates to an Iterator defined in Hadoop for performing Joins. As the reference documentation states: This defines an interface to a stateful Iterator that can replay elements added to it directly. Note that this does not extend Iterator. What does 'replay elements added to it directly' mean? How is this iterator different from a usual iterator?
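One way to read "replay elements added to it directly" is that the iterator owns a buffer: callers add elements to it, consume them, and can then reset it to walk over the same elements again, which a plain one-pass iterator cannot do. The Python sketch below illustrates that reading. The class and method names are hypothetical, and this is not Hadoop's interface or implementation.

```python
class ReplayableIterator:
    """Buffers added elements so they can be iterated more than once."""

    def __init__(self):
        self._items = []   # elements added so far
        self._pos = 0      # index of the next element to return

    def add(self, item):
        self._items.append(item)

    def has_next(self):
        return self._pos < len(self._items)

    def next(self):
        if not self.has_next():
            raise StopIteration("no more buffered elements")
        item = self._items[self._pos]
        self._pos += 1
        return item

    def reset(self):
        """Rewind so the buffered elements are replayed from the start."""
        self._pos = 0

    def clear(self):
        """Drop all buffered elements, e.g. before moving to the next key."""
        self._items.clear()
        self._pos = 0
```

In a join, this kind of state is what lets the values collected for one key on one side be replayed once for every value on the other side.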

Read the article

Look up values in a BDB for several files in parallel

- by biznez

What is the most efficient way to look up values in a BDB for several files in parallel? If I had a Perl script which did this for one file at a time, would forking/running the process in background with the ampersand in Linux work? How might Hadoop be used to solve this problem? Would threading be another solution?
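As a point of comparison for the threading option, the sketch below processes several key files concurrently, with each worker opening its own read-only handle to the key-value store. Python's dbm.dumb module stands in for Berkeley DB purely so the example is self-contained; a real BDB binding, its concurrency settings, and whether threads, forked processes, or Hadoop fit the workload better are all outside what this shows. The function names and the one-key-per-line file layout are assumptions.

```python
import dbm.dumb
from concurrent.futures import ThreadPoolExecutor


def lookup_file(db_path, keys_path):
    """Look up every key listed in keys_path (one per line) in the database."""
    results = {}
    # Each worker opens its own read-only handle to the database.
    with dbm.dumb.open(db_path, "r") as db, open(keys_path) as keys:
        for line in keys:
            key = line.strip()
            if key:
                value = db.get(key.encode())
                results[key] = value.decode() if value is not None else None
    return keys_path, results


def lookup_files(db_path, key_files, workers=4):
    """Process several key files concurrently and collect results per file."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(lambda p: lookup_file(db_path, p), key_files))
```

Running one script per file in the background with the shell's ampersand follows the same shape, with the operating system doing the scheduling instead of a thread pool.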

Read the article

Streaming files from EventMachine handler?

- by Noah

I am creating a streaming EventMachine server. I'm concerned about avoiding blocking IO or doing anything else to muck up the event loop. From what I've read, Ruby's non-blocking IO can be used to stream files in a non-blocking way, or I can call next_tick, but I'm a little unclear about which of these approaches is preferable. Part of the problem is that I have not found a good explanation of the non-blocking IO library functions in Ruby. Short version: assuming a long-lived network IO operation, several wall-clock minutes of streaming per file transfer, what is the best way to do this in EventMachine without gumming up the event loop?

    while 1 do
      file.read do |bytes|
        @conn.send_data bytes
      end
    end

I understand that the above code will block, and I'm wondering what to put in its place. Also, I cannot use the FileStreamer class that is part of EventMachine as is, because I need to manipulate the data after it's read but before it's sent. Thanks, Noah
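The general pattern the question is reaching for is to read one bounded chunk at a time, transform it, send it, and hand control back to the event loop before reading the next chunk. The sketch below shows that pattern with Python's asyncio rather than EventMachine, so it is an analogy and not EventMachine code; `transform`, `send`, and the chunk size are placeholders. Each individual read still blocks briefly, which is usually tolerable for small chunks from local disk.

```python
import asyncio

CHUNK_SIZE = 16 * 1024  # assumed chunk size; tune for the workload


def transform(chunk: bytes) -> bytes:
    """Placeholder for whatever manipulation happens before sending."""
    return chunk.upper()


async def stream_file(path, send, chunk_size=CHUNK_SIZE):
    """Send a file chunk by chunk, yielding to the event loop between chunks."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)   # short blocking read of one chunk
            if not chunk:
                break
            await send(transform(chunk))
            await asyncio.sleep(0)       # let other connections make progress
```

In EventMachine terms, the point where this sketch yields corresponds to scheduling the next chunk with next_tick, the hook mentioned in the question, instead of looping until the file is exhausted.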

Read the article

video streaming through jsp/servlet

- by angelina

Dear all, please let me know how I can do video streaming using JSP/servlets.

Read the article

Tell me some Http-streaming tutorial or example.

- by xRobot

Could you tell me some http-streaming tutorial or example ( used also by Gmail ) ? Thanks ;)

Read the article

monitoring streaming server and display throughput

- by I__

Scenario: a laptop (running RHEL 5.3 / 5.4) with Wi-Fi allowing incoming connections (the laptop is the DHCP server and default gateway of any device that connects to it). The laptop has a streaming server installed (my app). I need to program an app that can monitor this link (device / streaming server) and display the throughput. More importantly, I need this app to be able to throttle the throughput. Think WANem but as an app, or NetLimiter but (way) simpler and for RHEL. If you need clarifications, let me know. Is there a library that could help me? I've done mostly Windows business application programming, and I have no clue about this stuff. Please help me get started!

Read the article

New Feature in ODI 11.1.1.6: ODI for Big Data

- by Julien Testut

By Ananth Tirupattur

Starting with Oracle Data Integrator 11.1.1.6.0, ODI offers a solution for processing Big Data. This post provides an overview of this feature. With all the buzz around Big Data, and before getting into the details of ODI for Big Data, I will provide a brief introduction to Big Data and the Oracle solution for Big Data.

So, what is Big Data? Big Data includes:

- structured data (this includes data from relational data stores and XML data stores)
- semi-structured data (this includes data from weblogs)
- unstructured data (this includes data from text blobs and images)

Traditionally, business decisions are based on the information gathered from transactional data. For example, transactional data from CRM applications is fed to a decision system for analysis and decision making. Products such as ODI play a key role in enabling decision systems. However, with the emergence of massive amounts of semi-structured and unstructured data, it is important for decision systems to include them in the analysis to achieve better decision-making capability.

While there is an abundance of opportunities for businesses to gain competitive advantages, processing Big Data has challenges. The challenges of processing Big Data include:

- Volume of data
- Velocity of data: the high rate at which data is generated
- Variety of data

In order to address these challenges and convert them into opportunities, we need an appropriate framework, platform, and the right set of tools.

Hadoop is an open source framework: a highly scalable, fault-tolerant system for storing and processing large amounts of data. Hadoop provides two key services: distributed and reliable storage, called the Hadoop Distributed File System or HDFS, and a framework for parallel data processing called Map-Reduce. Hadoop and its related technologies continue to evolve rapidly; therefore, it is highly recommended to follow information on the web to keep up with the latest developments.

Oracle's vision is to provide a comprehensive solution to address the challenges faced by Big Data. Oracle is providing the necessary hardware, software, and tools for processing Big Data. The Oracle solution includes:

- Big Data Appliance
- Oracle NoSQL Database
- Cloudera distribution for Hadoop
- Oracle R Enterprise (R is a statistical package which is very popular among data scientists)
- ODI solution for Big Data
- Oracle Loader for Hadoop, for loading data from Hadoop to Oracle

Further details can be found here: http://www.oracle.com/us/products/database/big-data-appliance/overview/index.html

ODI Solution for Big Data:

ODI's goal is to minimize the need to understand the complexity of the Hadoop framework and to simplify the adoption of Big Data processing in an enterprise. ODI provides the capabilities for an integrated architecture for processing Big Data. This includes the capability to load data into Hadoop, process data in Hadoop, and load data from Hadoop into Oracle. ODI is expanding its support for Big Data by providing the following out-of-the-box Knowledge Modules (KMs):

- IKM File to Hive (LOAD DATA): Load unstructured data from file (local file system or HDFS) into Hive
- IKM Hive Control Append: Transform and validate structured data on Hive
- IKM Hive Transform: Transform unstructured data on Hive
- IKM File/Hive to Oracle (OLH): Load processed data in Hive to Oracle
- RKM Hive: Reverse engineer Hive tables to generate models

Using the Loading KM you can map files (local and HDFS files) to the corresponding Hive tables. For example, you can map weblog files categorized by date into a corresponding partitioned Hive table schema.

Using the Hive Control Append KM you can validate and transform data in Hive. In the example below, two source Hive tables are joined and mapped to a target Hive table.

The Hive Transform KM facilitates processing of semi-structured data in Hive. In the example below, the data from a weblog is processed using a Perl script and mapped to a target Hive table.

Using the Oracle Loader for Hadoop (OLH) KM you can load data from a Hive table or HDFS to a corresponding table in Oracle. OLH is available as a standalone product. ODI greatly enhances OLH capability by generating the configuration and mapping files for OLH based on the configuration provided in the interface and KM options. ODI seamlessly invokes OLH when executing the scenario. In the example below, an HDFS file is mapped to a table in Oracle.

Development and Deployment:

The following diagram illustrates the development and deployment of the ODI solution for Big Data. Using ODI Studio on your development machine, create and develop the ODI solution for processing Big Data by connecting to a MySQL DB or Oracle database on a BDA machine or Hadoop cluster. Schedule the ODI scenarios to be executed on the ODI agent deployed on the BDA machine or Hadoop cluster.

The ODI solution for Big Data provides several exciting new capabilities to facilitate the adoption of Big Data in an enterprise. You can find more information about the Oracle Big Data connectors on OTN. You can find an overview of all the new features introduced in ODI 11.1.1.6 in the following document: ODI 11.1.1.6 New Features Overview

Read the article

How can I use Android as a remote control for streaming?

- by michael

I would like to know if I can use Android to remote control streaming on my laptop. I would like to use my laptop as my streaming server and use my HDTV to view the stream, and I need some way to remote control my streaming server. I have read about http://maketecheasier.com/install-vlc-shares-in-Ubuntu-and-stream-videos-to-Android/2011/02/25 and http://code.google.com/p/android-vlc-remote/ but those stream to the Android phone itself. I just need something to remote control streaming to my TV. Is that possible?

Read the article

Could not start ZK at requested port of 2181, while export HBASE_MANAGES_ZK=false

- by utrecht

Problem

The first aim was to run HBase standalone. Navigating to ip:60010/master-status is successful once HBase has been started. The second aim is to run a distinct ZooKeeper quorum. ZooKeeper has been downloaded and started:

    netstat -nato | grep 2181
    tcp 0 0 :::2181 :::* LISTEN off (0.00/0/0)

The conf/hbase-env.sh was changed as follows:

    # Tell HBase whether it should manage it's own instance of Zookeeper or not.
    export HBASE_MANAGES_ZK=false

in order to prevent HBase from starting ZooKeeper when HBase itself is started. However, the following error occurs once HBase has been started:

    Could not start ZK at requested port of 2181. ZK was started at port: 2182. Aborting as clients (e.g. shell) will not be able to find this ZK quorum.

Question

How can I disable the startup of ZooKeeper by HBase and run ZooKeeper separately?

Read the article

Cloudera Manager agent deploy failing to receive heartbeat from agent

- by user150341

All, I am getting the following error on the console at the last phase of the installation:

    Installation failed. Failed to receive heartbeat from agent

Server log:

    2012-12-19 00:32:12,132 INFO [NodeConfiguratorThread-4- 0:node.NodeConfiguratorProgress@503] 192.168.1.100: Setting WAIT_FOR_HEARTBEAT as failed and done state

All nodes (the name node and two client nodes) are VMs running 64-bit CentOS. sshd has been enabled on all nodes, and the VMs are set to bridged networking. Any clue on how to fix this error?

Read the article

Is RAID 0 or JBOD better for home media server?

- by Donald Hughes

I have an external two-bay drive enclosure (the OWC Mercury Elite-AL Pro) connected to a Mac Mini (my home media server) over FireWire 800. I'm streaming media to other computers in the house over wired gigabit. I have two 1.5 TB drives that I'm using independently right now. The media is on one, and I'm mirroring the files to the other drive at night as a backup. But as I approach filling up the drive, I want to span those two drives together to give me a total of about 3 TB, and then buy another drive for backups. The external enclosure supports both RAID 0 and JBOD, but I'm not clear on which would be better in this situation. Would RAID 0 provide any performance improvements over JBOD for streaming video (possibly several streams at once)? How does each affect the MTBF of the drives? In general, should I choose RAID 0, JBOD, or keep them independent?

Read the article

One codec to rule them all

- by AngryHacker

I am streaming videos in my house via Windows Media Player Streaming, which is basically DLNA. So theoretically any DLNA-compliant device can pick up the stream. However, I've quickly found that this is only one part of the solution. Over the years I've accumulated a ton of video-capable devices. While all these devices can see the Windows Media Player stream, they all speak different codecs. And frankly, I am confused by codecs. In the beginning, I thought that the codec was defined by the filename extension (e.g. avi, mp4, wmv, etc.), but after further research, it looks like the extensions simply denote containers. An .avi file could contain any of several different codecs. So my question is this: is there a format/codec that plays equally well on any device?

Read the article

Setup Windows Media Player 11 to stream from TVersity

- by snorfys

I've got TVersity installed on a Windows 2003 server box (work had an extra license that they donated to let me install it at home to get some practice setting up and administering a domain, etc.). I found out that Windows Media Player 11 won't install on Windows 2003, so I installed TVersity instead, and streaming to my 360 is working great. The problem is that I don't know how to set up streaming to any other PC on the network. All of the PCs have access to the shared network folder, but playing from there doesn't stream, and the stutter is pretty bad. Is there a way to set up Windows Media Player 11 or another player to stream from TVersity?

Read the article

Is an Android-based Phone a Suitable MP3 Player for Music Streamed over the Internet?

- by James McFarland

I am considering getting an HTC phone running Android from Verizon Wireless when I next upgrade my phone. I also have an online account with a music vendor, where I have rights to listen to my collection, but not to download the MP3s. Further, I have an unlimited data plan and Wi-Fi, so I have no concerns about bandwidth volume. I am especially interested in mounting my phone in a car kit and streaming my online music to my car's sound system while driving. If you are experienced in this scenario, or have tried it: is it reasonable to expect my HTC Android phone to provide me with streaming music via my cell data plan anywhere I get cell service?

Read the article

Wireless dropouts that only affect subset of devices

- by jwaddell

When watching videos streamed over WiFi from a NAS box (D-Link DNS-323) I am getting wireless dropouts. However they only appear to occur when I have left my laptop (Dell Inspiron 9300 running Windows XP SP3) running; the laptop is usually suspended if I'm not using it. The dropouts have occurred when streaming to a Netgear EVA8000 streaming device, and also to a PS3. I'm using a Netgear DG834G as the wireless modem/router. When a dropout occurs I go to the laptop and see that its wireless connection has also dropped out. The odd thing is that my wife's MacBook and my iPhone still maintain their connections. What could be causing this behaviour, and how do I go about fixing it?

Read the article

How to stream multiple files on demand in VLC?

- by romkyns

Is there any way at all that I can set up VLC on a server PC in such a way that I can access a list of all my videos from another PC, and pick one to be streamed on demand? I've been pointed at this streaming guide (pdf), but it's pretty useless. For a start, most of the menus in those screenshots don't match the current version of VLC, and then it sort of assumes you already know what you're doing. So far I have managed to figure out how to stream a single file, which I must choose on the server PC before watching - pretty useless if you ask me! The impenetrable "UI" doesn't help either... (P.S. The reason I'm going for streaming rather than the very simple to set up network drive is described in this question)

Read the article

Improving sound quality with remote ESD server

- by cuu508

Hi, I'm investigating low-budget ways to get audio from my PC (Ubuntu) to my HiFi without wires. I'm currently testing a setup where an Asus WL-500gP wireless router runs the ESD daemon and has a USB soundcard attached, which is then plugged into the HiFi. I'm testing playback on the PC with mpg123-esd and Spotify under Wine. The sound is there and latency is unexpectedly low, but I also hear occasional clicks and some distortion from time to time. I suppose that's because of the low latency and wireless streaming of uncompressed audio: any packet drops, the CPU temporarily being busy, etc. will cause clicks in the sound output. Is there a way around this problem, perhaps by increasing latency / buffer size somehow? Streaming using the SHOUTcast protocol seems to be a way out, but I have a feeling that would be a complex and brittle setup.

Read the article

The United States wants to toughen anti-streaming legislation; streaming a protected video could become punishable by prison

The United States is seeking to strengthen its anti-streaming legislation. Streaming a protected video on a streaming platform could become punishable by prison. While the HADOPI law is being implemented in France, with its notions of graduated response and cutting off the Internet connection for sharing over P2P networks, the US Congress is examining a bill that proposes a far more radical approach to strengthening copyright. The bill, known as "Bill S.978", would in effect heavily penalize almost any distribution of copyrighted content via a streaming platform such as YouTube. ...

Read the article

Better to build or buy a compute grid platform?

- by James B

I am looking to do some quite processor-intensive brute-force processing for string matching. I have run my prototype in a multi-threaded environment and compared the performance to an implementation using GridGain with a couple of nodes (also multi-threaded). I observed that my GridGain implementation was slower than my multi-threaded implementation. It could be that there was a flaw in my GridGain implementation, but it was only a prototype, and I thought the results were indicative. So my question is this: what are the advantages of having to learn and then build an implementation for a particular grid platform (Hadoop, GridGain, or EC2 if going hosted; other suggestions welcome), when one could fairly easily put together a lightweight compute grid platform with a much shallower learning curve? In other words, what do we get for free with these cloud/grid platforms that is worth having or tricky to implement? (Please note, I don't have any need for a data grid.) Cheers, -James (P.S. Happy to make this community wiki if need be.)

Read the article

Pig: Count number of keys in a map

- by Donald Miner

I'd like to count the number of keys in a map in Pig. I could write a UDF to do this, but I was hoping there would be an easier way.

    data = LOAD 'hbase://MARS1' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage( 'A:*', '-loadKey true -caching=100000') AS (id:bytearray, A_map:map[]);

In the code above, I want to basically build a histogram of id and how many items in column family A that key has. In hoping, I tried

    c = FOREACH data GENERATE id, COUNT(A_map);

but that unsurprisingly didn't work. Or, perhaps someone can suggest a better way to do this entirely. If I can't figure this out soon I'll just write a Java MapReduce job or a Pig UDF.

Search Results

Search found 2017 results on 81 pages for 'hadoop streaming'.
