Search Results

Search found 2017 results on 81 pages for 'hadoop streaming'.


How to host my own cloud so that videos are viewable via desktop web browser?

- by jake9115

I want to host my own cloud storage solution, something like Dropbox but entirely dependent on my own central machine. This way things are more secure if set up correctly, and there are no artificial storage limitations or pay-walls. Something similar to ownCloud: http://owncloud.org/ There is one important feature I want to have: the ability to stream movies in a web browser from my personal cloud to anywhere in the world. In the past I tried this with a NAS, and I mapped XBMC to the NAS via SFTP, and certain media types could stream in this manner. I've also used things like Plex. In this case, I am looking for a single solution for personal cloud storage and movie streaming from that cloud into a web browser. Does anyone know if this can be accomplished? Thanks for the suggestions!

How to synchronize tasks on multiple computers

- by SysGen

I would like to be able to run similar tasks on several computers that must be precisely synced. More specifically, I need 4 laptops to be synced, probably over a local network, and I need to use one of them to start a task (play a video) on all of them at the same time (different video files on different laptops). All of them are running Windows. Is there third-party software or any easier way to do so over LAN without significant delay? I need virtually no response time: if a single video were playing on all laptops, people should not notice the delay.

What video format(s) should be used to serve Macs, PCs, and Mobile Devices?

- by Jeffrey Blake

In 2007, I started a site based on streaming and downloading poker strategy videos. At that point in time, the best solution I came up with for supporting users of Macs and PCs was to provide the videos in both WMV and FLV formats. Later we added an M4V version to support iPhones/iPods. Obviously, things have changed a bit since that time. I would like to revisit our format decision to see if there is anything better that we could offer, preferably with wider support among all devices (so that we can reduce the number of formats offered, if possible). Is FLV + WMV + M4V the best solution? Is there something else we should consider? What about Android devices?

VLC Caching levels

- by Svish

When I open the Preferences of VLC and go to Input & Codecs, I have a setting called Default Caching Level. I can choose between: Custom, Lowest latency, Low latency, Normal, High latency, Higher latency. I'm used to caching being set in seconds or something like that. So, more seconds/higher buffer means less chance of buffer underrun while streaming. What is latency? What does it mean to set it lower or higher? In what cases should I go in what direction? If I'm struggling with buffer underruns, should I set it to lower or higher latency?

Crawling engine architecture - Java/ Perl integration

- by Bigtwinz

Hi all, I am looking to develop a management and administration solution around our web-crawling Perl scripts. Basically, right now our scripts are saved in SVN and are manually kicked off by sysadmins, devs, etc. Every time we need to retrieve data from new sources we have to create a ticket with business instructions and goals. As you can imagine, not an optimal solution.

There are 3 consistent themes with this system:

- the retrieval of data has a "conceptual structure", for lack of a better phrase, i.e. the retrieval of information follows a particular path
- we are only looking for very specific information, so we don't really have to worry about extensive crawling for a while (think thousands to tens of thousands of pages vs. millions)
- crawls are URL-based instead of site-based

As I enhance this alpha version to a more production-level beta, I am looking to add automation and management of the retrieval of data. Additionally, our other systems are Java (which I'm more proficient in) and I'd like to compartmentalize the Perl aspects so we don't have to lean heavily on outside help. I've evaluated the usual suspects (Nutch, Droid, etc.) but the time spent on modifying those frameworks to suit our specific information retrieval can't be justified. So I'd like your thoughts regarding the following architecture.

I want to create a solution which:

- uses Java as the interface for managing and executing the Perl scripts
- uses Java for configuration and data access
- sticks with Perl for retrieval

An example use case would be:

- a data analyst delivers us a requirement for crawling
- a Perl developer creates the required script and uses this webapp to submit the script (which gets saved to the filesystem)
- the script gets kicked off from the webapp with specific parameters
- ...

The webapp should be able to create multiple threads of the Perl script to initiate multiple crawlers.

So the questions are:

- what do you think?
- how solid is the integration between Java and Perl, specifically when calling Perl from Java?
- has someone used such a system which actually is part Perl repository?

The goal really is to not have a whole bunch of unorganized Perl scripts, and to put some management and organization on our information retrieval. Also, I know I can use Perl to do the web part of what we want but, as I mentioned before, I'm trying to keep Perl focused. But if it seems backwards, I'm not averse to making it an all-Perl solution. Open to any and all suggestions and opinions. Thanks

I'm familiar with Python and its data structures. Can someone give me a very basic example on how to

- by alex

What can I do with Mapreduce? Dictionaries? Lists? What do I use it for? Give a real easy example
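
As a rough illustration of the idea, here is a minimal pure-Python sketch of the three MapReduce phases (map, shuffle, reduce) applied to word counting. It runs in a single process and uses no Hadoop API; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are made up for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the list of values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in order over an iterable of text lines."""
    emitted = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(emitted).items())


if __name__ == "__main__":
    print(word_count(["the quick brown fox", "the lazy dog", "The end"]))
```

In Python terms, the mapper produces a list of (key, value) tuples, the shuffle builds a dictionary of lists, and the reducer turns each list into one value. On a real cluster the same three steps are spread over many machines.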

Does throwing an exception in an EvalFunc pig UDF skip just that line, or stop completely?

- by Daniel Huckstep

I have a User Defined Function (UDF) written in Java to parse lines in a log file and return information back to Pig, so it can do all the processing. It looks something like this:

    public abstract class Foo extends EvalFunc<Tuple> {
        public Foo() {
            super();
        }

        public Tuple exec(Tuple input) throws IOException {
            try {
                // do stuff with input
            } catch (Exception e) {
                throw WrappedIOException.wrap("Error with line", e);
            }
        }
    }

My question is: if it throws the IOException, will it stop completely, or will it return results for the rest of the lines that don't throw an exception?

Example: I run this in Pig

    REGISTER myjar.jar
    DEFINE Extractor com.namespace.Extractor();
    logs = LOAD '$IN' USING TextLoader AS (line: chararray);
    events = FOREACH logs GENERATE FLATTEN(Extractor(line));

With this input:

    1.5 7 "Valid Line"
    1.3 gghyhtt Inv"alid line"" I throw an exceptioN!!
    1.8 10 "Valid Line 2"

Will it process the two lines and will 'logs' have 2 tuples, or will it just die in a fire?

Unable to run MR on cluster

- by RAVITEJA SATYAVADA

I have a MapReduce program that runs successfully in standalone (Eclipse) mode, but when I export the jar and try to run the same job on the cluster, it shows a null pointer exception like this:

    13/06/26 05:46:22 ERROR mypackage.HHDriver: Error while configuring run method. java.lang.NullPointerException

I double-checked the run method parameters; they are not null, and the job runs fine in standalone mode.

Is it worth purchasing Mahout in Action to get up to speed with Mahout, or are there other better sources?

- by gab

I'm currently a very casual user of Apache Mahout, and I'm considering purchasing the book Mahout in Action. Unfortunately, I'm having a really hard time getting an idea of how worth it this book is -- and seeing as it's a Manning Early Access Program book (and therefore only currently available as a beta-version e-book), I can't take a look myself in a bookstore. Can anyone recommend this as a good (or less good) guide to getting up to speed with Mahout, and/or other sources that can supplement the Mahout website?

Converting python collaborative filtering code to use Map Reduce

- by Neil Kodner

Using Python, I'm computing cosine similarity across items. Given event data that represents a purchase (user,item), I have a list of all items 'bought' by my users. Given this input data (user,item):

    X,1
    X,2
    Y,1
    Y,2
    Z,2
    Z,3

I build a Python dictionary:

    {1: ['X','Y'], 2: ['X','Y','Z'], 3: ['Z']}

From that dictionary, I generate a bought/not bought matrix, which is also a dictionary (bnb):

    {1: [1,1,0], 2: [1,1,1], 3: [0,0,1]}

From there, I'm computing similarity between (1,2) by calculating the cosine between (1,1,0) and (1,1,1), yielding 0.816496. I'm doing this by:

    items = [1, 2, 3]
    for item in items:
        for sub in items:
            if sub >= item:  # so as not to calculate similarity on the inverse
                sim = coSim(bnb[item], bnb[sub])

I think the brute force approach is killing me and it only runs slower as the data gets larger. Using my trusty laptop, this calculation runs for hours when dealing with 8500 users and 3500 items. I'm trying to compute similarity for all items in my dict and it's taking longer than I'd like it to. I think this is a good candidate for MapReduce but I'm having trouble 'thinking' in terms of key/value pairs. Alternatively, is the issue with my approach, and this is not necessarily a candidate for MapReduce?
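
One way to think of this in key/value terms, sketched below in plain Python under the assumption that the vectors are strictly 0/1: the dot product of two item vectors is then just the number of users who bought both items, and each vector's norm is the square root of the item's buyer count. A mapper can therefore work per user and emit item pairs, so pairs that never co-occur are never touched. The names `map_user`, `reduce_sum`, and `item_similarities` are hypothetical, and nothing here uses a real MapReduce framework.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def map_user(items):
    """Map: for one user's basket, emit per-item counts and co-occurring pairs."""
    unique = sorted(set(items))
    for item in unique:
        yield ("count", item), 1
    for a, b in combinations(unique, 2):
        yield ("pair", (a, b)), 1


def reduce_sum(emitted):
    """Reduce: sum the values emitted for each key."""
    totals = defaultdict(int)
    for key, value in emitted:
        totals[key] += value
    return dict(totals)


def item_similarities(events):
    """Cosine similarity for every item pair that shares at least one user."""
    baskets = defaultdict(list)
    for user, item in events:
        baskets[user].append(item)
    totals = reduce_sum(kv for items in baskets.values() for kv in map_user(items))
    sims = {}
    for (kind, key), cooc in totals.items():
        if kind == "pair":
            a, b = key
            # Binary vectors: dot product = co-occurrences, norm = sqrt(buyer count).
            sims[key] = cooc / sqrt(totals[("count", a)] * totals[("count", b)])
    return sims


if __name__ == "__main__":
    events = [("X", 1), ("X", 2), ("Y", 1), ("Y", 2), ("Z", 2), ("Z", 3)]
    for pair, sim in sorted(item_similarities(events).items()):
        print(pair, round(sim, 6))
```

Pairs missing from the result (here items 1 and 3) have similarity 0, so the work scales with the number of co-purchases rather than with the square of the item count.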

How can I get the Free Music Archive audio player, or is there a better alternative?

- by Dennis Hodapp

I'm looking at free streaming audio players for web browsers that I can use in a project. I really like the audio player used on http://freemusicarchive.org/. Are they using an open-source audio player, and can I get hold of it? Or is it closed source? Also, if there are any open-source audio players that anybody knows about, I'd love to know about them (preferably one with no Flash). Last thing... is HTML5 going to be able to replace audio streaming players?

Searches (and general querying) with HBase and/or Cassandra (best practices?)

- by alexeypro

I have a User model object with quite a few fields (properties, if you wish) in it. Say "firstname", "lastname", "city" and "year-of-birth". Each user also gets a "unique id". I want to be able to search by them. How do I do that properly? How to do that at all?

My understanding (will work for pretty much any key-value storage; first goes key, then value):

    u:123456789 = serialized_json_object

("u" is a simple prefix for users' keys, 123456789 is the "unique id"). Now, thinking that I want to be able to search by firstname and lastname, I can save:

    f:Steve = u:384734807,u:2398248764,u:23276263
    f:Alex = u:12324355,u:121324334

so the key is "f", which is the prefix for firstnames, and "Steve" is the actual firstname. For "f:Steve" we save as value the ids of all users who are "Steves". That makes every search very, very easy. Querying by a few fields (properties), say by firstname (i.e. "Steve") and lastname (i.e. "l:Anything"), is still easy: first get the list of user ids from "f:Steve", then the list from "l:Anything", find the user ids common to both, and there you go.

Problems (and there are quite a few):

- Saving, updating, or deleting a user is a pain. It has to be an atomic and consistent operation. Also, if the size of a value is limited, then we are in (potential) trouble, and I don't really have an answer here. Only zipping the list of user ids? Not too cool, though.
- What if we eventually want to add a new field to search by, say "city"? We certainly can do it the same way ("c:Los Angeles" = ..., "c:Chicago" = ...), but if we didn't foresee all those "search choices" from the very beginning, then we will have to create some nightly job or something to go through all existing User records and update those "c:CITY" keys for them... Quite a big job!
- Problems with locking. User "u:123" updates his name to "Alex", and user "u:456" updates his name to "Alex". They both have to update "f:Alex" with their ids. That means either we get an overwriting problem, or one update will wait for another (and imagine if there are many of them?!).

What's the best way of doing that, keeping in mind that I want to search by many fields?

P.S. Please, the question is about HBase/Cassandra/NoSQL/key-value storages. Please, please: no advice to use MySQL and "read about" SELECTs, and to worry about scaling problems "later". There is a reason why I asked MY question exactly the way I did. :-)
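
The scheme described above can be sketched with a plain Python dictionary standing in for the key-value store. This is only a single-process illustration: the class name `UserIndex` and its methods are hypothetical, values are held as Python sets rather than comma-separated strings, and none of the atomicity or locking problems raised in the question are solved here.

```python
import json


class UserIndex:
    """In-memory stand-in for a key-value store with hand-maintained indexes."""

    # Searchable field -> key prefix, following the "f:" / "l:" convention above.
    INDEXED = {"firstname": "f", "lastname": "l"}

    def __init__(self):
        self.store = {}  # flat key -> value namespace

    def put_user(self, user_id, fields):
        """Insert or update a user and keep the index keys in step."""
        self.delete_user(user_id)  # drop stale index entries on update
        self.store["u:%s" % user_id] = json.dumps(fields)
        for field, prefix in self.INDEXED.items():
            if field in fields:
                key = "%s:%s" % (prefix, fields[field])
                self.store.setdefault(key, set()).add(user_id)

    def delete_user(self, user_id):
        """Remove a user record and its id from every index entry."""
        raw = self.store.pop("u:%s" % user_id, None)
        if raw is None:
            return
        old = json.loads(raw)
        for field, prefix in self.INDEXED.items():
            if field in old:
                key = "%s:%s" % (prefix, old[field])
                ids = self.store.get(key, set())
                ids.discard(user_id)
                if not ids:
                    self.store.pop(key, None)

    def search(self, **criteria):
        """Intersect the id sets of every requested field."""
        id_sets = [
            self.store.get("%s:%s" % (self.INDEXED[f], v), set())
            for f, v in criteria.items()
        ]
        return set.intersection(*id_sets) if id_sets else set()


if __name__ == "__main__":
    idx = UserIndex()
    idx.put_user(1, {"firstname": "Steve", "lastname": "Jobs"})
    idx.put_user(2, {"firstname": "Steve", "lastname": "Smith"})
    print(idx.search(firstname="Steve", lastname="Smith"))
```

Note that `put_user` touches one key per indexed field plus the user record itself, which is exactly where the consistency concern comes from: in a distributed store those writes are separate operations.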

CouchDB, HDFS, HBase or which is right for my situation?

- by Lucas

Hello all, This question is regarding data storage systems such as CouchDB, HDFS and HBase, specifically, which is right. I am looking at making a simple and customized Document Management System for my organization. Basically, we need the ability to store some Word Documents, PDFs and other similar files. I also want to store metadata about these files (e.g., Author, Dates, etc). Usage permissions would also be handy, but that can probably be built using meta-data. I would also need the ability to full-text index. The ability to version, while not required would be extremely useful. I would like the ability to simply add hardware to expand the resources of the system and the system must support Network Attached Storage over the CIFS or NFS protocol(s). I have read about CouchDB, HDFS and HBase. My preferred programming language is C# as all of my end-users will be running Windows machines and I will want to make both web and winforms client implementations. My question is which solution best fits my needs? Based on my research it appears that CouchDB (utilizing the CouchDB-Lounge and CouchDB-Lucene) perfectly fits my needs. However, I am worried that since I have worked with CouchDB that I might be overlooking something useful for my needs in HDFS or HBase or something similar due to a bias. Any and all opinions are welcome as I am looking for the community input as I really do not want to make the wrong choice at the start of my project. Please ask if you need more information. I thank you all for your time, input and assistance.

what is a data serialization system?

- by Yang

According to the Apache Avro project, "Avro is a serialization system". By saying data serialization system, does it mean that Avro is a product or an API? Also, I am not quite sure what a data serialization system is. For now, my understanding is that it is a protocol that defines how a data object is passed over the network. Can anyone help explain it in an intuitive way, so that it is easier for people with a limited distributed computing background to understand? Thanks in advance!

Strange results - I obtain same value for all keys

- by Pietro Luciani

I have a problem with MapReduce. Given as input a list of songs ("Songname"#"UserID"#"boolean"), I must produce a song list that specifies how many different users listened to each song, so the output is ("Songname","timelistening"). I used a Hashtable so that each (song, user) pair is counted only once. With short files it works well, but when I give it a list of about 1000000 records as input, it returns the same value (20) for all records.

This is my mapper:

    public static class CanzoniMapper extends Mapper<Object, Text, Text, IntWritable> {
        private IntWritable userID = new IntWritable(0);
        private Text song = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            /*StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }*/
            String[] caratteri = value.toString().split("#");
            if (caratteri[2].equals("1")) {
                song.set(caratteri[0]);
                userID.set(Integer.parseInt(caratteri[1]));
                context.write(song, userID);
            }
        }
    }

This is my reducer:

    public static class CanzoniReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            Hashtable<IntWritable, Text> doppioni = new Hashtable<IntWritable, Text>();
            for (IntWritable val : values) {
                doppioni.put(val, key);
            }
            result.set(doppioni.size());
            //doppioni.clear();
            context.write(key, result);
        }
    }

and main:

    Configuration conf = new Configuration();
    Job job = new Job(conf, "word count");
    job.setJarByClass(Canzoni.class);
    job.setMapperClass(CanzoniMapper.class);
    //job.setCombinerClass(CanzoniReducer.class);
    //job.setNumReduceTasks(2);
    job.setReducerClass(CanzoniReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);

Any idea?
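
For reference, the intended computation (distinct listeners per song) can be sketched in a few lines of plain Python. This is not a diagnosis of the Java code above. One pitfall that is often reported with reducers like this, and which may or may not be the cause here, is that Hadoop reuses the same `IntWritable` instance for every element of `values`, so a collection that stores `val` itself ends up holding references to one object that keeps changing; copying the primitive out (for example `val.get()` into a `HashSet<Integer>`) sidesteps that. The function names below are hypothetical, and the sketch assumes song names contain no `#`.

```python
from collections import defaultdict


def map_record(line):
    """Map: emit (song, user_id) only when the listened flag is "1"."""
    song, user_id, flag = line.strip().split("#")
    if flag == "1":
        yield song, int(user_id)


def reduce_distinct(song, user_ids):
    """Reduce: count distinct users; the set holds plain int values, not shared objects."""
    return song, len(set(user_ids))


def listeners_per_song(lines):
    """Group mapper output by song, then reduce each group to a distinct count."""
    grouped = defaultdict(list)
    for line in lines:
        for song, user_id in map_record(line):
            grouped[song].append(user_id)
    return dict(reduce_distinct(s, ids) for s, ids in grouped.items())


if __name__ == "__main__":
    sample = ["Volare#1#1", "Volare#1#1", "Volare#2#1", "Azzurro#3#0", "Azzurro#4#1"]
    print(listeners_per_song(sample))
```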

PIG doesn't read my custom InputFormat

- by Simon Guo

I have a custom MyInputFormat that is supposed to deal with the record boundary problem for multi-line inputs. But it does not seem to be picked up when I put MyInputFormat into my UDF load function, as follows:

    public class EccUDFLogLoader extends LoadFunc {
        @Override
        public InputFormat getInputFormat() {
            System.out.println("I am in getInputFormat function");
            return new MyInputFormat();
        }
    }

    public class MyInputFormat extends TextInputFormat {
        public RecordReader createRecordReader(InputSplit inputSplit, JobConf jobConf)
                throws IOException {
            System.out.println("I am in createRecordReader");
            // MyRecordReader is supposed to handle the record boundary
            return new MyRecordReader((FileSplit) inputSplit, jobConf);
        }
    }

For each mapper, it prints "I am in getInputFormat function" but not "I am in createRecordReader". I am wondering if anyone can provide a hint on how to hook up my custom MyInputFormat to Pig's UDF loader? Many thanks. I am using Pig on Amazon EMR.

How to play RTSP streaming in Qt

- by user63898

Hello, I'm trying to find a way to play, in Qt 4.6, an RTSP stream that I got from the YouTube API. Can it be done somehow?

retrieving multiple versions through API through hbase

- by sammy

Hello, this is a continuation of my previous question, where I'd used the HBase shell: http://stackoverflow.com/questions/3024417/facing-problems-while-updating-rows-in-hbase I tried the same with the API. I'm not able to figure out how to retrieve all versions, iterate over them, and print their values for a specific row. I've been spending hours reading. Please help me out.

    Scan s = new Scan(Bytes.toBytes("row1"));
    s.addColumn(Bytes.toBytes("column"), Bytes.toBytes("address"));
    // setting range for the versions
    s.setTimeRange(0L, 6L);
    ResultScanner scanner = table.getScanner(s);
    for (Result r : scanner) {
        for (KeyValue kv : r.sorted()) {
            System.out.println("To" + kv.getTimestamp());
            System.out.println("from " + Bytes.toString(kv.getKey()));
            System.out.println("To " + Bytes.toString(kv.getValue()));
        }
        scanner.close();
    }

Here I'm intending to print all versions of the column, but it gives only the most recent one. I'm stuck here.

How to show a video stored on server on iphone

- by Amitkumar

Hi, I have a query regarding showing a video (which is stored on a server) on iPhone. I want to show a video in an iPhone application. This is not live streaming. So how can the video be shown? I have read Apple's documentation for HTTP streaming of video. Do I need to call a web service? Is there any tutorial for this? Thanks in advance.

Reducer getting fewer records than expected

- by sathishs

We have a scenario of generating a unique key for every single row in a file. We have a timestamp column, but there are multiple rows for the same timestamp in a few scenarios. We decided the unique values would be the timestamp appended with its respective count, as in the program below. The mapper just emits the timestamp as key and the entire row as its value, and the key is generated in the reducer. The problem is that the map outputs about 236 rows, of which only 230 records are fed as input to the reducer, which outputs the same 230 records.

    public class UniqueKeyGenerator extends Configured implements Tool {

        private static final String SEPERATOR = "\t";
        private static final int TIME_INDEX = 10;
        private static final String COUNT_FORMAT_DIGITS = "%010d";

        public static class Map extends Mapper<LongWritable, Text, Text, Text> {
            @Override
            protected void map(LongWritable key, Text row, Context context)
                    throws IOException, InterruptedException {
                String input = row.toString();
                String[] vals = input.split(SEPERATOR);
                // rows with fewer than TIME_INDEX fields are skipped
                if (vals != null && vals.length >= TIME_INDEX) {
                    context.write(new Text(vals[TIME_INDEX - 1]), row);
                }
            }
        }

        public static class Reduce extends Reducer<Text, Text, NullWritable, Text> {
            @Override
            protected void reduce(Text eventTimeKey, Iterable<Text> timeGroupedRows, Context context)
                    throws IOException, InterruptedException {
                int cnt = 1;
                final String eventTime = eventTimeKey.toString();
                for (Text val : timeGroupedRows) {
                    // append "<tab><formatted time><zero-padded count>" to the row
                    final String res = SEPERATOR.concat(getDate(Long.valueOf(eventTime))
                            .concat(String.format(COUNT_FORMAT_DIGITS, cnt)));
                    val.append(res.getBytes(), 0, res.length());
                    cnt++;
                    context.write(NullWritable.get(), val);
                }
            }
        }

        public static String getDate(long time) {
            SimpleDateFormat utcSdf = new SimpleDateFormat("yyyyMMddhhmmss");
            utcSdf.setTimeZone(TimeZone.getTimeZone("America/Los_Angeles"));
            return utcSdf.format(new Date(time));
        }

        public int run(String[] args) throws Exception {
            conf(args);
            return 0;
        }

        public static void main(String[] args) throws Exception {
            conf(args);
        }

        private static void conf(String[] args)
                throws IOException, InterruptedException, ClassNotFoundException {
            Configuration conf = new Configuration();
            Job job = new Job(conf, "uniquekeygen");
            job.setJarByClass(UniqueKeyGenerator.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            job.setMapperClass(Map.class);
            job.setReducerClass(Reduce.class);
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);
            // job.setNumReduceTasks(400);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            job.waitForCompletion(true);
        }
    }

The behavior is consistent for larger inputs, and the difference is as large as 208969 records for an input of 20855982 lines. What might be the reason for the reducer receiving fewer inputs?

Linux application that bundles multiple incoming audio and video streams into one container file?

- by StackedCrooked

I've been assigned to implement a video on-demand service for a local university. Different aspects of the lectures (video, audio, screen cast, white board) will be recorded. During a lecture all these data streams arrive at one Linux server. This server should transcode and bundle all these streams into one container (Matroska) file. My options seem to be:

- write a GStreamer application
- do something with FFmpeg
- do something with VLC
- ...?

Has anyone done something similar in the past? Can you recommend something?

Edit: For those interested, here are a few of my findings:

- Matroska is not a good format for streaming (it's possible, but it's not its primary intent)
- For Flash streaming you can use MPEG4
- If you want to combine different videos into one video where each subvideo occupies a rectangular portion of the total screen, then this GStreamer script is useful (I found it on this blog post)
- Desktop capture works fine with VLC

HBase as web app backend

- by NathanD

Can anyone advise if it is a good idea to have HBase as the primary data source for a web-based application? My primary concern is HBase's response time to queries. Is it possible to have sub-second response?

Edit: more details about the app itself.

- Amount of data: ~500GB of text data, expected to reach 1TB soon
- Number of concurrent users using the app: up to 50
- The app will be used to present reports about data stored in HBase, like how many times keyword "X" occurred in the last 24h
- For ~80% of requests from that app I will know the exact key; 20% will be scans (I'm looking into HBase schema design related topics to make it run fast)

Advanced queries in HBase

- by Teflon Ted

Given the following HBase schema scenario (from the official FAQ)...

How would you design an HBase table for a many-to-many association between two entities, for example Student and Course? I would define two tables:

Student:
- student id
- student data (name, address, ...)
- courses (use course ids as column qualifiers here)

Course:
- course id
- course data (name, syllabus, ...)
- students (use student ids as column qualifiers here)

This schema gives you fast access to the queries "show all classes for a student" (Student table, courses family) or "show all students for a class" (Course table, students family).

How would you satisfy the request: "Give me all the students that share at least two courses in common"? Can you build a "query" in HBase that will return that set, or do you have to retrieve all the pertinent data and crunch it yourself in code?
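
To make the "crunch it yourself in code" option concrete, here is a small Python sketch that assumes the Course table has already been read into a dictionary mapping each course id to its student ids (the contents of the students family). No HBase client call is shown or implied; `students_sharing_courses` and its parameters are hypothetical names.

```python
from collections import defaultdict
from itertools import combinations


def students_sharing_courses(course_students, minimum=2):
    """Return {(student_a, student_b): shared_count} for pairs meeting the minimum.

    course_students maps a course id to the student ids enrolled in it,
    mirroring the "students" column family of the Course table.
    """
    shared = defaultdict(int)
    for students in course_students.values():
        # Every pair of students in the same course shares that course.
        for pair in combinations(sorted(set(students)), 2):
            shared[pair] += 1
    return {pair: n for pair, n in shared.items() if n >= minimum}


if __name__ == "__main__":
    courses = {
        "math": ["s1", "s2", "s3"],
        "art": ["s1", "s2"],
        "bio": ["s2", "s3"],
    }
    print(students_sharing_courses(courses))
```

The pair counting has the same shape as a map step (emit student pairs per course) followed by a reduce step (sum per pair), so the same logic could in principle be moved into a MapReduce job over the Course table.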

How to pick random (small) data samples using Map/Reduce?

- by Andrei Savu

I want to write a map/reduce job to select a number of random samples from a large dataset based on a row level condition. I want to minimize the number of intermediate keys. Pseudocode:

    for each row
        if row matches condition
            put the row.id in the bucket if the bucket is not already large enough

Have you done something like this? Is there any well known algorithm? A sample containing sequential rows is also good enough. Thanks.
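
A well-known technique for this kind of problem is reservoir sampling, which keeps a fixed-size bucket while streaming over the rows and gives every matching row the same chance of ending up in it. The sketch below is plain Python rather than a MapReduce job; `reservoir_sample`, the `matches` predicate, and the `row["id"]` layout are assumptions made for the example.

```python
import random


def reservoir_sample(rows, k, matches, rng=None):
    """Keep a uniform random sample of up to k ids from rows that satisfy matches."""
    rng = rng or random.Random()
    bucket = []
    seen = 0  # number of matching rows observed so far
    for row in rows:
        if not matches(row):
            continue
        seen += 1
        if len(bucket) < k:
            # Bucket not yet full: always keep the row.
            bucket.append(row["id"])
        else:
            # Bucket full: replace a random slot with probability k / seen.
            j = rng.randrange(seen)
            if j < k:
                bucket[j] = row["id"]
    return bucket


if __name__ == "__main__":
    data = ({"id": i, "ok": i % 2 == 0} for i in range(1000))
    print(reservoir_sample(data, 5, lambda r: r["ok"]))
```

In a map/reduce setting, one possible arrangement is for each mapper to keep its own reservoir and emit it once at the end under a single key, which keeps the number of intermediate keys small. Merging the per-mapper reservoirs into a truly uniform sample would also need each mapper's count of matching rows.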

Nutch search always returns 0 results

- by darbour

I have set up Nutch 1.0 on a cluster, and it has successfully crawled. I copied the crawl directory using dfs -copyToLocal and set the value of searcher.dir in the nutch-site.xml file located in the Tomcat directory to point to that directory. Still, when I try to search I receive 0 results. Any help would be greatly appreciated.

