voice recognition - Page 5

Training speech recognition software

- by wyatt

A little left field, but I'm trying to train a speech recognition program and the guidelines suggest that I attempt to speak clearly but naturally. I notice, however, that when one speaks naturally each word tends to drift into the next, resulting in a rather ambiguous boundary between the words. One the one hand, speaking in a more stilted manner would seem to aid the computer in recognising the phonemes, but on the other it would tend to make it less likely to understand more natural speech. Anyone knowledgeable in the field out there who can suggest which of the two approaches is more effective? Thanks

Read the article

Speech recognition with Flash or Silverlight

- by Sebastián Grignoli

I'm developing a web user interface to enter some information that is not very complex but needs to be loaded in real time. I think that the application could make use of speech recognition to facilitate the task. Te core of the interface is being built with Javascript and jQuery, but can easily include a flash or silverlight component. I believe that´s probably the way to go... I don't need to recognize everything that the user says, but only a few prerecorded commands. Also, I don't want the user to click on a button to specify the begining and the end of the spoken command. It should be detected live. Is there anything that does this? I would be grateful if anyone tells me about a complete solution, free or commercial, as well as any advice on capturing a sound stream from the mic and process it with flash or sliverlight. Sebastian.-

Read the article

Sound sample recognition library/code

- by Daniel Mošmondor

I don't want sound-to-text software. What I need is the following: I'll record multiple (say 50+) audio streams (recordings of radio stations) from that recordings, I'll mark interesting audio clips - their length ranges from 2 to 60 seconds - there will be few thousands of such audio clips library should be able to find other instances of same audio clips from recorded sound streams confidence factor should be reported to used and additional input provided so the recognition could perform better next time Do you know of such software library? LGPL would be most valuable to me, but I can go for commercial license as well.

Read the article

PCA extended face recognition

- by cMinor

The state of the art says that we can use PCA to perform face recognition. like this, this or this I am working with a project that involves training a classifier to detect a person who is wearing glasess or hats or even a mustache. The purpose of doing this is to detect when a person that has robbed a bank, store, or have commeted some sort of crime(s) (we have their image in a database), enters a certain place ( historically we know these guys have robbed, so we should take care to avoid problems). We came first to have a distributed database with all images of criminals, then I thought to have a layer of them clasifying these criminals using accesories like hats, mustache or anything that hides their face etc... Then, to apply that knowledge to detect when a particular or a suspect person enters a comercial place. ( In practice when someone is going to rob not all the times they are using an accesorie...) What do you think about this idea of doing PCA to first detect principal components of the face and then the components of an accesory. I was thinking that maybe a probabilistic approach is better so we can compute the probability the criminal is the person that entered a place and call the respective authorities.

Read the article

Voice Recognition Google API

- by user2966744

thanks for reading. I'm creating a simple web based drawing app that uses speech recognition. I have created a simple page, the project is on github here: https://github.com/a5hton/speechdraw It has a 16x16 pixel grid. I would like to be able to draw on this grid by using simple words. For example if you say "right", the pixel to the right will be colored black. If you say "down" the pixel below the last one will be colored black. You can say up, down, left or right and the corresponding pixels will be colored. Saying "erase" will switch to erase mode, colouring the pixels back to their original color. Saying "lift" will lift the pen off the page. Saying "draw" will enable the draw mode. Could you please help me work out how to make this happen. Please see the simple page at to get an understanding. Thank you! Cheers, Michael

Read the article

Anyone knows good references for Machine Learning Algorithms and Image Recognition?

- by RaymondBelonia

I need it for my thesis and for some reason I am having a hard time finding decent books or websites for it. My thesis topic is "Classification of Modern Art Paintings using Machine Learning Approach". My goal is to classify examples of modern art paintings to its respective modern art movement(expressionism, realism,etc..) using machine learning approach. Also, suggestions and comments about my thesis are greatly appreciated.

Read the article

How to store generated eigen faces for future face recognition?

- by user3237134

My code works in the following manner: 1.First, it obtains several images from the training set 2.After loading these images, we find the normalized faces,mean face and perform several calculation. 3.Next, we ask for the name of an image we want to recognize 4.We then project the input image into the eigenspace, and based on the difference from the eigenfaces we make a decision. 5.Depending on eigen weight vector for each input image we make clusters using kmeans command. Source code i tried: clear all close all clc % number of images on your training set. M=1200; %Chosen std and mean. %It can be any number that it is close to the std and mean of most of the images. um=60; ustd=32; %read and show images(bmp); S=[]; %img matrix for i=1:M str=strcat(int2str(i),'.jpg'); %concatenates two strings that form the name of the image eval('img=imread(str);'); [irow icol d]=size(img); % get the number of rows (N1) and columns (N2) temp=reshape(permute(img,[2,1,3]),[irow*icol,d]); %creates a (N1*N2)x1 matrix S=[S temp]; %X is a N1*N2xM matrix after finishing the sequence %this is our S end %Here we change the mean and std of all images. We normalize all images. %This is done to reduce the error due to lighting conditions. for i=1:size(S,2) temp=double(S(:,i)); m=mean(temp); st=std(temp); S(:,i)=(temp-m)*ustd/st+um; end %show normalized images for i=1:M str=strcat(int2str(i),'.jpg'); img=reshape(S(:,i),icol,irow); img=img'; end %mean image; m=mean(S,2); %obtains the mean of each row instead of each column tmimg=uint8(m); %converts to unsigned 8-bit integer. Values range from 0 to 255 img=reshape(tmimg,icol,irow); %takes the N1*N2x1 vector and creates a N2xN1 matrix img=img'; %creates a N1xN2 matrix by transposing the image. % Change image for manipulation dbx=[]; % A matrix for i=1:M temp=double(S(:,i)); dbx=[dbx temp]; end %Covariance matrix C=A'A, L=AA' A=dbx'; L=A*A'; % vv are the eigenvector for L % dd are the eigenvalue for both L=dbx'*dbx and C=dbx*dbx'; [vv dd]=eig(L); % Sort and eliminate those whose eigenvalue is zero v=[]; d=[]; for i=1:size(vv,2) if(dd(i,i)>1e-4) v=[v vv(:,i)]; d=[d dd(i,i)]; end end %sort, will return an ascending sequence [B index]=sort(d); ind=zeros(size(index)); dtemp=zeros(size(index)); vtemp=zeros(size(v)); len=length(index); for i=1:len dtemp(i)=B(len+1-i); ind(i)=len+1-index(i); vtemp(:,ind(i))=v(:,i); end d=dtemp; v=vtemp; %Normalization of eigenvectors for i=1:size(v,2) %access each column kk=v(:,i); temp=sqrt(sum(kk.^2)); v(:,i)=v(:,i)./temp; end %Eigenvectors of C matrix u=[]; for i=1:size(v,2) temp=sqrt(d(i)); u=[u (dbx*v(:,i))./temp]; end %Normalization of eigenvectors for i=1:size(u,2) kk=u(:,i); temp=sqrt(sum(kk.^2)); u(:,i)=u(:,i)./temp; end % show eigenfaces; for i=1:size(u,2) img=reshape(u(:,i),icol,irow); img=img'; img=histeq(img,255); end % Find the weight of each face in the training set. omega = []; for h=1:size(dbx,2) WW=[]; for i=1:size(u,2) t = u(:,i)'; WeightOfImage = dot(t,dbx(:,h)'); WW = [WW; WeightOfImage]; end omega = [omega WW]; end % Acquire new image % Note: the input image must have a bmp or jpg extension. % It should have the same size as the ones in your training set. % It should be placed on your desktop ed_min=[]; srcFiles = dir('G:\newdatabase\*.jpg'); % the folder in which ur images exists for b = 1 : length(srcFiles) filename = strcat('G:\newdatabase\',srcFiles(b).name); Imgdata = imread(filename); InputImage=Imgdata; InImage=reshape(permute((double(InputImage)),[2,1,3]),[irow*icol,1]); temp=InImage; me=mean(temp); st=std(temp); temp=(temp-me)*ustd/st+um; NormImage = temp; Difference = temp-m; p = []; aa=size(u,2); for i = 1:aa pare = dot(NormImage,u(:,i)); p = [p; pare]; end InImWeight = []; for i=1:size(u,2) t = u(:,i)'; WeightOfInputImage = dot(t,Difference'); InImWeight = [InImWeight; WeightOfInputImage]; end noe=numel(InImWeight); % Find Euclidean distance e=[]; for i=1:size(omega,2) q = omega(:,i); DiffWeight = InImWeight-q; mag = norm(DiffWeight); e = [e mag]; end ed_min=[ed_min MinimumValue]; theta=6.0e+03; %disp(e) z(b,:)=InImWeight; end IDX = kmeans(z,5); clustercount=accumarray(IDX, ones(size(IDX))); disp(clustercount); QUESTIONS: 1.It is working fine for M=50(i.e Training set contains 50 images) but not for M=1200(i.e Training set contains 1200 images).It is not showing any error.There is no output.I waited for 10 min still there is no output. I think it is going infinite loop.What is the problem?Where i was wrong? 2.Instead of running the training set everytime how eigen faces generated are stored so that stored eigen faces are used for future face recoginition for a new input image.So it reduces wastage of time.

Read the article

Is it possible to get a google voice number without already having a phone number?

- by boost

I'm sorry if this is the wrong place to post this, but I couldn't find a more suitable website in the stackexchange network. I currently have no phone number; I can make outgoing calls via the widget in gmail, but I cannot receive calls (as far as I know). I know that you can set up a google voice number to forward to google chat, and this is exactly what I want. The problem is, I can't get a google voice number in the first place, because it first requires that I have an existing phone number, even if I wouldn't use it. So, it is possible to skip providing an existing phone number, and just get a google voice number that forwards to google chat? Alternatively, is there any free phone service that can be used without a phone and lets you receive calls from any number? I realize I'm kind of asking for free candy, but, if it's out there, I'd be a fool

Read the article

Windows Phone 7 Prototype 001: Speech Recognition on WP7

At some point in the future it will be awesome when you can just tell your computer what to do and it does it - without typing to help those of us with a blistering 11 WPM hunk and peck technique. Siri, a mobile digital assistant using speech recognition was voted best tech at SXSW. I dont know about that one. Although, I'm sure it will get better when Apple rebuilds it and bundles on iPhone 5. So how would you do that on WP7? There have been some videos floating around showing Bing with some voice control so obviously the phone has speech recognition. So what options are there: System.Speech? Not included in WP7/SL Nuance software like Siri? No WP7/SL version yet. Invoking the SAPI dlls on the phone? No automation factory in WP7 SL. Web services using System.Speech and mic on the phone? YES! The last one was my least favorite but that works for now. I built a quick sample app to show how to do text-to-speech and speech recognition on WP7. @eklimczak will not be happy with the developer designed UI. In this sample there is web service with provides access to the system.speech APIs in .NET. Basically its just passing around byte arrays. On the phone its using the XNA audio frameworks to play the text-to-speech stream and to record using the microphone. The code is pretty simple and you can download from the link at the end of this post. The only things to note are adjusting the WCF config to handle larger byte uploads and the Microphone API is a little weird with that 1 second buffer. It would be nice if you could just to mic.start and mic.end which would return an array of bytes instead of managing your own stream inside the buffer ready callback. Couple of downsides to this approach: Recoding from the phone has some static. Could be my code or the my mic is bad / not calibrated right. Having to make web service calls instead of local access is not ideal (Microsoft, please add an API for the SAPI dlls) Although in the context of an app like Siri its not so bad since you need to do web service lookups to get data back Speech recognition quality really depends on either a) a limited grammar set like that pizza grammar in the sample or b) training the recognizer. For the latter it would be annoying to have users train the system. Using the System.Speech stuff youd have to have a profile for each user. So until Microsoft adds some speech client APIs on the phone or Nuance releases a wp7 product, this is a decent workaround. In the future Id like to build something similar to Siri. I shall call it Iris in homage. Im a big fan of mobile speech apps because frankly its just not safe to Google while driving. Since some of my designer co-workers have been posting UI sketches for WP7, Id like to start posting some code prototypes for things I try out on the phone. That will probably last 2 weeks, but for the moment I have like 10 posts in the queue. Sample Code 100% guaranteed to work on my emulatorDid you know that DotNetSlackers also publishes .net articles written by top known .net Authors? We already have over 80 articles in several categories including Silverlight. Take a look: here.

Read the article

Face morph and recognition

- by startuper

I have two requirements: members of a social network choose other member's faces and morph an average face of them. The website finds other members' faces that resemble the morphed face and list up in order of resemblance. Is there a script that can do this? I see that http://www.faceresearch.org/demos/average does the item 1 but they don't license their technology. Please help. Thank you in advance.

Read the article

Any simple shape recognition libraries for Java?

- by Phil

I am working on a on-screen keyboard for Android, and I need to recognize starting points, turning points and end points of lines drawn by the user on the keyboard. A simple straightening function would be nice, as it is difficult to draw a perfectly straight line even with a stylus, not to mention finger-only touchscreens today. What I am trying to write is something like Swype. Any good libraries that I can use or make reference to?

Read the article

Algorithm for voice comparison

- by Horace Ho

Given two recorded voices in digital format, is there an algorithm to compare the two and return a coefficient of similarity?

Read the article

Online voice chat: Why client-server model vs. peer-to-peer model?

- by sstallings

I am adding online voice chat to a Silverlight app. I've been reviewing current apps, services and SDKs found thru online searches and forums. I'm finding that the majority of these implement a client-server (C/S) model and I'm trying to understand why that model versus a peer-to-peer (PTP) model. To me PTP would be preferable because going direct between peers would be more efficient (fewer IP hops and no processing along the way by a server computer) and no need for a server and its costs and dependencies. I found some products offer the ability to switch from PTP to C/S if the PTP proves insufficient. As I thought more about it, I could see that C/S could be better if there are more than two peers involved in a conversation, then the server (supposedly with more bandwidth) could do a better job of relaying each peers outgoing traffic to the multiple other peers. In C/S many-to-many voice chatting, each peer's upstream broadband (which is where the bottleneck inherently is) would only have to carry each item of voice traffic once, then the server would use its superior bandwidth to relay the message to the multiple other peers. But, in a situation with one-on-one voice chatting it seems that PTP would be best. A server would not reduce each of the two peer's bandwidth requirements and would only add unnecessary overhead, dependency and cost. In one-on-one voice chatting: Am I mistaken on anything above? Would peer-to-peer be best? Would a server provide anything of value that could not be provided by a client-only program? Is there anything else that I should be taking into consideration? And lastly, can you recommend any Silverlight PTP or C/S voice chat products? Thanks in advance for any info.

Read the article

Looking for voice command code/library for obj-c/c

- by Smallinov

C/Obj-C Developers- I am looking for a way to listen for a specific word or small set of words to be spoken and fire an event. Needs to be able to run on the iPhone/iPad ... Looking for ideas?

Read the article

Text and Picture Block Recognition in Picture

- by Murat

Hi, I'm new in Image Proccessing. I have an image and I want to parsing Text blocks and picture blocks. How can I do? I don't use OCR components. I need algorithm or sample or suggestion.

Read the article

Speech Recognition Server Does Not Stay Open

- by Waffle

I am trying to create a simple program that loops for user speech input using com.apple.speech.recognitionserver. My code thus far is as follows: set user_response to "start" repeat while user_response is not equal to "Exit" tell application id "com.apple.speech.recognitionserver" set user_response to listen for {"Time", "Weather", "Exit"} with prompt "Good Morning" end tell if user_response = "Time" then set curr_time to time string of (the current date) set curr_day to weekday of (the current date) say "It is" say curr_time say "on" say curr_day say "day" else if user_response = "Weather" then say "It is hot outside. What do you expect?" end if end repeat say "Have a good day" If the above is run on my system it says good morning and it then pops up with the speech input system and waits for either Time, Weather, or Exit. They all do what they say they are going to do, but instead of looping if I say Time and Weather and asking again until I say exit the speechserver times out and never pops up again. Is there a way of either keeping that application open until the program ends or is applescript not capable of looping for user speech input?

Read the article

.NET Speech recognition plugin Runtime Error: Unhandled Exception. What could possibly cause it?

- by manuel

I'm writing a plugin (dll file) for speech recognition, and I'm creating a WinForm as its interface/dialog. When I run the plugin and click the 'Speak' to start the initialization, I get an unhandled exception. Here is a piece of the code: public ref class Dialog : public System::Windows::Forms::Form { public: SpeechRecognitionEngine^ sre; private: System::Void btnSpeak_Click(System::Object^ sender, System::EventArgs^ e) { Initialize(); } protected: void Initialize() { if (System::Threading::Thread::CurrentThread->GetApartmentState() != System::Threading::ApartmentState::STA) { throw gcnew InvalidOperationException("UI thread required"); } //create the recognition engine sre = gcnew SpeechRecognitionEngine(); //set our recognition engine to use the default audio device sre->SetInputToDefaultAudioDevice(); //create a new GrammarBuilder to specify which commands we want to use GrammarBuilder^ grammarBuilder = gcnew GrammarBuilder(); //append all the choices we want for commands. //we want to be able to move, stop, quit the game, and check for the cake. grammarBuilder->Append(gcnew Choices("play", "stop")); //create the Grammar from th GrammarBuilder Grammar^ customGrammar = gcnew Grammar(grammarBuilder); //unload any grammars from the recognition engine sre->UnloadAllGrammars(); //load our new Grammar sre->LoadGrammar(customGrammar); //add an event handler so we get events whenever the engine recognizes spoken commands sre->SpeechRecognized += gcnew EventHandler<SpeechRecognizedEventArgs^> (this, &Dialog::sre_SpeechRecognized); //set the recognition engine to keep running after recognizing a command. //if we had used RecognizeMode.Single, the engine would quite listening after //the first recognized command. sre->RecognizeAsync(RecognizeMode::Multiple); //this->init(); } void sre_SpeechRecognized(Object^ sender, SpeechRecognizedEventArgs^ e) { //simple check to see what the result of the recognition was if (e->Result->Text == "play") { MessageBox(plugin.hwndParent, L"play", 0, 0); } if (e->Result->Text == "stop") { MessageBox(plugin.hwndParent, L"stop", 0, 0); } } };

Read the article

Voice Stress Analysis in Python

- by mudder

I'm interested in playing with this, anyone know of any code for this already existing? or what libraries would be useful to get started on something like this?

Read the article

Speech Recognition for Julius using audio instead of Microphone

- by iamrohitbanga

I need to test Julius Speech to Text conversion with some audio. moreover it would be possible to simulate noise over the audio. is anyone aware of such a software? Has anyone worked with Julius? Any Comments on the library?

Read the article

Speech recognition grammar using Delphi

- by XBasic3000

I need help to make a SpeechRecoGrammar without using xml format. Like creating it on runtime on delphi. heres the outcome look like. <GRAMMAR LANGID="409"> i am OK roche edwin allan

Read the article

Identifying voice as male or female

- by duder

I'm not much into audio engineering, so please be easy on me. I'm receiving an audio file as input, and need to detect whether the speaker is male or female. Any ideas how to go about doing this? I'm using php, but am open to using other languages, and don't mind learning a little bit of sound theory as long as the time is proportionate to the task.

Read the article

Voice on 4G Technologies such as LTE and WiMAX?

- by Vaibhav Bajpai

I understand that LTE and WiMAX are IP-based technologies that do NOT have a voice component unlike the current 3G technologies. So is it like, voice calls in 4G would be completely driven on top of IP? Wouldn't this break backward compatibility with existing 3G technologies? Is this why 4G is taking it took long to take ubiquitous availability?

Read the article

Which Messenger service allows you to make voice calls with the least amount of bandwidth ?

- by Rogue

I'll be going to a country side for a few days and the speeds there are pathetic! Which voice messenger uses the least amount of bandwidth in order to make voice calls to other pc's. Speeds are a little better than dial-up nearly 80-100 kbps

Read the article

How to make HTML5 speech recognition not ask permission every time

- by user2081044

I have created a script that requires my microphone. It uses the HTML5 speech recognition API. Chrome asks permission every time I want to perform a speech recognition test. Javascript (partial) code that I am using: var recognition = new webkitSpeechRecognition(); recognition.continuous = true; recognition.interimResults = true; recognition.onresult = function(event) { console.log(event.results[0][0].transcript); if(event.results[0][0].transcript === 'print') { console.log(''); } }; recognition.start(); I have tried to add it into the list of exceptions in either Chrome and Flash player, but it still asks for permission. Printscreen: That message pops up everytime I click the button. Is there any way to disable Chrome for asking permission?

Read the article

Speech Recognition

- by DesigningCode

Today I was asked to write a wee application for someone so that they could turn pages on their ebooks without having to reach for their keyboard or mouse… that way they could do craft or knit or whatever they are doing while they are reading. I vaguely remember that windows has something built in, but have never really played with it before. I have in the past turned on the screen reader and impressed my kids by making the computer saying “amusing” phrases along the lines of “Zac has a smelly bum”. So instead of firing up Visual Studio and getting stuck into the juciy task of writing a speech recognition program…. I typed “speech recognition” into the start menu of my windows 7 computer. And wow! I’ve been playing with it for the last 40 minutes or so and have been most impressed. Dictation wise it certainly misses stuff or gets the wrong words, but I did the training and it certainly improved. But what I’m enjoying is controlling windows. for instance, to start this blog entry I said “Open Writer” and it worked no problem. In fact after I muddled my way through getting going with speech recognition I enjoyed saying “Open notepad” … “close” over and over again. It allows you to click anywhere on the screen, just say “mousegrid” and a 1-9 numbered grid comes up, say a number and it puts a smaller 1-9 numbered grid, and you hone in, till the middle square is on a place you want to click, then you say “click” or “double click”. if you want to enter a key, say “Press Tab” for example. inside programs it understands menu entries. In fact, while writing this I just said “File” “Save” and it happily saved. I think I will play around with this for a while more and try it out in visual studio. Might be quite good for being able to do menu entries instead of grabbing for my mouse…. can keep my hands on the keyboard. ok, wasn’t the first post I wanted to do on geeks with blogs! but hey… will do some techy posts soon.

Search Results

Search found 1303 results on 53 pages for 'voice recognition'.

Page 5/53 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >

- by wyatt

- by Sebastián Grignoli

- by Daniel Mošmondor

- by cMinor

- by user2966744

- by RaymondBelonia

- by user3237134

- by boost

- by startuper

- by Phil

- by Horace Ho

- by sstallings

- by Smallinov

- by Murat

- by Waffle

- by manuel

- by mudder

- by iamrohitbanga

- by XBasic3000

- by duder

- by Vaibhav Bajpai

- by Rogue

- by user2081044

- by DesigningCode

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >