Search Results

Search found 916 results on 37 pages for 'speech recognition'.

Page 27/37 | < Previous Page | 23 24 25 26 27 28 29 30 31 32 33 34  | Next Page >

  • Detecting syllables in a word

    - by user50705
    I need to find a fairly efficient way to detect syllables in a word, e.g., invisible: in-vi-sib-le. There are some syllabification rules that could be used: V, CV, VC, CVC, CCV, CCCV, CVCC (where V is a vowel and C is a consonant), e.g., pronunciation (5 syllables: Pro-nun-ci-a-tion; CV-CVC-CV-V-CVC). I've tried a few methods, among them regex (which helps only if you want to count syllables), hard-coded rule definitions (a brute-force approach that proved very inefficient), and finally a finite-state automaton (which did not produce anything useful). The purpose of my application is to create a dictionary of all syllables in a given language. This dictionary will later be used for spell-checking applications (using Bayesian classifiers) and text-to-speech synthesis. I would appreciate tips on an alternative way to solve this problem besides my previous approaches. I work in Java, but any tip in C/C++, C#, Python, Perl... would work for me.
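
    A minimal sketch of one rule-based approach, in Python since the post accepts it. It assumes Latin-alphabet words, treats "y" as a vowel, treats a run of adjacent vowels as a single nucleus, and splits consonant clusters with a simple heuristic (a single consonant goes to the next syllable; in a longer cluster the first consonant stays with the previous one). The function names are hypothetical, and real languages need exception lists on top of rules like these.

```python
VOWELS = set("aeiouy")  # assumption: "y" counts as a vowel


def cv_pattern(word):
    """Map each letter to 'V' (vowel) or 'C' (consonant)."""
    return "".join("V" if ch in VOWELS else "C" for ch in word.lower())


def syllabify(word):
    """Split a word into syllables using a simple CV heuristic."""
    pattern = cv_pattern(word)

    # Collect vowel nuclei as (start, end) index pairs, end exclusive.
    nuclei = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "V":
            j = i
            while j < len(pattern) and pattern[j] == "V":
                j += 1
            nuclei.append((i, j))
            i = j
        else:
            i += 1

    if len(nuclei) <= 1:
        return [word]

    # Decide where to cut between each pair of neighbouring nuclei.
    cuts = []
    for (_, left_end), (right_start, _) in zip(nuclei, nuclei[1:]):
        gap = right_start - left_end  # consonants between the two nuclei
        cuts.append(left_end if gap == 1 else left_end + 1)

    pieces = []
    start = 0
    for cut in cuts:
        pieces.append(word[start:cut])
        start = cut
    pieces.append(word[start:])
    return pieces


print("-".join(syllabify("invisible")))
```

    Because adjacent vowels are merged into one nucleus, this sketch will not separate hiatus cases such as the "ci-a" in "pronunciation"; handling those would need language-specific diphthong rules.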

    Read the article

  • added TextToSpeech to my activity and now my onDestroy is not called any more, bug?

    - by hermo
    I added TextToSpeech to my app, following the guidelines in this post: http://android-developers.blogspot.com/2009/09/introduction-to-text-to-speech-in.html and now my onDestroy is no longer called when the back button is pressed. I filed a bug report about this: http://code.google.com/p/android/issues/detail?id=7674 I figured I should also ask here whether anyone else has seen this and found a solution. It seems that the intent causes the problem, i.e. the following: Intent checkIntent = new Intent(); checkIntent.setAction(TextToSpeech.Engine.ACTION_CHECK_TTS_DATA); startActivityForResult(checkIntent, MY_DATA_CHECK_CODE); If I skip this intent and just go ahead and create a TTS instance, it works fine. Any clues as to what is wrong with this intent?

    Read the article

  • Alternative to Microsoft Agent / Fix for color issue?

    - by Rob P.
    I've got an app that does text-to-speech, and I wanted to show an animated face/character to go with it. I found a tutorial on Microsoft Agent and implemented it in my VB.NET app. The problem is with the transparency color. Unless I run the application in compatibility mode with 256 colors, the characters appear with a purplish-pink background instead of a transparent one. But when the app runs in 256 colors, the rest of it looks awfully out of place. First: is there something similar to MS Agent that would be more appropriate? Second: if I stick with MS Agent, can I get the transparent color to work correctly without limiting myself to 256 colors?

    Read the article

  • How to get the default audio format of a TTS Engine

    - by Itslava
    In Microsoft TTS 5.1 or newer, the documentation for the SpVoice.AudioOutputStream property says: The AudioOutputStream property gets and sets the current audio stream object used by the voice. Setting the voice's AudioOutputStream property may cause its audio output format to be automatically changed to match the text-to-speech (TTS) engine's preferred audio output format. If the voice's AllowAudioOutputFormatChangesOnNextSet property is True, the format change takes place; if False, the format remains unchanged. In order to set the AudioOutputStream property of a voice to a specific format, its AllowAudioOutputFormatChangesOnNextSet should be False. This implies that an engine always has a preferred audio output format. So how can I get it? I have not found any interface that exposes that attribute.

    Read the article

  • How to get the contents of the wav file into array so as to cut the required segment and convert it

    - by kaushik
    How can I get the contents of a wav file into an array, so as to cut out the required segment and convert it back to wav format, using Python? My problem is similar to ROMAN's problem, which I saw earlier in a post on this site. Basically, I want to combine parts of different wav files into one wav file. Is there any approach other than reading the contents into an array, cutting out a part, combining, and converting back? Please suggest... Edited: I prefer unpacking the contents of the wave file into an array and editing by cutting the required segment of sound from the wav file, as I am working on speech processing and I guess this would make it easier to enhance the sound quality later. Can anyone suggest a way to do this? Please help. Thanks in advance.
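
    A minimal sketch using only the Python standard library (the wave and array modules). It assumes uncompressed 16-bit PCM files that all share the same channel count and sample rate; the function names and file paths are hypothetical.

```python
import sys
import wave
from array import array


def read_segment(path, start_s, end_s):
    """Return (params, raw_bytes) for the audio between start_s and end_s seconds."""
    with wave.open(path, "rb") as wav:
        params = wav.getparams()
        rate = wav.getframerate()
        wav.setpos(int(start_s * rate))            # seek to the first wanted frame
        frames = wav.readframes(int((end_s - start_s) * rate))
    return params, frames


def to_samples(frames):
    """Unpack raw 16-bit PCM bytes into an array of signed integers."""
    samples = array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":                     # WAV data is little-endian
        samples.byteswap()
    return samples


def write_wav(path, params, chunks):
    """Concatenate raw byte chunks and write them as one wav file."""
    with wave.open(path, "wb") as out:
        out.setnchannels(params.nchannels)
        out.setsampwidth(params.sampwidth)
        out.setframerate(params.framerate)
        out.writeframes(b"".join(chunks))
```

    Typical use would be to call read_segment on each source file, optionally inspect or modify the samples via to_samples, and pass the collected byte chunks to write_wav. For stereo files the samples in the array are interleaved (left, right, left, right, ...).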

    Read the article

  • Embed font in a mac bundle

    - by RW
    I am writing a program and want to use a fancy font. Can I just embed the font in my bundle and use it from there? My code... NSMutableAttributedString *recOf; recOf = [[NSMutableAttributedString alloc] initWithString:@"In Recognition of"]; length = [recOf length]; [recOf addAttribute:NSFontAttributeName value:[NSFont fontWithName:@"Edwardian Script ITC" size:50] range:NSMakeRange(0, length)]; [[NSColor blackColor] set]; p.x = (bounds.size.width/2)- (([recOf size].width)/2); p.y = (bounds.size.height/1.7); [recOf drawAtPoint:p]; [recOf release];

    Read the article

  • Filter user input (paragraph) for links + smileys

    - by Alec Smart
    Hello, I am looking for some sort of existing filter that can sanitize user input to avoid XSS. I can probably use htmlspecialchars for that. But at the same time I want to be able to parse all links (it should match a.com, www.a.com and http://www.a.com, and if the link is http://www.aaaaaaaaaaaaaaaaaaaaaaaaaa.com then it should display it as aaa..a.com), e-mails and smileys. I am wondering what the best way to go about it is. I am currently using a PHP function with some regex, but the regex often fails (because the link recognition is incorrect, etc.). I want something very similar to the parser used in Google Chat (even a.com works). Thank you for your time.

    Read the article

  • Linux, how to capture screen, and simulate mouse movements.

    - by 2di
    Hi all, I need to capture the screen (like Print Screen) in a way that lets me access the pixel color data, in order to do some image recognition. After that I will need to generate mouse events on the screen, such as left click and drag and drop (moving the mouse while the button is pressed, then releasing it). Once that's done, the image will be deleted. Note: I need to capture the whole screen, everything the user can see, and I need to simulate clicks outside the window of my program (if it makes any difference). Spec: Linux (Ubuntu). Language: C++. Performance is not very important; the "print screen" function will be executed once every ~10 sec. The process can run for up to 24 hours, so the method needs to be stable and free of memory leaks (as usual :). I was able to do this on Windows with Win GDI and some Windows events, but I've no idea how to do it on Linux. Thanks a lot

    Read the article

  • Best programming novel to take on holiday

    - by Ed Guiness
    I am about to enjoy a two-week break in Spain where I expect to have lots of time for relaxing and reading. I normally read a lot of non-fiction so I'm looking for novel suggestions. If there is another Cryptonomicon out there I'd love to hear about it! UPDATE: In the end I took four books including Quicksilver. Quicksilver was fantastic and I look forward to continuing the series. I was disappointed with Gen X (Coupland) and Pattern Recognition (Gibson). Upon arrival I also found The Monsters Of Gramercy Park (Leigh) which was enjoyable though sad. Thanks for all the recommendations, I'm sure to return to this list when I have more free time.

    Read the article

  • Praat scripting

    - by Binaryrespawn
    Hi all, I am trying to write a Praat script to do preprocessing on hundreds of speech samples. I need to extract speech features from each sample and feed these as inputs into a feed-forward neural network. I have already constructed the network using MATLAB. However, learning to script in Praat is proving to be quite a challenge given my time constraints. Some of my samples are 0.01 to 0.03 seconds in length, and I was looking at standardising the duration of all samples using Pitch Synchronous OverLap-Add (PSOLA). However, this will be very tedious if I have to do it for every sample by hand. Is there any script that can read in all of my files and perform the operations in batch mode? Any guidance will be greatly appreciated. Regards.

    Read the article

  • Key word extraction in Python

    - by oliland
    I'm building a website in django that needs to extract key words from short (twitter-like) messages. I've looked at packages like topia.textextract and nltk - but both seem to be overkill for what I need to do. All I need to do is filter words like "and", "or", "not" while keeping nouns and verbs that aren't conjunctives or other parts of speech. Are there any "simpler" packages out there that can do this? EDIT: This needs to be done in near real-time on a production website, so using a keyword extraction service seems out of the question, based on their response times and request throttling.
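
    A minimal sketch of the simplest option, a plain stop-word filter with no external packages. The stop-word list here is a tiny illustrative assumption, not a complete one, and the function name is hypothetical. Note that this only removes listed words; it does not do any part-of-speech tagging, so it cannot tell nouns from verbs.

```python
import re

# Assumption: a small hand-picked list; a real deployment would use a longer one.
STOP_WORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "but", "by", "for", "if", "in",
    "is", "it", "not", "of", "on", "or", "so", "the", "to", "was", "with",
})


def extract_keywords(message):
    """Return the non-stop-words of a short message, in order, without repeats."""
    words = re.findall(r"[a-z0-9']+", message.lower())
    seen = set()
    keywords = []
    for word in words:
        if word not in STOP_WORDS and word not in seen:
            seen.add(word)
            keywords.append(word)
    return keywords


print(extract_keywords("The cat and the dog are not friends"))
```

    Set membership checks are constant time on average, so the cost per message is linear in its length, which should suit the near-real-time requirement mentioned in the post.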

    Read the article

  • Playing audio from a wav file in iPhone SpeakHere example

    - by Mo
    I'm working with the iPhone SpeakHere example, and I would like to be able to play audio from either the mic (as in the example) or from a wav file. I have working code to play from a particular wav file, which looks like this: NSString *path = [[NSBundle mainBundle] pathForResource:@"basketBall" ofType:@"wav"]; AVAudioPlayer* theAudio=[[AVAudioPlayer alloc] initWithContentsOfURL:[NSURL fileURLWithPath:path] error:NULL]; theAudio.delegate = self; [theAudio play]; So I'm fine with actually getting the wav to play in the application (I can hook it up to a button, etc.) but I would like it to also behave the same way pushing the "Play" button does after recorded speech, in that it should be connected to the same visualization (which I have modified quite a bit, but essentially shows the current volume, among other things). Thanks for your help!

    Read the article

  • How can I tag words when creating grammar rules to convert voice to text using xml ?

    - by jhone
    Hi friends, I am doing a project in C# that converts voice to text. I use the Speech SDK for this. I want to tag words according to their category using a grammar file written in XML, and display the result in a text box. E.g., if the word is "eat" it should be displayed as "eat/verb". Below is the XML I have written, but the part I tagged does not show up in the text box; only the converted word is there. <rule id="Verbs"> <item>eat<tag>$._value="/verb";</tag></item> </rule>

    Read the article

  • Show Alertdialog and use vibrator

    - by user1007522
    I have a class that implements RecognitionListener like this: public class listener implements RecognitionListener I wanted to show an AlertDialog and use the vibrator, but this isn't possible because I need to provide a Context, which I don't have. My AlertDialog code was like this: new AlertDialog.Builder(this) .setTitle("dd") .setMessage("aa") .setNeutralButton("Ok", new DialogInterface.OnClickListener() { public void onClick(DialogInterface dialog, int which) { } }) .show(); But AlertDialog.Builder(this) wants a Context, and I have the same problem with my vibrator code: v = (Vibrator) getSystemService(Context.VIBRATOR_SERVICE); The getSystemService method isn't available. My code that starts the class: sr = SpeechRecognizer.createSpeechRecognizer(this); sr.setRecognitionListener(new listener()); Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH); intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,RecognizerIntent.LANGUAGE_MODEL_FREE_FORM); intent.putExtra(RecognizerIntent.EXTRA_CALLING_PACKAGE,"voice.recognition.test"); intent.putExtra(RecognizerIntent.EXTRA_MAX_RESULTS,5); sr.startListening(intent); What's the best way to solve this?

    Read the article

  • diff implementation in Java

    - by Frór
    Hi, I'm looking for a diff implementation in Java. I've seen that Python has its own SequenceMatcher (in difflib), which is exactly what I need... but in Java. Is there a port? Or is there any other class/library that does the same in Java? If not, where can I find the source code of difflib (if it is free as in speech) so I can write my own implementation of SequenceMatcher in Java? Unfortunately, Apache Commons Lang doesn't help me much. Thanks!
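
    For reference, a short sketch of the Python behaviour the post wants to reproduce, so that a Java port has something concrete to compare against. SequenceMatcher and get_opcodes are part of Python's standard difflib module; the helper name describe_changes is hypothetical.

```python
from difflib import SequenceMatcher


def describe_changes(a, b):
    """List the non-matching regions between two sequences as (tag, old, new)."""
    matcher = SequenceMatcher(None, a, b)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


old_lines = ["a", "b", "c"]
new_lines = ["a", "x", "c"]
for change in describe_changes(old_lines, new_lines):
    print(change)
```

    Each opcode tag is one of "replace", "delete", "insert" or "equal", together with index ranges into both sequences. A Java implementation would need to return the same kind of information to be a drop-in equivalent.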

    Read the article

  • audio power on AudioQueue

    - by Tomoyuki
    Hi everyone. I'm creating an application that uses speech recognition. To check the audio power coming in through the microphone, I wrote the following method: - (void)checkPower:(AudioQueueRef)queue { UInt32 expectedSize = sizeof(AudioQueueLevelMeterState); AudioQueueGetProperty(queue, kAudioQueueProperty_CurrentLevelMeter, &audioLevels, &expectedSize); NSLog(@"average:%f peak:%f", audioLevels.mAveragePower, audioLevels.mPeakPower); } I found that mAveragePower was sometimes larger than mPeakPower, and that when mAveragePower was 1.0 (in other words, when the average power was at its maximum), mPeakPower was lower than 1.0. I think this result should generally be impossible. Please let me know if you have any information about sound power in Core Audio. Thanks.

    Read the article

  • Bitmap manipulation in C++ on Windows

    - by Oliver
    Hi, I have myself a handle to a bitmap, in C++, on Windows: HBITMAP hBitmap; On this image I want to do some Image Recognition, pattern analysis, that sort of thing. In my studies at University, I have done this in Matlab, it is quite easy to get at the individual pixels based on their position, but I have no idea how to do this in C++ under Windows - I haven't really been able to understand what I have read so far. I have seen some references to a nice looking Bitmap class that lets you setPixel() and getPixel() and that sort of thing, but I think this is with .net . How should I go about turning my HBITMAP into something I can play with easily? I need to be able to get at the RGBA information. Are there libraries that allow me to work with the data without having to learn about DCs and BitBlt and that sort of thing?

    Read the article

  • How do I implement .net plugins without using AppDomains?

    - by Abtin Forouzandeh
    Problem statement: Implement a plug-in system that allows the associated assemblies to be overwritten (avoid file locking). In .NET, individual assemblies cannot be unloaded; only entire AppDomains can be. I'm posting this because when I was trying to solve the problem, every solution made reference to using multiple AppDomains. Multiple AppDomains are very hard to implement correctly, even when architected at the start of a project. Also, AppDomains didn't work for me because I needed to transfer a Type across domains as a setting for Speech Server workflow's InvokeWorkflow activity. Unfortunately, sending a type across domains causes the assembly to be injected into the local AppDomain. This is also relevant to IIS. IIS has a Shadow Copy setting that allows an executing assembly to be overwritten while it's loaded into memory. The problem is that (at least under XP; I didn't test on production 2003 servers) when you programmatically load an assembly, the shadow copy doesn't work (because you are loading the DLL, not IIS).

    Read the article

  • Build a decision tree for classification of a large amount of data, using Python?

    - by kaushik
    Hi, I am working on speech synthesis. I have a large number of pronunciations for each phone (i.e. alphabet) and need to classify them, according to a few features such as segment size (int) and the alphabet itself (string), into a smaller set suitable for a particular context. For this purpose I have decided to use a decision tree for classification. The data to be parsed is in S-expression format, e.g. ((question)(LEFTNODE)(RIGHTNODE)). I have an idea of how to build a decision tree for normal built-in types such as lists, and I am looking for suggestions on an implementation for S-expressions. Kindly help. Thanks in advance. Note: this question may look similar to my previous post; sorry if multiple posts are not allowed. I had already edited it many times, so I thought of writing a new question instead of editing again.
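
    A minimal sketch of one way to get from the S-expression text to something a normal tree walk can handle: parse it into nested Python lists first. The post does not say what a question looks like, so the (feature operator value) layout and the one-element leaf used in classify are assumptions made purely for illustration, as are all function names.

```python
from collections import deque


def parse_sexpr(text):
    """Parse one S-expression into nested lists of string tokens."""
    tokens = deque(text.replace("(", " ( ").replace(")", " ) ").split())
    tree = _read(tokens)
    if tokens:
        raise ValueError("unexpected trailing tokens")
    return tree


def _read(tokens):
    if not tokens:
        raise ValueError("unexpected end of input")
    token = tokens.popleft()
    if token == ")":
        raise ValueError("unexpected ')'")
    if token != "(":
        return token                      # a plain atom
    node = []
    while tokens and tokens[0] != ")":
        node.append(_read(tokens))
    if not tokens:
        raise ValueError("missing ')'")
    tokens.popleft()                      # consume the closing ')'
    return node


def classify(node, features):
    """Walk a tree of the assumed form ((name op value) (yes-branch) (no-branch))."""
    if len(node) == 1:                    # assumed leaf form: (label)
        return node[0]
    (name, op, value), yes_branch, no_branch = node
    actual = features[name]
    if op == "<":
        passed = actual < float(value)
    elif op == "==":
        passed = str(actual) == value
    else:
        raise ValueError("unsupported operator: " + op)
    return classify(yes_branch if passed else no_branch, features)


tree = parse_sexpr("((size < 5) ((phone == a) (set1) (set2)) (set3))")
print(classify(tree, {"size": 3, "phone": "a"}))
```

    Once the data is in nested lists, the usual list-based decision tree code applies unchanged; only the parsing step is specific to the S-expression format.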

    Read the article

  • Storing SQL queries in Table in sql server

    - by Rohit
    We have multiple jobs in our system. These jobs are listed in a grid. We have 3 different user types (usertypeid 1, 2, 3). The listing is different for each user type, and a user can filter the listing by selecting a view from a dropdown. ViewName in the table below is the view that needs to be displayed. To achieve this, a fellow developer created the following table structure and stored SQL fragments in the SQLExpression column. In my opinion, queries should not be stored in the database. What are the pros and cons of this approach, and what are the available alternatives?

    JobListingViewID  ViewName    SQLExpression                 UserTypeID
    3                 All Jobs    1 = 1                         3
    4                 Error Jobs  JobStatusID IN ( 2 )          1
    5                 Error Jobs  JobStatusID IN ( 2 )          2
    6                 Error Jobs  JobStatusID IN ( 2 )          3
    7                 Speech      JobStatusID IN ( 1, 3, 8 )    1
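
    A minimal sketch of one possible alternative: keep the view definitions as data in application code (status IDs rather than SQL text) and build a parameterized query from them. The post is about SQL Server; Python's built-in sqlite3 is used here only as a stand-in so the example is self-contained. The Jobs table and its JobID column are assumptions, since the post shows only the view table.

```python
import sqlite3

# (view name, user type id) -> allowed JobStatusID values; None means "no filter".
JOB_VIEWS = {
    ("All Jobs", 3): None,
    ("Error Jobs", 1): (2,),
    ("Error Jobs", 2): (2,),
    ("Error Jobs", 3): (2,),
    ("Speech", 1): (1, 3, 8),
}


def list_jobs(conn, view_name, user_type_id):
    """Return the jobs visible in the given view for the given user type."""
    statuses = JOB_VIEWS[(view_name, user_type_id)]   # KeyError if not allowed
    sql = "SELECT JobID, JobStatusID FROM Jobs"
    params = ()
    if statuses is not None:
        placeholders = ", ".join("?" for _ in statuses)
        sql += " WHERE JobStatusID IN (" + placeholders + ")"
        params = statuses
    sql += " ORDER BY JobID"
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Jobs (JobID INTEGER PRIMARY KEY, JobStatusID INTEGER)")
conn.executemany("INSERT INTO Jobs VALUES (?, ?)",
                 [(1, 1), (2, 2), (3, 3), (4, 8), (5, 5)])
print(list_jobs(conn, "Speech", 1))
```

    The trade-off is that adding a view now requires a code change rather than a row insert, but no SQL text from a table is ever concatenated into a query, and the filter values are always bound as parameters.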

    Read the article

  • How do I construct a 3D model of a room from 2 stereo cameras? What is the determining factor to an

    - by yasumi
    Currently, I have extracted depth points to construct a 3D model from 2 stereo cameras. The methods I have used are the OpenCV graphCut method and software from http://sourceforge.net/projects/reconststereo/. However, the generated 3D models are not very accurate, which leads me to ask: 1) What is the problem with the pixel-based method? 2) Should I change my pixel-based method to a feature-based or object-recognition-based method? Is there a best method? 3) Are there any other ways to do such a reconstruction? Additionally, the extracted depth comes from only 2 images. What if I turn the camera 360 degrees to obtain a video? I look forward to suggestions on how to combine this depth information. Thank you very much :)

    Read the article

  • Receiving Text From Another Application

    - by Garry
    Hi, I'm building some home automation software with Cocoa/Objective-C. The main application will have a minimal GUI and will most likely be represented by a status bar icon only. I'm using proprietary speech-to-text software (MacSpeech Dictate) that takes my voice command and converts it to plain text. I then need to send this plain text to my app for parsing. Is there a way to send a string to a Cocoa application? Could AppleScript achieve this? How would I make the NSString string in my app "available" to receive the passed string? For reasons that are beyond the scope of this question - it is not possible to dictate the command directly into my app. Many thanks in advance,

    Read the article

  • Any experience with the Deliverance system ?

    - by e-satis
    My new boss attended a talk where Deliverance, a kind of proxy that can add a skin to any HTML output on the fly, was presented. He decided to use it right after that, no matter how young it is. More here: http://www.openplans.org/projects/deliverance/introduction In theory, the system sounds great when you want a newbie to tweak your Plone theme without having to teach him all the complex mechanisms behind the Zope products, and then apply the same theme to a Drupal web site in one go. But I don't believe in theory, and would like to know if anybody has tried this out in the real world :-)

    Read the article

  • CAD/CAM without C++

    - by zaladane
    Hello, is it possible to write CAD/CAM software without using C++? My company developed its software in C/C++, but that was more than 10 years ago. Today there is a lot of legacy code that switching would force us to get rid of, and I was wondering what the actual risks are. We have a lot of mathematical algorithms for toolpath calculations, feature recognition, simulation and 3D rendering, and I was wondering whether C# can handle all of that without a great performance loss. Is it unrealistic to rewrite such algorithms in C#, or should that language only deal with the UI? We are not talking about game development here (Halo 3 or Call Of Duty), so how much processing does CAD/CAM really need? Can anybody enlighten me on this matter? Most of my colleagues are hardcore C++ programmers, and although I program in C++ I love .NET, but I am having a hard time selling .NET to them for anything other than basic UI. Does it make sense to consider switching to .NET in such a field, or is it just not a wise idea? Thank you

    Read the article

  • Distance between hyperplanes

    - by michael dillard
    I'm trying to teach myself some machine learning, and have been using the MNIST database (http://yann.lecun.com/exdb/mnist/) to do so. The author of that site wrote a paper in '98 on all different kinds of handwriting recognition techniques, available at http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf. The 10th method mentioned is a "Tangent Distance Classifier". The idea is that if you place each image in an (NxM)-dimensional vector space, you can compute the distance between two images as the distance between the hyperplanes formed by each, where the hyperplane is given by taking the point and rotating the image, rescaling the image, translating the image, etc. I can't figure out enough to fill in the missing details. I understand that most of these are indeed linear operators, so how does one use that fact to create the hyperplane? And once we have a hyperplane, how do we compute its distance to other hyperplanes?
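
    A minimal sketch of the linear algebra involved, under stated assumptions. Each image x is flattened to a vector, and each transformation contributes one tangent vector, which can be approximated by a finite difference such as (slightly transformed x minus x) divided by the step size. The "hyperplane" is then the affine subspace of points x plus any linear combination of those tangent vectors. The distance from another image y to that subspace is the length of y - x after removing its components along the tangent directions. For two subspaces, minimizing over both sets of coefficients is the same as projecting y - x away from the span of both tangent sets together. Function names are hypothetical, and pure Python lists are used to stay dependency-free.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(vectors, eps=1e-12):
    """Gram-Schmidt: orthonormal basis for the span of the given vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > eps:                    # skip (near-)dependent vectors
            basis.append([wi / norm for wi in w])
    return basis


def one_sided_tangent_distance(x, tangents_x, y):
    """Distance from y to the affine subspace x + span(tangents_x)."""
    residual = [yi - xi for xi, yi in zip(x, y)]
    for b in orthonormal_basis(tangents_x):
        c = dot(residual, b)
        residual = [ri - c * bi for ri, bi in zip(residual, b)]
    return dot(residual, residual) ** 0.5


def two_sided_tangent_distance(x, tangents_x, y, tangents_y):
    """Smallest distance between x + span(tangents_x) and y + span(tangents_y)."""
    return one_sided_tangent_distance(x, list(tangents_x) + list(tangents_y), y)


# Toy 3-dimensional example rather than real images.
print(one_sided_tangent_distance([0, 0, 0], [[1, 0, 0]], [3, 4, 0]))
```

    The subspace is only a local linear approximation of the curved set of transformed images, so the result is meaningful for small rotations, shifts and scalings. In practice a linear algebra library's least-squares solver would replace the hand-written Gram-Schmidt.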

    Read the article
