Windows Phone 7 Prototype 001: Speech Recognition on WP7

Posted on Dot net Slackers
Published on Wed, 12 May 2010 00:00:00 GMT Indexed on 2010/05/12 6:44 UTC

At some point in the future it will be awesome when you can just tell your computer what to do and it does it, no typing required, which will help those of us with a blistering 11 WPM hunt-and-peck technique. Siri, a mobile digital assistant using speech recognition, was voted best tech at SXSW. I don't know about that one. Although I'm sure it will get better when Apple rebuilds it and bundles it with the iPhone 5.

So how would you do that on WP7? There have been some videos floating around showing Bing with some voice control, so obviously the phone has speech recognition. So what options are there?

  • System.Speech? Not included in WP7/SL
  • Nuance software like Siri? No WP7/SL version yet.
  • Invoking the SAPI dlls on the phone? No automation factory in WP7 SL.
  • Web services using System.Speech and mic on the phone? YES!

The last one is my least favorite, but it works for now.

I built a quick sample app to show how to do text-to-speech and speech recognition on WP7.

image 

@eklimczak will not be happy with the developer designed UI.

In this sample there is a web service which provides access to the System.Speech APIs in .NET. Basically it's just passing around byte arrays. On the phone it's using the XNA audio framework to play the text-to-speech stream and to record using the microphone. The code is pretty simple and you can download it from the link at the end of this post. The only things to note are that the WCF config needs adjusting to handle larger byte uploads, and that the Microphone API is a little weird with that 1 second buffer. It would be nice if you could just call mic.start and mic.end, which would return an array of bytes, instead of managing your own stream inside the buffer ready callback.
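
To make the buffer-ready bookkeeping concrete, here is a rough sketch of the kind of start/stop wrapper described above. It is not the code from the sample download. The MicRecorder class and its members are hypothetical names, and the calls are the XNA Microphone API as it exists in the released framework, so a preview build of the tools may differ.

```csharp
using System;
using System.IO;
using System.Windows.Threading;
using Microsoft.Xna.Framework;
using Microsoft.Xna.Framework.Audio;

// Hypothetical helper: hides the BufferReady plumbing behind Start()/Stop().
public class MicRecorder
{
    // Microphone.Default is null when no capture device is available.
    private readonly Microphone mic = Microphone.Default;
    private readonly DispatcherTimer pump = new DispatcherTimer();
    private MemoryStream recording;
    private byte[] chunk;

    public MicRecorder()
    {
        // A Silverlight app has no XNA game loop, so the XNA dispatcher has to
        // be pumped by hand or BufferReady is never raised.
        pump.Interval = TimeSpan.FromMilliseconds(50);
        pump.Tick += (s, e) => FrameworkDispatcher.Update();
        mic.BufferReady += OnBufferReady;
    }

    // The service needs this to interpret the raw 16-bit mono PCM bytes.
    public int SampleRate { get { return mic.SampleRate; } }

    public void Start()
    {
        // Assumption: a shorter buffer than the default loses less audio at Stop().
        mic.BufferDuration = TimeSpan.FromMilliseconds(100);
        chunk = new byte[mic.GetSampleSizeInBytes(mic.BufferDuration)];
        recording = new MemoryStream();
        pump.Start();
        mic.Start();
    }

    public byte[] Stop()
    {
        mic.Stop();
        pump.Stop();
        // Audio captured after the last BufferReady callback is not included.
        return recording.ToArray();
    }

    private void OnBufferReady(object sender, EventArgs e)
    {
        // Copy whatever the microphone has ready into our own stream.
        int bytesRead = mic.GetData(chunk);
        recording.Write(chunk, 0, bytesRead);
    }
}
```

With something like this, the page code shrinks to recorder.Start() on button press and byte[] audio = recorder.Stop() on release. It also shows why the WCF limits matter: assuming a 16 kHz sample rate, 16-bit mono audio is about 32 KB per second, so a few seconds of speech already exceeds the default 64 KB maxReceivedMessageSize and the default 16 KB maxArrayLength reader quota.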

A couple of downsides to this approach:

  • Recording from the phone has some static. Could be my code, or my mic is bad / not calibrated right.
  • Having to make web service calls instead of local access is not ideal (Microsoft, please add an API for the SAPI dlls). Although in the context of an app like Siri it's not so bad, since you need to do web service lookups to get data back anyway.
  • Speech recognition quality really depends on either a) a limited grammar set, like the pizza grammar in the sample, or b) training the recognizer. For the latter it would be annoying to have users train the system. Using the System.Speech stuff you'd have to have a profile for each user.
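
To illustrate the limited-grammar option, here is a minimal sketch of what the server side could look like with System.Speech. The SpeechService class, the Recognize method and the pizza phrases are all made up for illustration; the real pizza grammar lives in the sample download and may look quite different.

```csharp
using System.IO;
using System.Speech.AudioFormat;
using System.Speech.Recognition;

// Hypothetical service-side helper, not the code from the sample download.
public static class SpeechService
{
    // Takes raw 16-bit mono PCM from the phone and returns the recognized
    // text, or null when nothing in the grammar matched.
    public static string Recognize(byte[] pcm, int sampleRate)
    {
        // Assumed phrases: a small, closed set keeps the recognizer accurate
        // without any per-user training.
        var sizes = new Choices("small", "medium", "large");
        var toppings = new Choices("cheese", "pepperoni", "mushroom");

        var phrase = new GrammarBuilder("I would like a");
        phrase.Append(sizes);
        phrase.Append(toppings);
        phrase.Append("pizza");

        using (var engine = new SpeechRecognitionEngine())
        using (var audio = new MemoryStream(pcm))
        {
            engine.LoadGrammar(new Grammar(phrase));

            // Describe the raw bytes, since there is no WAV header to read.
            var format = new SpeechAudioFormatInfo(
                sampleRate, AudioBitsPerSample.Sixteen, AudioChannel.Mono);
            engine.SetInputToAudioStream(audio, format);

            RecognitionResult result = engine.Recognize();
            return result == null ? null : result.Text;
        }
    }
}
```

The trade-off is that the recognizer only ever returns phrases the grammar allows, so anything off-script comes back as null rather than as a bad guess. Free-form dictation would mean loading a DictationGrammar instead, which is where the training and per-user profile problem comes in.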

So until Microsoft adds some speech client APIs on the phone or Nuance releases a WP7 product, this is a decent workaround. In the future I'd like to build something similar to Siri. I shall call it Iris in homage. I'm a big fan of mobile speech apps because frankly it's just not safe to Google while driving.

Since some of my designer co-workers have been posting UI sketches for WP7, I'd like to start posting some code prototypes for things I try out on the phone. That will probably last 2 weeks, but for the moment I have like 10 posts in the queue.

Sample Code (100% guaranteed to work on my emulator)

© Dot net Slackers or respective owner