I am using Microsoft Visual Basic 2008 Express Edition. To add speech recognition to a project, you have to add references to the Microsoft Speech Object Library (COM) and System.Speech (.NET). Because my computer runs Windows XP, I also had to download Microsoft’s Speech Software Development Kit 5.1 (SDK 5.1) from Microsoft’s website. Links to the required software are provided below. All of the software I downloaded was free.
Microsoft Software Development Kit 5.1
Microsoft Visual Basic 2008 Express Edition
After two weeks of researching speech recognition, I implemented a basic speech recognition algorithm in software using Microsoft’s Speech Application Programming Interface (SAPI).
Wait for speech detection. Move to the next state when a word is spoken.
Move to the next state when a hypothesis for the word is found.
Figure 9: Speech Algorithm
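The states in the figure can be sketched as a small state machine. This is only an illustration in Python, not the SAPI implementation: the state names, the event names, and the final "recognized" state with its reset transition are assumptions added to make the sketch complete.

```python
from enum import Enum, auto


class State(Enum):
    WAITING = auto()        # waiting for speech detection
    HYPOTHESIZING = auto()  # a word was spoken; searching for a match
    RECOGNIZED = auto()     # assumed end state: a hypothesis was found


# Hypothetical event names; SAPI raises its own events for these steps.
TRANSITIONS = {
    (State.WAITING, "word_spoken"): State.HYPOTHESIZING,
    (State.HYPOTHESIZING, "hypothesis_found"): State.RECOGNIZED,
    (State.RECOGNIZED, "reset"): State.WAITING,
}


def step(state, event):
    """Return the next state, or stay in the current state if the
    event does not apply to it."""
    return TRANSITIONS.get((state, event), state)
```

An event that does not apply to the current state is ignored, so a stray "hypothesis_found" while waiting leaves the machine waiting.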
When the program begins, it waits for speech detection. When a word is spoken into the microphone, the word is broken into phonemes and then passed through statistical modeling to try to find the word that was spoken. One example of the modeling used in software is the Markov model. An example of how a computer would interpret the word tomato is shown below. The word tomato is broken up into several phonemes (T, ow, m, aa, t, ow). If you follow the phonemes through the model below, you will see that tomato can be pronounced two different ways, but at the last branch of the model there is a 90% chance that the word is tomato. That is the word that is then selected. This type of modeling enables much quicker speech recognition than a brute-force method.
Figure 10: Markov Models
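To make the idea concrete, here is a minimal Python sketch of walking a phoneme sequence through a Markov-style pronunciation model. The graph layout, the alternative phonemes ("ey", "ah"), and every probability except the 90% on the last branch are assumptions for illustration; they are not taken from the figure or from Microsoft’s engine.

```python
# state -> list of (phoneme, next_state, probability)
# All values are hypothetical except the 0.9 on the last branch.
MODEL = {
    0: [("t", 1, 1.0)],
    1: [("ow", 2, 1.0)],
    2: [("m", 3, 1.0)],
    3: [("aa", 4, 0.5), ("ey", 4, 0.5)],  # two ways to say the middle vowel
    4: [("t", 5, 1.0)],
    5: [("ow", 6, 0.9), ("ah", 6, 0.1)],  # last branch: 90% "tomato"
}
FINAL_STATE = 6


def sequence_probability(phonemes):
    """Multiply the transition probabilities along the path that the
    phoneme sequence takes; return 0.0 if the model has no such path."""
    state, prob = 0, 1.0
    for phoneme in phonemes:
        for label, next_state, p in MODEL.get(state, []):
            if label == phoneme:
                state, prob = next_state, prob * p
                break
        else:
            return 0.0  # no outgoing branch matches this phoneme
    return prob if state == FINAL_STATE else 0.0
```

A recognizer would score the spoken phonemes against one such model per candidate word and keep the word with the highest probability, which is why it avoids comparing the audio against every word by brute force.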
There is one significant drawback to this model. Because this method tries to match the word spoken into the microphone against several words that may sound alike in the English language, the results are not optimal and the engine often returns the wrong word. For this reason I must look at creating my own language library instead of using Microsoft’s library.
One way to improve speech recognition on a system is to take the speech recognition training on your laptop. This training is available in the Control Panel under the Speech icon. The more training you complete, the more accurate your speech recognition becomes.
Figure 11: Speech Training
Speech Demo to Class
One possible approach to speech recognition was presented to the class. The demonstration covered the background of speech recognition and showed how voice commands can be used to call functions. It also showed how difficult speech recognition is. To improve speech recognition, we ordered a headset and implemented a custom grammar library using XML. Our target is 95% speech recognition accuracy.
To improve the speech recognition interface, I created a custom grammar library instead of using Microsoft’s grammar library. My initial thinking was that this would improve the program significantly because it would limit the number of words the engine had to search. I had to learn XML syntax to create my own grammar library. After reviewing XML examples, I was able to implement my own grammar library successfully. Microsoft’s SDK 5.1 comes with a grammar compiler and test environment, where you can write, compile, and run your grammar libraries. If your voice commands are recognized there, you can conclude that they will also be recognized in your program. An example of the grammar compiler is shown below.
Figure 12: XML Grammar Library
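For readers who want to see the shape of such a grammar, the Python sketch below generates a small command list in the SAPI 5 XML grammar layout (GRAMMAR, RULE, L, and P elements). The rule name and the command words are hypothetical placeholders, not the commands used in this project, and the output should still be checked in the SDK’s grammar compiler before use.

```python
import xml.etree.ElementTree as ET


def build_grammar(rule_name, commands):
    """Build a one-rule command grammar as an XML string.

    LANGID 409 is the identifier for US English. The rule name and
    command words passed in are placeholders chosen by the caller.
    """
    grammar = ET.Element("GRAMMAR", LANGID="409")
    rule = ET.SubElement(grammar, "RULE", NAME=rule_name, TOPLEVEL="ACTIVE")
    choices = ET.SubElement(rule, "L")  # list: exactly one phrase must match
    for command in commands:
        ET.SubElement(choices, "P").text = command  # one phrase per command
    return ET.tostring(grammar, encoding="unicode")


# Hypothetical command words for illustration only.
xml_text = build_grammar("Commands", ["start", "stop", "help"])
```

Limiting the list to a handful of phrases is the point of the custom grammar: the engine only has to choose among those phrases instead of the whole dictation vocabulary.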