Speech Recognition in the “New iPad”
Jon W. Wahrenberger, MD
Once the realm of science fiction, speech recognition technology has found its way into a wide range of situations over the last decade. This speech-to-text technology has not only assisted the ordinary person with email and word processing, but has also been a huge productivity enhancer for the documentation needs of novelists, physicians, attorneys and other professionals. For individuals with physical limitations or certain types of dyslexia, this technology has truly been a communication lifesaver, providing not only text creation but also computer command and control capabilities. While speech recognition technology has appeared in mobile computing devices, it has largely been limited to stand-alone applications that are not integrated into the places where it is most needed: an email application, a word processing document, or the text entry box on a web page.
Enter now the “new iPad”
With the 3rd generation iPad, Apple has taken the long-needed plunge by providing background speech recognition in a process it calls “keyboard dictation”. The capability is available almost anywhere the virtual keyboard appears and is initiated simply by touching the small microphone icon on the keyboard and speaking (see image below).
Although Apple isn’t saying much beyond the fact that the process involves speech being “sent to Apple”, it appears that the technology is a cloud-based process much like that employed by a variety of applications made by Nuance Communications, Inc., including Dragon Dictation and Dragon Search. The idea is that your speech is captured, compressed, and sent to Apple, where it is processed, converted to text, and then sent back, all in the time it takes to blink an eye. It is my very strong suspicion, in fact, that Apple is using Nuance or Dragon-based speech recognition. If so, more power to them for picking the best: Nuance is the clear leader in this technology.
How well does it work? In a word – amazingly! It is highly accurate, fast, and almost ubiquitous on the iPad. I have tried it in emails, notes, word processing documents, web page URL entry fields and it works perfectly in all of these contexts.
Using iPad Keyboard Dictation
What do you need to know in order to make 3rd generation iPad speech recognition (keyboard dictation) work for you? Here are some suggestions:
- Activating it: If you aren’t seeing the microphone icon on the keyboard, you may need to turn it on. Go to Settings > General > Keyboard > Dictation and turn it on.
- Using it: Keyboard dictation is available almost everywhere the keyboard is available and an internet connection is present. In the rare place that it’s not available, you’ll see the keyboard but not the microphone icon. To use it, simply touch the microphone icon. You’ll see a voice recognition icon show up (see below). Simply talk, aiming your voice toward the microphone on the top of the iPad. When you’re done with the dictation, touch the voice icon to end the capture. Within seconds your text will appear.
- Speak your punctuation: Remember that the program simply transcribes the words you speak; it neither understands the content nor grammar-checks your work. It is necessary to say all punctuation, such as “period”, “comma”, “new line”, “new paragraph”, etc. See the table below for a compendium of common punctuation and commands which are recognized by the iPad’s keyboard dictation.
- Keep in mind that your dictation time is not infinite. In my experience, dictation stops after just shy of 40 seconds of recording, so you need to dictate in chunks of 30 seconds or so. This is no big deal: as soon as text has been inserted into your document, you can touch the keyboard microphone icon again and dictate more.
- WiFi vs. 3G: We’ve tried it both ways. The bottom line is that it works with both. If WiFi is available it will probably be utilized and will be quicker, but if you have a good 3G or LTE signal you should be fine as well. We have not calculated the data used in sending your dictation to the cloud, so be mindful of the potential to consume data when using a cellular connection.
- Optimizing it: As accurate as it can be, keep in mind that speech recognition software doesn’t understand content, and the quality of the end result is highly dependent upon a clean signal and clearly spoken words. Here are a few measures that will improve your accuracy:
- Enunciate distinctly (don’t mumble or slur your words)
- Speak in phrases or complete sentences as much as possible (it helps to think ahead before you talk)
- Minimize contaminating external noise (TV, radio, screaming babies, etc.). The on-board microphone is not directional and will pick up virtually any noise from any direction. That noise merges with your spoken words and degrades the purity of your speech input.
- If you plan to dictate regularly in an environment with significant extraneous noise, strongly consider using a noise-canceling headset microphone (see below).
- Speak close to the microphone (the strength of a sound signal falls rapidly with distance)
- Correct errors when they occur. Words of low certainty will have a dotted underline; if you tap one of these words you will be offered alternative selections from which to choose. Alternatively, correct any errors manually. If the Apple speech recognition is truly based on the Nuance product, such changes are tracked and incorporated into your speech model, so similar errors will be less likely to occur in the future.
- Special situations: If your situation or needs are extraordinary or if you truly need high levels of accuracy, you should consider the following:
- Headset microphones: A good quality headset microphone will provide improved accuracy and immunity from external noise compared with the on-board microphone. To better understand the differences in external noise rejection between the iPad on-board microphone and a variety of headset microphones, see the “hearing is believing” section on our website, where you can hear actual recordings. Unfortunately, you cannot simply insert the microphone plug of a headset microphone into the iPad audio jack. Doing so simply shorts out the microphone terminals in the iPad, and the on-board microphone remains the active microphone. Using a headset microphone requires an adapter such as the one pictured below (Speech Recognition Solutions iPad Headset Adapter), which separates the mic-in and stereo sound-out functions of the iPad audio jack.
Some microphones which we have specifically tested with the iPad 3 using the adapter shown above, and which provide excellent results, are listed below. To learn more, please visit my page on Speech Recognition and the iPad.
- Andrea ANC 700 and 750
- Andrea NC 181, 181VM, 185 & 185VM
- Audio Technica Pro 8HEmW*
- Cyber Acoustics AC 101 and AC 201
- Radio Shack “Sennheiser ME3 Knockoff”*
- Sennheiser ME3*
- UmeVoice theBoom series* (“O”, “C”, & V4)
*Microphones marked with an asterisk are more expensive but also provide the most aggressive external noise rejection.
The advantage of a headset microphone plugged into the audio jack on the iPad is that it can be utilized with all audio functions on the device, including keyboard dictation, Skype and other internet telephony, and audio recording software such as GarageBand. This is not the case for a Bluetooth microphone or microphone attached via the 30-pin dock connector.
- Bluetooth microphones: If you already have a Bluetooth microphone, it will work with speech recognition on the iPad, but keep in mind that if the boom doesn’t extend most of the way to your mouth, the quality of the signal going into your iPad is not likely to be much different than with the on-board mic. A Bluetooth mic with an extended boom is a much better choice. Keep in mind also that while Bluetooth microphones will work with keyboard dictation on the 3rd generation iPad, they will not necessarily be accepted as the default microphone for all audio functions on the iPad. Bluetooth microphones with full-size booms that we have specifically tested and found to work well with keyboard dictation on the 3rd generation iPad are the following:
- UmeVoice theBoom “W”
- VXI TalkPro Xpressway
- USB Microphones: It is possible to use a USB microphone with the iPhone and iPad by way of the appropriate adapter included in the “Camera Connection Kit”. However, when any USB device is attached to the 3rd generation iPad through the dock connector in this way, the iPad no longer shows the “virtual keyboard” in a text entry environment, so a USB microphone cannot be used with iPad keyboard dictation.
- Microphones attached in other ways using the 30-pin dock connector: As is the case with USB adapters, while certain audio connectors made for use with Apple mobile devices may allow you the option of providing audio input via the dock connector, such input will not be utilized by keyboard dictation in the 3rd generation iPad.
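To make the “speak your punctuation” tip above concrete, here is a minimal Python sketch of the idea. It is not how Apple’s service works internally, which this article does not know; it is only a simplified model in which a raw transcript of spoken words is post-processed into punctuated text. It covers just the four commands named above, and the names `SPOKEN_COMMANDS` and `apply_spoken_punctuation` are made up for illustration.

```python
import re

# The four spoken commands named in the article. The replacement strings
# are assumptions about what each command should produce.
SPOKEN_COMMANDS = {
    "new paragraph": "\n\n",
    "new line": "\n",
    "period": ".",
    "comma": ",",
}

# Match any command as a whole word, together with the whitespace before it,
# so that "world period" becomes "world." rather than "world .".
# Longer commands are listed first so multi-word commands win.
_COMMAND_PATTERN = re.compile(
    r"\s*\b("
    + "|".join(re.escape(c) for c in sorted(SPOKEN_COMMANDS, key=len, reverse=True))
    + r")\b",
    flags=re.IGNORECASE,
)


def apply_spoken_punctuation(transcript: str) -> str:
    """Replace spoken punctuation commands in a transcript with symbols.

    Limitation of this toy model: it cannot tell a command from the same
    word used literally (e.g. "a period of time"), and it does not
    capitalize sentences.
    """
    text = _COMMAND_PATTERN.sub(
        lambda m: SPOKEN_COMMANDS[m.group(1).lower()], transcript
    )
    # Drop the space left over at the start of a line after a line break.
    text = re.sub(r"\n[ \t]+", "\n", text)
    return text.strip()


if __name__ == "__main__":
    spoken = "hello comma world period new line goodbye period"
    print(apply_spoken_punctuation(spoken))
```

Run as a script, this prints “hello, world.” on one line and “goodbye.” on the next. The sketch also shows why the commands must be spoken explicitly: without them, the transcript is just an unpunctuated run of words.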
If the iPad was not already the most revolutionary device to hit the market in the last decade, the addition of speech recognition has truly sealed its claim to that title. With an amazing degree of speed and accuracy you can convert your spoken words into text in almost any text entry window. The world is not just at your fingertips, but now at the tip of your tongue. Congratulations, Apple, on this great addition to the iPad.
For more information:
- Using a Microphone with the iPad (link to White Paper)
- iPad User Manual from Apple
- Speech Recognition Solutions iPad Accessories Page
- Nuance Mobile Solutions site
Jon Wahrenberger, MD