Technology
Advances in Voice Recognition
By Janine Lodato
A first-hand look at the magic of voice-recognition technology.
After 25 years of fighting off the assault of multiple sclerosis (MS), I
can no longer type an entire document by hand. In order to continue writing, I now rely on
voice-recognition technology to do my typing for me. Voice-recognition technology is still
in its infancy, and it has provided me with some amusing and frustrating moments. Finding
a voice recognizable to readers is tricky. Finding a voice recognizable to a computer is
even trickier.
I've always found typographical errors amusing, but my new software's
typos take the cake. The microphone into which I dictate sits right in front of my mouth,
jutting out from a headset with one earphone. The microphone is so sensitive that it
translates a heavy sigh into "a," "of," "the," or "what." A loud sneeze ("achoo") from my husband in
a room nearby inspires the computer to type "aha." But if I control my breathing, monitor
the whereabouts of my allergy-prone husband, and enunciate clearly, the computer usually
understands my words perfectly on only the second attempt. Its first interpretations are
nonetheless reminiscent of my childhood, when we so often said before dinner,
"Gracious Father, please bless this food for its intended uses." I understood
the prayer to say, "Gracious Father, please bless this food for its tender
juices." This is exactly the way my computer hears my voice--as a child would. Lots
of patient word training is needed to make the machine familiar with my vocabulary and
pronunciation.
Magically, the new technology knows there are multiple spellings for
some words--to, too, two--and gives me choices for spelling in the correction window off
to the side of the document. I need only pick the correct spelling and the technology
inserts it into the document for me.
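For readers who like to see the idea spelled out, here is a minimal sketch in Python of how a correction window might offer alternate spellings. It is only an illustration: the homophone list and the function names spelling_choices and correct are hypothetical, and nothing here describes how any actual product stores or ranks its choices.

```python
# Hypothetical homophone groups; a real product would ship a far larger list.
HOMOPHONE_GROUPS = [
    ("to", "too", "two"),
    ("their", "there", "they're"),
]


def spelling_choices(word):
    """Return every spelling that sounds like `word`, the heard one first."""
    heard = word.lower()
    for group in HOMOPHONE_GROUPS:
        if heard in group:
            return [heard] + [w for w in group if w != heard]
    # No known homophones: the only choice is the word as heard.
    return [heard]


def correct(words, index, choice):
    """Return a copy of `words` with the word at `index` replaced by `choice`."""
    if choice not in spelling_choices(words[index]):
        raise ValueError(f"{choice!r} is not a spelling of {words[index]!r}")
    return words[:index] + [choice] + words[index + 1:]
```

Calling spelling_choices("two") lists "two" first, followed by "to" and "too"; correct then swaps in whichever spelling the writer picks and leaves the rest of the sentence alone.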
Voice-recognition software has two modes of operation: dictate mode and
command mode. Dictate mode is the usual method people use when speaking into the
microphone. However, it is often necessary to access command mode to make changes in a
document, say, to capitalize a word or spell it. To access command mode, one must first
use a cue word. In the case of my software, I have programmed into the machine the word
"computer" to act as the cue that puts me into command mode. All I need to say
is, "computer select right (or left) one word," "computer capitalize
this," or "computer move right (or left) one word" and the software goes
into command mode, then right back into dictate mode. If I say, "computer begin
spell," I have accessed command mode for spelling and am ready to spell. When
finished, I say, "computer return" and the software automatically switches back
into dictate mode.
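The switching between modes can be pictured as a small state machine. The Python sketch below is an assumption-laden illustration, not the software's real design: the function name transcribe is hypothetical, "capitalize this" is assumed to act on the most recently dictated word, and spelled letters are assumed to arrive as separate tokens.

```python
def transcribe(utterances, cue="computer"):
    """Turn a list of recognized utterances into document text.

    Hypothetical sketch: the command names mirror those quoted in the
    article, but how they are handled here is an assumption.
    """
    words = []        # the document so far, one entry per word
    spelling = None   # list of letters while spelling, otherwise None

    for utterance in utterances:
        tokens = utterance.lower().split()
        if not tokens:
            continue

        if tokens[0] == cue:
            # The cue word puts us in command mode for this utterance.
            command = " ".join(tokens[1:])
            if command == "begin spell":
                spelling = []
            elif command == "return" and spelling is not None:
                words.append("".join(spelling))
                spelling = None
            elif command == "capitalize this" and words:
                # Assumption: "this" means the last word dictated.
                words[-1] = words[-1].capitalize()
            # Unrecognized commands are ignored. In every case the next
            # utterance is handled outside command mode again.
            continue

        if spelling is not None:
            spelling.extend(tokens)   # each token is taken as one letter
        else:
            words.extend(tokens)      # ordinary dictate mode

    return " ".join(words)
```

Feeding it "my husband is hungarian", then "computer capitalize this", then "computer begin spell", "j a n", and "computer return" yields the text "my husband is Hungarian jan". One-shot commands drop straight back to dictation, while spelling stays active until "computer return" is heard.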
Lots of word training, trial-and-error, and patience are required when
working with voice-recognition software. Once I master it, it will be a real benefit to
me, but until I get to know it intimately, I can produce only simple documents. Having MS
no doubt motivates me to find a way of expressing myself in writing other than by typing.
As long as I am able to talk, voice-recognition technology offers me a mode of
communicating never before available to people in my position.
The technology I use is called IBM ViaVoice for Mac, and it has
increased my productivity tenfold. Thanks to this wonderful new technology, I can now
finish a book I've been writing using my word-processing software. My e-mail is greatly
improved, and I know I've only begun to tap into the wonders of voice recognition.
Half the world's population either has a disability or is helping people
with disabilities. Voice-recognition technology opens up the whole world to individuals
like me with disabilities. The more I get to know my new software, the less I rely on the
typing skills of my Hungarian husband. When I exercise ample patience, voice-recognition
technology combined with my husband's editorial input produces written documents I can be
proud of.
In spite of all the changes I must make in a document produced with
voice-recognition software and the fact that the technology is still in its infancy, I
find it magical, wonderful, and definitely worth the effort needed to learn and adjust to
it.
When computers have higher speeds and more main memory, voice
recognition will likely improve, as will the hardware and peripherals. At the moment, in
order for me to dictate text through voice recognition and edit it in my word
processor, I need to wear two separate headsets: one for the voice-recognition microphone
and a second for a gyroscope used to control the mouse via head movements. This is bulky,
and I look forward to the day when I can wear just one headset and do all the work I need
to do hands-free. Perhaps a camera on top of my monitor would photograph my lip motions to
make voice recognition more accurate. Perhaps that same camera would photograph my eye
motions and blink rate to determine my alertness and productivity.
Fascinated with my new voice-recognition technology, I am compelled to
spend as much time as possible with it, learning as much about it as I can. In spite of my
MS, I am able to produce documents I can be proud of. I predict that before long everyone
in the computer industry will opt for voice recognition over keyboarding. It is the wave
of the future, well worth the added software cost, time, and effort required to learn it.
About the Author
Janine M. Lodato is a senior partner with Hi-Tech Inventions, San
Andreas, California. E-mail LaGiannina@aol.com.