Synthetic Voice Shock Reverberates Across the Divides!

July 30, 2008

Synthetic Voice Shock — oh, those awful voices!

As I communicate with other persons with progressive vision loss, I often sense a quite negative reaction to synthetic, or so-called ‘robotic’, voices that enable reading digital materials and interfacing with computers. Indeed, that’s how I felt a few years ago. Let’s call this reaction "synthetic voice shock" as in:

  • I cannot understand that voice!!!
  • The voice is so inhuman, inexpressive, robotic, unpleasant!
  • How could I possibly benefit from using anything that hard to listen to?
  • If that’s how the blind read, I am definitely not ready to take that step.

Conversely, those long experienced with screen readers and reading appliances may be surprised at these adverse reactions to the text-to-speech technology they listen to many hours a day. They know the clear benefits of such voices, rarely have trouble understanding them, exploit voice regularity and adjustability, and innovate better ways of "living big" in the sighted world, to quote the LevelStar motto.

The ‘Synthetic Speech’ divide

Synthetic voice reactions appear to criss-cross many so-called divides: digital, generational, disability, and developer. The free WebAnywhere is the latest example with a robotic voice that must be overcome in order to gain the possible benefits of its wide dissemination. Other examples are talking ATM centers and accessible audio for voting machines. The NVDA installation and default voice can repel even sighted individuals who could benefit from a free screen reader as a web page accessibility checker or a way to learn about the audio assistive mode. Bookshare illustrates book reading potential by a robotic, rather than natural, voice. Developers of these tools see the synthetic voice as a means to gain the benefits of their tools, while users not accustomed to speech-enabled hardware and software run the other way at the unfriendliness and additional stress of learning an auditory rather than visual sensory practice.

This is especially unfortunate when people losing vision may turn to magnifiers that can only improve spot reading, when extra hours and energy are spent twiddling fonts then working line by line through displayed text, when mobile devices are not explored, when pleasures of book reading and quality of information from news are reduced.

Addressing Synthetic Voice Shock

I would like to turn this posting into messages directed at developers, Vision Losers, caretakers, and rehab personnel.

To Vision Losers who could benefit sooner or later

Please be patient and separate voice quality from reading opportunities when you evaluate potential assistive technology.

The robotic voice you encounter with screen readers is used because it is fast and flexible and widely accepted by the blind community. But there do exist better natural voices that can be used for reading books, news, and much more. While these voices seem initially offensive, synthetic voices are actually one of the great wonders of technology by opening the audio world to the blind and gradually becoming common in telephony and help desks.

As one with Myopic Macular Degeneration forced to break away from visual dependency and embrace audio information, I testify it takes a little patience and self-training and then you hear past these voices and your brain naturally absorbs the underlying content. Of course, desperation from print disability is a great motivator! Once you overcome the resistance to synthetic voices, a whole new world of spoken content becomes available using innovative devices sold primarily to younger generations of educated blind persons. Freed of the struggle to read and write using defective eyesight, you gain enormous power to absorb an unbelievable amount of high quality materials. As a technologist myself, I made this passage quickly and really enjoyed the learning challenge, which has made me into an evangelist for the audio world of assistive technology.

If you have low vision training available, ask about learning to listen through synthetic speech. For the rest of our networked lives, synthetic voices may be as important as eccentric viewing and using contrast to manage objects.

So, when you encounter one of these voices, maybe think of it as another rite of passage to remain fully engaged with the world. Also, please consider how we can help others with partial sight. With innovations from WebAnywhere and free screen readers, like NVDA, there could be many more low cost speaking devices available worldwide.

To Those developing reading tools with Text-to-Speech


Do not expect that all users of your technology will be converts from within the visually impaired communities familiar with TTS. Provide a voice tuned in pitch and speed and simplicity for starters to achieve the necessary intelligibility and sufficient pleasantness. Suggest that better voices are also available and show how to achieve their use.

It’s tough to spend development effort on such a mundane matter as the voice, but technology adoption lessons show that it only takes a small bit of discouragement to ruin a user’s experience and send a tool they could really use straight into their recycle bin. Demos and warnings could be added to specifically address Synthetic Voice Shock and show off the awesome benefits to be gained. The choice of a freely available voice is a perfectly rational design decision but may indicate a lack of sensitivity to the needs of those newly losing vision, who are forced to learn not only the mechanics of a tool but also how to listen to this foreign speech.

To Sighted persons helping Vision Losers

You should be tech savvy enough to separate out the voice interface from the core of the tool you might be evaluating for a family member or demonstration. Remember the recipient of the installed software will be facing both synthetic voice shock and possibly dependency on the tool, as well as a long learning curve. Somehow, you need to make the argument that the voice is a help, not a hindrance. Of course, you need to be able to understand the voice yourself, perhaps translate its idiosyncrasies, and tune its pitch and speed. A synthetic voice is a killer software parameter.

You may need to seek out better speech options, even outlay a few bucks to upgrade to premium voices or a low cost tool. Amortizing $100 for voice interface over the lifetime hours of listening to valuable materials, maintaining an independent life style, and expanding communication makes voices such a great bargain.
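To make the bargain concrete, here is a rough back-of-the-envelope sketch in Python. The $100 figure comes from the paragraph above; the two hours a day and five years of listening are purely illustrative assumptions, not measurements, and the function name is made up for this example.

```python
def cost_per_hour(price_dollars, hours_per_day, years):
    """Spread a one-time voice purchase over the total hours of listening."""
    total_hours = hours_per_day * 365 * years
    return price_dollars / total_hours


# Assumed listening habits (hypothetical): 2 hours a day for 5 years.
hourly = cost_per_hour(100, 2, 5)
print(f"About {hourly * 100:.1f} cents per hour of listening")
```

Under those assumed habits the voice works out to under three cents per listening hour; plug in your own numbers to see how the figure moves.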

And, who knows, many of the voice-enabled apps may help your own time shifting, multi-tasking, mobile life styles.

To Rehab Trainers

Judging from the meager amount of rehab available to me, the issue of Synthetic Voice Shock is not addressed at all. Eccentric viewing, the principles of contrast for managing objects, a host of useful independent living gadgets, font choices, etc. are traditional modules in standard rehab programs. Perhaps it would be good to have a simple lesson listening to pleasant natural voices combined with rougher menu readers just to show it can be done. Listening to synthetic voices should not be treated like torture but rather like a rite of passage to gain the benefits brought by assistive technology vendors and already widely accepted in the visually impaired communities. Indeed, inability to conquer Synthetic Voice Shock might be considered a disability in itself.

As I have personally experienced, it must be especially difficult to handle Vision Losers with constantly changing eyesight and a mixed bag of residual abilities. It could be very difficult to tell Vision Losers they might fare better reading like a totally blind person. But when it comes to computer technology, that step into the audio world can both reduce the stress of struggling to see poorly in a world geared toward hyperactive, visually oriented youngsters and, through print disability, open the flow of quality reading materials, often ahead of the technology curve for sighted people.

The most useful training I can imagine is a session reading an article from AARP or Sports Illustrated or a New York Times editorial copied into a version of TextAloud, or similar application, with premium voices. Close those eyes and just relax and listen and imagine doing that anywhere, in any bodily position, with a daily routine of desirable reading materials. To demonstrate the screen reader aspect, the much maligned Microsoft Sam in Narrator can quickly show how menus, windows, and file lists can be traversed by reading and key strokes. The takeaway of such a session should be that there are other, perhaps eventually better, ways of reading print materials and interacting with computers than struggling with deteriorating vision, assuming hearing is sufficient.

So, let us pay attention to Voice Shock

In summary, more attention should be paid to the pattern of adverse reactions of Vision Losers unfamiliar with the benefits of the synthetic speech interaction that enables so many assistive tools and interfaces.

Look, ma, no screens!! nvda, non-Visual Desktop Access, is my new Reader.

September 22, 2007

Summary: This Vision Loser makes the transition to screen reader dependence, sets up her new tablet notebook with mostly open source apps, and learns many painful new routines.

As my vision changed over the past year, I started to use Narrator, the minimalist screen reader built into Windows XP speaking in Microsoft Sam. I had seen and heard demos of the standard Freedom Scientific JAWS and GW Micro WindowEyes and also tried the newcomer System Access to Go but could not bring myself to invest the $$ fees and upgrade slippery slope and irreversible learning time. However, something deeper, perhaps my Rebel archetype, said “don’t go with the traditional, but find your own pathway.” After all, I’m not on the “rehab grid”, I pay my own way, I appreciate and understand software, and I have time to experiment.

A short flirtation with the Thunder screen reader supported many of my needs, but was rather, well, quirky. A podcast on ACB Replay and a review from Blind Geek Zone introduced the nvda (non visual desktop access) open source, free screen reader from young Michael Curran, a blind Australian, and his budding infrastructure nvAccess. A simple install, the quick start on the screen, an easy switch to my own synthetic voices, and a bout of fumbling with the keyboard and I knew this was, for me, “the real thing”.

As luck would have it, my Dell notebook’s screen dissolved and I needed to move my primary connectivity and screen to my backup Toshiba tablet, now also getting a bit old and precarious. With a new tablet moving into the household, along with the Linux-based Icon PDA, it was time to totally remodel my computing environment and my brains, hands, mouse, and reflex “operating system”.

Any relocation, whether household or computer, is a time of mental and emotional turmoil. What applications should I move, e.g. the text reader discussed earlier, and the voice data files I’ve grown accustomed to? Where are the license keys, the setups, or links to later versions? Maybe it’s also time to revamp my myriad email accounts now mostly funneled through gmail, which I love-hate? Do I want to commit my new setup to the “stove pipe of evil”: Microsoft Office, Internet Explorer, Outlook Express? A month later, I’m trying to distill in this post my painful experiences, with more to come later on gmail and portable apps and recent announcements from Mozilla and IBM.

First, let’s define a “screen reader” as really a “screen listener” which responds to events from the Windows operating system and running applications as the user moves focus around the screen. Usually the OS and applications express themselves with dialog boxes and wait for user requests on menus and buttons. The screen listener picks up information about these events and speaks them through a speech engine and chosen synthetic voice files. This is really complicated because there are so many levels of operating system and application software, plus mechanical and electronic hardware in keyboards and mice. Meanwhile, users flit around the screen looking for something, and the twitching movements of their fingers or finger surrogates produce a rapid stream of events to be mediated by the screen listener, which vies with other processes for memory resources, preferably without crashing.
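For readers who think in code, the “screen listener” idea above can be sketched in a few lines of Python. This is only a toy model under loud assumptions: `FocusEvent`, `describe`, and `listen` are names invented for illustration, the events are simulated rather than received from Windows, and a plain list stands in for the speech engine. It is not how nvda or any real screen reader is implemented.

```python
from dataclasses import dataclass


@dataclass
class FocusEvent:
    """A simplified, made-up stand-in for an OS focus event."""
    role: str  # e.g. "button", "menu item", "edit box"
    name: str  # the label shown on screen


def describe(event):
    """Turn one event into the phrase a speech engine would be asked to say."""
    return f"{event.name}, {event.role}"


def listen(events, speak):
    """Announce each event in order; `speak` stands in for a speech engine."""
    for event in events:
        speak(describe(event))


# Simulate the user tabbing through a dialog.
# A list collects the phrases instead of sending them to real speech.
spoken = []
listen(
    [
        FocusEvent("button", "OK"),
        FocusEvent("button", "Cancel"),
        FocusEvent("edit box", "File name"),
    ],
    spoken.append,
)
print(spoken)
```

The point of the sketch is the shape of the loop: events arrive one after another, and each is translated into a short spoken phrase. Everything hard about the real thing (the rapid event stream, the competing processes) is deliberately left out.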

Narrator is actually underrated in value, as Microsoft software goes. Upon initiation, a dialog warns that you’ll probably want a more robust screen reader for everyday use, but well, here’s Narrator for backup or to get you started. Indeed, one purpose of Narrator is to try to assist Windows installation. If you are unfamiliar with Narrator, go to the Start button, choose Run, and then type Narrator, or find and work through the Accessibility Wizard. Narrator will occasionally choke when Windows is in a precarious state, but can usually be counted on to walk through the primary windows on the screen and through the file explorer. Therefore, here’s my

Fundamental rule of survival:

(***) Keep Narrator as a backup and remember how to use it with different types of outage: eyesight, mouse, keyboard, resources. It’s there on the desktop as a shortcut in my 911Emergency folder, on the Windows start menus (added in the users + You + startup directory), and specifically added in the startup directory. Of course, you have to find it first and create a shortcut to copy around. And there’s the Start button + Run + Narrator.

Setting up nvda:

nvda is available from ….with either an installer or a zip extractor version. The installer may be hard to understand voice-wise and may be overkill. nvda has a very important property of being a Portable App that keeps all its files in a single directory that will run from wherever it’s extracted, including a USB memory stick. Portability means that you can walk up to modern Windows systems, plug in the memory stick, start nvda from an autorun or shortcut, and you’re in screen listening mode, albeit maybe not with your accustomed voices.

nvda has a number of Preferences to set up or leave as defaults: speech engine, voice and its speed, how much to read punctuation, and rules of behavior in a browser (called “virtual buffer”).

Each screen reader package has a “modifier” key to be keyed in conjunction with letters and other keys. nvda uses the Insert (INS) key, which may be found in widely varying places on keyboards: immediately right of space on Toshiba, upper right corner on Motion Computing tablet plastic cover keyboard, and middle right of backspace on my Bluetooth 101 full-sized keyboard. One of the hassles, a dread for me, is memorizing the needed keys for the screen reader and my customary applications. It’s boring, never-ending, and I just needed to get over it. An audio tour on the nvAccess website prodded me to continue trying, even to “RTFM”.

Here’s my memory bank to illustrate a few:

Windows shortcuts: ALT+TAB among windows, ALT+F4 to exit an app, ESC to get out of most dialogs, space or enter to push a button, TAB to move around in a window, right and left to open and close tree views with up and down inside a tree.

Trainer Karen McCall of Karlen Communications in Canada calls this knowledge “literacy” but it is often not learned until needed and then becomes essential. With nvda (or any other screen reader), a user must develop a rhythm of interaction, receiving and interpreting speech feedback, e.g. where a TAB has taken you, within or among applications.

nvda frequent actions in Mozilla Firefox include: “h” to headings, “k” to links, up down between lines, top to reload, combining with Firefox shortcuts control+F to quick find a phrase, control+k to open a search, control+L to type in a location, control+TAB to move among tabs, control+T to create a new tab. And now the big switcheroo in a screen reader: you must notify it when you’re in an edit box and don’t want the k and other nvda operations. This is done with Insert+Space, known as “virtual buffer passthrough on or off”, always to be remembered on forms.
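The passthrough switch is easier to remember once you picture it as a single on/off flag. Here is a minimal Python sketch of that idea. The class `BrowseKeys`, its key names, and the returned phrases are all hypothetical and written only for illustration; the sketch borrows the “h”, “k”, and Insert+Space keys mentioned above but makes no claim about how nvda handles keys internally.

```python
class BrowseKeys:
    """Toy model of single-letter navigation with a passthrough switch."""

    # Letters taken from the list above; the mapping itself is illustrative.
    QUICK_NAV = {"h": "next heading", "k": "next link"}

    def __init__(self):
        # Off: letters navigate the page. On: letters are typed into the form.
        self.passthrough = False

    def press(self, key):
        """Return a description of what the key press would do."""
        if key == "insert+space":
            self.passthrough = not self.passthrough
            return "passthrough on" if self.passthrough else "passthrough off"
        if not self.passthrough and key in self.QUICK_NAV:
            return self.QUICK_NAV[key]
        return f"type {key}"


keys = BrowseKeys()
log = [keys.press(k) for k in ["h", "k", "insert+space", "k", "insert+space", "k"]]
print(log)
```

In this model the same “k” either jumps to a link or lands in the edit box as a typed letter, depending only on whether Insert+Space was pressed beforehand, which is exactly why forgetting the switch on forms is so easy.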

Well, to wrap up this post, I highly recommend nvda for partially sighted users. It works unbelievably well, especially considering the price ($0) and ease of setup and portability. It lacks the scripting and maturity of the big $1000 packages but has a corps of open source developers helping out, i.e. nvda has a rapid trajectory of development and improvement. As a developer myself, I find nvda inspirational, showing how much one dedicated technical person can accomplish in a remarkably short span of time.

My prejudice toward open source throws some light on my above semi-facetious comment about the “stove pipe of evil”. “Stove pipe” refers to communities that don’t talk to each other very much and only use software within their pipe or area. I’m not implying a Microsoft evil empire here but rather that lock-in is a user choice that I do not want for myself. Too often I’ve received email which consists of a paragraph written as an MS WORD attachment, which I need to click to launch a big application to read, and which assumes I own MS WORD or have its reader working, when a simple text body of a message would be safer (clicking an attachment asks for trouble, like a virus), lighter, and easier to produce. Outlook is OK but too attached to WORD. Internet Explorer has finally provided the tabbed windows available for years in Mozilla Firefox, and is a fine browser, but not attractive to me after Firefox. Where I’m let down now in the open source space is OpenOffice, which is inaccessible with nvda. Mostly, my Rebel says to go follow the path of most freedom and change if it offers the affordability and functionality I need.

More to come on “Portable Apps, a good trend, and ones that work for me”, “Living in the new operating system of Web 2.0 and browsers”, and “untangling and reading gmail”.

