Synthetic Voice Shock Reverberates Across the Divides!

July 30, 2008

Synthetic Voice Shock — oh, those awful voices!

As I communicate with other persons with progressive vision loss, I often sense a quite negative reaction to synthetic, or so-called ‘robotic’, voices that enable reading digital materials and interfacing with computers. Indeed, that’s how I felt a few years ago. Let’s call this reaction "synthetic voice shock" as in:

  • I cannot understand that voice!!!
  • The voice is so inhuman, inexpressive, robotic, unpleasant!
  • How could I possibly benefit from using anything that hard to listen to?
  • If that’s how the blind read, I am definitely not ready to take that step.

Conversely, those long experienced with screen readers and reading appliances may be surprised at these adverse reactions to the text-to-speech technology they listen to many hours a day. They know the clear benefits of such voices, rarely have trouble understanding them, exploit voice regularity and adjustability, and innovate better ways of "living big" in the sighted world, to quote the LevelStar motto.

The ‘Synthetic Speech’ divide

Synthetic voice reactions appear to criss-cross many so-called divides: digital, generational, disability, and developer. The free WebAnywhere is the latest example, with a robotic voice that must be overcome in order to gain the possible benefits of its wide dissemination. Other examples are talking ATM centers and accessible audio for voting machines. The NVDA installation and default voice can repel even sighted individuals who could benefit from a free screen reader as a web page accessibility checker or a way to learn about the audio assistive mode. Bookshare illustrates book reading potential with a robotic, rather than natural, voice. Developers of these tools see the synthetic voice as a means to gain the benefits of their tools, while users not accustomed to speech-enabled hardware and software run the other way, put off by the unfriendliness and the additional stress of learning an auditory rather than visual sensory practice.

This is especially unfortunate when people losing vision may turn to magnifiers that can only improve spot reading, when extra hours and energy are spent twiddling fonts then working line by line through displayed text, when mobile devices are not explored, when pleasures of book reading and quality of information from news are reduced.

Addressing Synthetic Voice Shock

I would like to turn this posting into messages directed at developers, Vision Losers, caretakers, and rehab personnel.

To Vision Losers who could benefit sooner or later

Please be patient and separate voice quality from reading opportunities when you evaluate potential assistive technology.

The robotic voice you encounter with screen readers is used because it is fast and flexible and widely accepted by the blind community. But there do exist better natural voices that can be used for reading books, news, and much more. While robotic voices may seem offensive at first, synthetic voices are actually one of the great wonders of technology: they have opened the audio world to the blind and are gradually becoming common in telephony and help desks.

As one with Myopic Macular Degeneration forced to break away from visual dependency and embrace audio information, I testify that it takes a little patience and self-training, and then you hear past these voices and your brain naturally absorbs the underlying content. Of course, desperation from print disability is a great motivator! Once you overcome the resistance to synthetic voices, a whole new world of spoken content becomes available using innovative devices sold primarily to younger generations of educated blind persons. Freed of the struggle to read and write using defective eyesight, you gain enormous power to absorb an unbelievable amount of high quality materials. As a technologist myself, I made this passage quickly and really enjoyed the learning challenge, which has made me into an evangelist for the audio world of assistive technology.

If you have low vision training available, ask about learning to listen through synthetic speech. For the rest of our networked lives, synthetic voices may be as important as eccentric viewing and using contrast to manage objects.

So, when you encounter one of these voices, maybe think of it as another rite of passage to remain fully engaged with the world. Also, please consider how we can help others with partial sight. With innovations from WebAnywhere and free screen readers, like NVDA, there could be many more low cost speaking devices available worldwide.

To Those developing reading tools with Text-to-Speech


Do not expect that all users of your technology will be converts from within the visually impaired communities familiar with TTS. Provide a voice tuned in pitch and speed and simplicity for starters to achieve the necessary intelligibility and sufficient pleasantness. Suggest that better voices are also available and show how to achieve their use.

It’s tough to spend development effort on such a mundane matter as the voice, but technology adoption lessons show that it only takes a small bit of discouragement to ruin a user’s experience and send a tool they could really use straight into their recycle bin. Demos and warnings could be added to specifically address Synthetic Voice Shock and show off the awesome benefits to be gained. The choice of a freely available voice is a perfectly rational design decision but may indicate a lack of sensitivity to the needs of those newly losing vision, who are forced to learn not only the mechanics of a tool but also how to listen to this foreign speech.

To Sighted persons helping Vision Losers

You should be tech savvy enough to separate out the voice interface from the core of the tool you might be evaluating for a family member or demonstration. Remember the recipient of the installed software will be facing both synthetic voice shock and possibly dependency on the tool, as well as a long learning curve. Somehow, you need to make the argument that the voice is a help, not a hindrance. Of course, you need to be able to understand the voice yourself, perhaps translate its idiosyncrasies, and tune its pitch and speed. A synthetic voice is a killer software parameter.

You may need to seek out better speech options, even outlay a few bucks to upgrade to premium voices or a low cost tool. Amortize $100 for a voice interface over the lifetime hours of listening to valuable materials, maintaining an independent lifestyle, and expanding communication, and voices become a great bargain.
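To make the amortization concrete, here is a rough back-of-the-envelope sketch. Only the $100 figure comes from the text above; the listening habits (2 hours a day for 5 years) and the function name `cost_per_hour` are assumptions chosen purely for illustration.

```python
def cost_per_hour(price_dollars, hours_per_day, years):
    """Spread a one-time purchase price over the total hours of listening."""
    total_hours = hours_per_day * 365 * years
    return price_dollars / total_hours


# Assumed usage, not a measurement: 2 hours a day for 5 years.
cents = cost_per_hour(100, 2, 5) * 100
print(f"{cents:.1f} cents per hour")  # prints "2.7 cents per hour"
```

Under those assumptions the voice costs under three cents per hour of listening; heavier or longer use only lowers the figure.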

And, who knows, many of the voice-enabled apps may help your own time-shifting, multi-tasking, mobile lifestyle.

To Rehab Trainers

Judging from the meager amount of rehab available to me, the issue of Synthetic Voice Shock is not addressed at all. Eccentric viewing, the principles of contrast for managing objects, a host of useful independent living gadgets, font choices, etc. are traditional modules in standard rehab programs. Perhaps it would be good to have a simple lesson listening to pleasant natural voices combined with rougher menu readers, just to show it can be done. Listening to synthetic voices should not be treated like torture but rather like a rite of passage to gain the benefits brought by assistive technology vendors and already widely accepted in the visually impaired communities. Indeed, inability to conquer Synthetic Voice Shock might be considered a disability in itself.

As I have personally experienced, it must be especially difficult to handle Vision Losers with constantly changing eyesight and a mixed bag of residual abilities. It could be very difficult to tell Vision Losers they might fare better reading like a totally blind person. But when it comes to computer technology, that step into the audio world can both reduce the stress of struggling to see in a world geared toward hyperactive, visually oriented youngsters and, because print disability qualifies a reader for special services, open a flow of quality reading materials, often ahead of the technology curve for sighted people.

The most useful training I can imagine is a session reading an article from AARP or Sports Illustrated or a New York Times editorial copied into a version of TextAloud, or similar application, with premium voices. Close those eyes and just relax and listen and imagine doing that anywhere, in any bodily position, with a daily routine of desirable reading materials. To demonstrate the screen reader aspect, the much maligned Microsoft Sam in Narrator can quickly show how menus, windows, and file lists can be traversed by reading and key strokes. The takeaway of such a session should be that there are other, perhaps eventually better, ways of reading print materials and interacting with computers than struggling with deteriorating vision, assuming hearing is sufficient.

So, let us pay attention to Voice Shock

In summary, more attention should be paid to the pattern of adverse reactions of Vision Losers unfamiliar with the benefits of the synthetic speech interaction that enables so many assistive tools and interfaces.

Hyperlinks considered Harmful! On to structured Reading.

July 6, 2008

Our changing modes of reading

This post visits topics heavy on web technology, with troubles well beyond vision loss. The previous blog post describes my current reading regime with print disability and technology adaptations. I find common ground with an article in the summer 2008 Atlantic Monthly and assorted blog commentaries bemoaning information overload and discomfort induced by chronic web use. I draw on some related resources from my audio channels of interviews and reviews. The central question is how our plastic brains are reprogrammed by our reading technologies, emphasizing the stresses and joys we find operating in a tug-of-war over what controls our reading lives.

Why is it hard to read a whole article?

The July-August Atlantic Monthly features an article by Nicholas Carr that asks "Is Google Making Us Stupid?". This title suggests an excursion into declining abilities of critical analysis. Rather, the discussion concerns the gnawing sense that the structure of interactive media combined with pressures to assimilate lots of online information is actually changing not only reading habits but also brain structure. I found this thesis fascinating from my own experience of deliberately rebuilding my reading life and knowing my brain was re-wiring itself for auditory rather than visual input of words and written thoughts. This is pretty profound stuff.

Ugh, the article’s title itself is kind of stupid, a touch by an editor rather than the article’s author. Indeed, Google is described as a monument to measurement technology in attempting to achieve the best all-around responsiveness to user queries, up to trying to read minds as represented by query histories. That’s a worthy game and has changed the world but is not the crux of the article. The key idea is that a hyperlink from a web page you are reading is not only a reference but a propellant toward action, as Carr describes its effect. In the context of technology that encourages multitasking, impulsiveness, and need to be interlocked with others on myriad networks, hyperlinks could be considered harmful. Note: my hyperlink references are at the bottom of this post.

The analytic tradition of ‘XX considered harmful!’

The phrase ‘XX considered harmful’ is a tradition in computer science, canonized by the late E. W. Dijkstra in a 1968 article where XX was ‘goto’, a programming construct. He argued that the goto statement in languages like the then dominant FORTRAN caused unnecessary errors and difficulties in reasoning about programs. Somebody tracing through the flow of code would encounter a goto, then need to branch their thinking into the continuation of line-by-line code flow as well as taking up where the goto said to go. The problem was also at the other end: when reading code, you had little way of knowing what other code might jump there under unknown conditions. This generated a decade of articles and results showing, both theoretically and practically, that very few occasions required a literal goto and that more attention to the algorithm led to code better organized using loops, cases, and exceptions. For example, a well designed loop could be replaced by a logical description of the changes made, no matter how the iteration was accomplished. After the ruckus died down, there were improvements in languages, practices, and pedagogy, in what was called the age of Structured Programming.
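A minimal sketch of the contrast, offered as an illustration rather than as anything from Dijkstra's article. Python has no goto, so the first function simulates jumps with a hypothetical `label` variable; the function names and the summing task are likewise assumptions made for the example.

```python
def sum_with_jumps(values):
    """Goto style: control moves by naming the next label to jump to."""
    total = 0
    i = 0
    label = "CHECK"
    while True:
        if label == "CHECK":
            # IF (i >= n) GOTO DONE, otherwise GOTO ADD
            label = "DONE" if i >= len(values) else "ADD"
        elif label == "ADD":
            total += values[i]
            i += 1
            label = "CHECK"  # GOTO CHECK
        elif label == "DONE":
            return total


def sum_structured(values):
    """Structured style: one loop with a single entry and a single exit."""
    total = 0
    for v in values:
        # Invariant: total is the sum of the items seen so far.
        total += v
    return total
```

Both functions compute the same sum, but only the second can be summarized by a one-line description of what the loop maintains. To follow the first, a reader has to trace every jump and ask which labels might lead to the line being read.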

Where is the harm in using hyperlinks?

My question here is whether the complaints against the goto and the hyperlink are a useful analogy. Suppose I put a link here to the Atlantic Monthly online website. You might be tempted to stop reading my article right here in order to get to the original context. That’s perfectly legitimate, but will you return to my thought stream or continue branching from the magazine article? Or start a whole new thread of interest? Can you hold all the branching structure of your day’s reading in your brain and browser history? This is a cognitive dilemma for both reader and writer, stemming from a simple HTML element. Our scholastic training to cite sources and to help the reader use hypertext technology to reach the source in an instant causes some grief for all of us.

Carr and others are saying that hyperlink-driven reading is making it more difficult for them to read longer articles in printed or online form and even reducing their ability to read books. Is this a genuine loss of some cognitive ability? or is it just a change in reading habits? In either case, is the effect reversible? As some blog comments suggest, maybe there are other reasons for the expressed discomfort, like burn-out, aging, or natural shifts of interest.

Where did hypertext fall apart?

This discussion hit home for me for several reasons. I was a student of hypertext theory in previous career incarnations in the 1980s. Questions then were about types of links, e.g. clarifying, refining, challenging,… To cite one major example, Robert E. Horn elaborated numerous models of hyperlink for different kinds of documentation and uses. Design theorist Horst Rittel evolved the concept of issue-based information systems to address ‘wicked problems’, characterizing difficult social problems requiring intense collaborative analysis. This truly was the golden age of structured Hypertext before the WWW came along and offered goto style hyperlinks to everybody.

For my new reading style using a screen reader, hyperlinks are more often annoyances: advertising, navigation, privacy notices, and hundreds of links I never plan to click but must traverse or avoid in order to get to the content of a web page. This means hyperlinks consume personal energy, which may be a partial cause of current reading discomfort. Every inline hyperlink is a decision point: go there? do that now? or later? abandon this article? If we made all these decisions consciously, we would feel even more the personal energy drain. I have learned how loss of visual acuity forces more attention toward energy management to accomplish most reading tasks and to overcome inevitable errors.

Since I went through a period of several months of painful reading, I have a tremendous appreciation for the reading technology I can now use effectively, as discussed in my article on ‘tools, Materials, and strategies for Non-visual reading’. I really did almost lose it, not from attention but from sensory change. I still marvel that my brain can interpret the sounds coming from a synthetic voice and absorb the content as fully as I used to visually, or at least I think so. Wow, a synthetic voice is just a data file and algorithm, but what a difference these make to the print-disabled world!

Does audio reading make hyperlinks less harmful?

As I rebuilt my reading skills, I have come to visualize my reading content as mostly a tree of subjects and articles, retrieved primarily by RSS, and represented in text and mp3 files. If I count in a half dozen daily newspapers retrieved by a pipeline of blind services, every day yields easily over 1000 articles, cached or retrieved by wireless. Reading this way, maybe 50 articles a day, is a very well controlled process because the temptation to take a hyperlink is very rare. In other words, my RSS client and News stand control me while I control my web browser. Although my ICON PDA supports hyperlink activation, my decisions are simpler without a browser. Do I read this article or not, based on title and context in the tree? Do I read politics blogs now, later, or skip for a while? Which topics are sufficiently intriguing to switch into browsing mode for searching and exploration? When the tree gets disorganized or its retrieval profile changes, how do I reorganize the branches? All this helps reduce context switching and clicking through regions of inactivity. My non-visual reading regime seems to be much more structured than formerly, more focused on textual content than on links and relationships.

Yet, when my Icon Mobile Manager required a two-week trip for repairs, I rather welcomed the respite from those thousands of articles. I had to get my news the old-fashioned way, by airwaves on TV or radio, or by visiting websites. I was amazed at how much work I had put in to set up the feeds and patterns I had evolved over a year with my Icon assistive technology. Upon return home of the Icon, I trimmed out a few feeds that seemed redundant or left over from previous interests, but mainly I now place more time limits on my article reading. It also helps to have the Democratic party race out of the way.

Regarding books, I do tend to skip around much more than in the past. Because I have a rich library of book files to choose from, I am evolving new interests and reading patterns. I don’t need to feel bad about not finishing a book as it can still reside on my memory card in an out of the way folder. As to concentration, most of my reading is insomniac style or on the road or for book clubs. Hey, maybe what Carr and others need is a social book club with a list of questions for reading and discussion. Do guys do that?

Is there ‘structured reading’?

Ok, I am starting to ramble here. I have suggested the analogy between ‘goto considered harmful’ and ‘hyperlink considered harmful’. My reading program with controlled separation of RSS delivered material from freestyle web browsing could be dubbed ‘Partially Structured Reading’. I believe, indeed I just know, that my brain has adapted to the forced changes of print-disabled reading styles by evolving its own techniques for decision-making, context-switching, and stack management. In my view hyperlinks cause two forms of harm. First, they encourage divergence without the convergence and summarizing techniques that enabled overcoming the analogous ill effects of the goto statement. Second, the current hyperlink HTML element that simultaneously expands and binds the web is a primitive instrument that cannot be used for serious thought without imposing some of the rigor of early hypertext theories, e.g. the purpose of the link.
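As a hedged illustration of what "the purpose of the link" might look like in practice, the sketch below attaches a purpose to each link and gathers all links at the end of the text, so the reader converges before diverging. Everything here is hypothetical: the names `Purpose`, `TypedLink`, and `render_with_deferred_links` and the example URL are inventions for this sketch, and the three purposes simply echo the link types mentioned earlier (clarifying, refining, challenging). It is not a model taken from the early hypertext literature.

```python
from dataclasses import dataclass
from enum import Enum


class Purpose(Enum):
    CLARIFYING = "clarifying"
    REFINING = "refining"
    CHALLENGING = "challenging"


@dataclass(frozen=True)
class TypedLink:
    target: str
    purpose: Purpose


def render_with_deferred_links(paragraphs):
    """Render (text, links) pairs as plain text with numbered markers
    inline and the links themselves collected in a list at the end."""
    body = []
    refs = []
    for text, links in paragraphs:
        markers = []
        for link in links:
            refs.append(link)
            markers.append(f"[{len(refs)}]")
        body.append(text + (" " + "".join(markers) if markers else ""))
    ref_lines = [
        f"[{n}] ({link.purpose.value}) {link.target}"
        for n, link in enumerate(refs, start=1)
    ]
    return "\n".join(body + ["", "References:"] + ref_lines)


post = [
    ("Carr argues that links propel action.",
     [TypedLink("https://example.org/a", Purpose.CLARIFYING)]),
    ("No links here.", []),
]
print(render_with_deferred_links(post))
```

The reader meets only a small marker while reading and sees each link's stated purpose when deciding, at the end, whether to follow it.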

Some more observations on reading as a cognitive activity

I’d like to bring up a few more references on this topic from my audio channels and personal experience:

  • Former Microsoft executive Linda Stone has laid out our syndrome of ‘continuous, partial attention’ in a fascinating podcast. She asks the fundamental question: do you really want to live that way?

  • A book on ‘distraction’, whose author was interviewed by the wise Diane Rehm on WAMU, details a reform program for teaching attention skills in K-12 to enable a transition from pure information greed to appreciation of facts and policies, e.g. those faced in health care and basic civics.

  • Another book on my wish list, mentioned in the Atlantic Monthly article, is ‘Proust and the Squid’ by Maryanne Wolf. As interviewed on Brain Science, she points out that reading is not natural but rather highly contextual in culture and the current technology, whether stone tablets or networks. Scientifically, a lot is going on to show how the brain is truly plastic, evolved to rewire for different styles of processing information.

  • The ultimate brain deconstruction exercise is that of neuroscientist Jill B. Taylor, who witnessed the dissolution of her cognitive and physical abilities during a left brain stroke. She then used her right brain sensitivity to guide her rehabilitation, taking this further to remolding her personality. A wild-ass theory I conceived from her description of the limbic system, the so-called reptilian brain, is that perhaps hyperlinks trigger a fight or flight response that might underlie the discomfort of web surfing: every hyperlink suggests a danger, or defensive curiosity, lurking at the end of the link. The good news she suggests is that these autonomic responses only last 90 seconds, after which the more rational or familiar emotional thinking is in control. She reminds us that humans might consider themselves thinking beings with feelings, but rather we are primarily feeling processors which think sometimes.

  • My monthly book club chose ‘The Uncommon Reader’ by British playwright Alan Bennett. This novella traces the Queen’s lifestyle changes from a chance encounter with a mobile reading van, through selections and borrowings of an increased number and variety of reading materials under the tutelage of a human resource (servant), Norman, and the interventions of MBA style queen handler Sir Kevin. As the Queen becomes more intrigued with common lives, her relationships with her duties and supporters change, discomfiting many whom she interrogates about their reading preferences. Eventually her reading turns into extended reflection expressed in writing and, upsetting everything, a full blown urge to compose a book. While humorous, the novella asks many more serious questions. How does anybody gain or lose in total life experiences from their reading patterns? What does it mean to one’s colleagues to have an active reading program, and also be open about it? To oneself, what are my selection criteria for books, characters, plots? Is reading books an optional life activity or an ingrained part of one’s personality and character? Would this royal opsimath enjoy Wikipedia and Google?

What these studies lack, I suggest, is investigation into the non-visual ways of working, based in visual memories, alternative styles of work, and so-called assistive tools.

