Uncertain Pronunciations and Coding: A Critique and Proposal


A problem for Web designers tackling accessibility is pronunciation ambiguity. Handling it can add a daunting volume of work to every page. Should we spend our time coding for accessibility or creating content? Our time is limited, and we probably can’t do both without sacrificing one or cutting back on something else.

Consider the word spelled “r e a d”. It can be pronounced /reed/ or /red/. Text-to-speech (TTS) programs are unlikely to tell the difference from context, so we have to write code specifically for TTS software that won’t interfere with how sighted users experience the website. Even if TTS software can sometimes parse context to determine a pronunciation, it probably can’t always, or even most of the time. Consider the word “a”: do we pronounce it to rhyme with “say” or with “uh”? The choice is not always governed by a rule; sometimes it follows the speaker’s custom (dialects and idiolects supply more examples). That means we have to find every instance of “a”, of “read”, and of every other word with a pronunciation ambiguity, and write code for each one, not just once per word but every time each of those words appears. Talk about a burden.
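To make the per-instance coding concrete, here is a minimal sketch that wraps each ambiguous word in an SSML `<phoneme>` tag (the standard markup for fixing a pronunciation). The `AMBIGUOUS` table and its chosen default pronunciations are hypothetical; a real page would need a human to decide the pronunciation at each occurrence.

```python
import re

# Hypothetical table of ambiguous words mapped to a default IPA pronunciation.
# The word list and the defaults are assumptions for illustration only.
AMBIGUOUS = {
    "read": "ɹɛd",  # past tense; the present tense would be "ɹiːd"
}

def tag_pronunciations(text: str) -> str:
    """Wrap each ambiguous word in an SSML <phoneme> tag so a TTS
    engine that honors SSML speaks the chosen pronunciation."""
    def wrap(match: re.Match) -> str:
        word = match.group(0)
        ipa = AMBIGUOUS[word.lower()]
        return f'<phoneme alphabet="ipa" ph="{ipa}">{word}</phoneme>'

    pattern = r"\b(" + "|".join(map(re.escape, AMBIGUOUS)) + r")\b"
    return re.sub(pattern, wrap, text, flags=re.IGNORECASE)

print(tag_pronunciations("I read the book yesterday."))
# I <phoneme alphabet="ipa" ph="ɹɛd">read</phoneme> the book yesterday.
```

Note that this applies one pronunciation to every instance, which is exactly the problem: each occurrence may need a different choice, so automation only gets us partway.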

And other strings make the problem bigger. Suppose you offer the URL www.example.com. One TTS program read the “www” in a URL as “world wide web”, but that is not what the listener should type into a browser’s address bar, and we never told them to. A TTS program should say “dot” for “.” in a URL but not at the end of a sentence, unless the dot ends a URL that also ends the sentence; and perhaps it should say “period” when the text is discussing punctuation. If you surround a URL with angle brackets, as convention allows, should the brackets be silent? Certainly not always. An opening angle bracket is an opening angle bracket in one context, a left-pointing arrowhead in another, and a less-than symbol in yet another, even when there’s no space to its right.
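The URL case can be sketched as a small context-sensitive policy: read “.” as “dot” only inside a URL, spell out “www”, and leave a sentence-final period silent. This is one defensible rendering under assumed rules, not what any particular TTS engine actually does; the regex for spotting URLs is deliberately crude.

```python
import re

# Crude URL matcher for illustration: "www." followed by non-space
# characters, required to end on a word character or slash so a
# sentence-final period is left outside the match.
URL_RE = re.compile(r"\bwww\.\S+[\w/]")

def speak_url(url: str) -> str:
    """Spell out "www" letter by letter and read "." as "dot"."""
    spoken = url.replace("www.", "w w w.", 1)
    return spoken.replace(".", " dot ")

def speak_sentence(text: str) -> str:
    """Apply the URL policy only inside URLs; the sentence-final
    period stays silent (i.e., is left as ordinary punctuation)."""
    return URL_RE.sub(lambda m: speak_url(m.group(0)), text)

print(speak_sentence("Visit www.example.com today."))
# Visit w w w dot example dot com today.
```

Even this toy shows the coupling: the same character, “.”, gets three treatments (spoken, silent, or “period”) depending on context the code must somehow know.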

Brand names often have unexpected spellings, and new or obscure product brands won’t be known to TTS programs. Plain English writing has misspellings; should a misspelling be rendered faithfully to the original error, or corrected only when pronounced but not for sighted readers? Sentences usually begin with capital letters, which helps TTS programs decide how to sound a sentence out; but if a sentence begins by naming a computer command that must be all lower case, will the TTS program still recognize the sentence boundary and give the proper sound? And what about command arguments: words that begin with one or two hyphens?

Add to that that you don’t know which TTS program a user runs, or which version of it, and that recognized vocabularies are probably proprietary, so you probably can’t get them from TTS companies or look them up on the Web.

Some of this can be solved by rewriting the original text so everyone gets the same result, but if you’re quoting, rewriting may not be an option, and even without quoting, that method can produce writing that’s bad for other reasons.

So, you have to code defensively, in case a string is not in a TTS user’s recognized set. Coding defensively takes up even more time.

If you’re willing to do that and decide to code the contexts, you then have to choose how much context to include. Too much, and you have to remember to edit your TTS-support coding whenever you edit more of the page. Too little, and two contexts that should have different renderings might be covered by one line of code, causing a TTS error.

Proposal

This state of affairs is not helpful. We can fix this.

First, we need a compilation of strings and formats that most modern TTS programs usually recognize, even when the pronunciations are ambiguous. Even if TTS providers don’t release their lists, users can construct them with test files. A website owner could offer to compile a unified list from the separate providers’ lists, including, say, only strings that at least three providers recognize. Providers could then keep some competitive advantage, content editors could save work, and TTS providers could discover which strings they should add to their programs.
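The unification rule proposed above (keep a string if at least three providers recognize it) is simple to state as code. The provider names and their lists below are invented for illustration:

```python
from collections import Counter

def unify(provider_lists: dict, threshold: int = 3) -> set:
    """Compile a unified recognized-string list: keep each string
    that appears in at least `threshold` providers' lists."""
    counts = Counter(s for strings in provider_lists.values() for s in strings)
    return {s for s, n in counts.items() if n >= threshold}

# Hypothetical per-provider recognized-string lists, built from test files.
lists = {
    "ProviderA": {"read", "www", "a"},
    "ProviderB": {"read", "www"},
    "ProviderC": {"read", "a"},
    "ProviderD": {"www"},
}

print(sorted(unify(lists)))  # ['read', 'www']
```

The threshold is the competitive-advantage dial: raising it keeps more of each provider’s list private, lowering it gives editors more coverage.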

Second, we need a tool, like a validator, that accepts either text or a URL and reports what is missing from a TTS list, and which items are usually ambiguous when rendered unless we code them. It could recognize existing coding, so we can focus on the items that still need it. Ordinary English dictionaries, at least some at the upper end of dictionary scholarship, already often give multiple pronunciations for one word. If default pronunciations are known and consistent across TTS programs, the tool can report them, so we can decide when coding is unnecessary for a particular item. In that case, validation won’t be able to give a passing grade, but the results will tell us what to do and what to skip, saving time.
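The core of such a validator could be sketched as follows. Everything here is an assumption for illustration: a real tool would tokenize a page or fetched URL, and would draw `recognized` and `ambiguous` from the unified list proposed above, detecting existing pronunciation markup to fill `coded`.

```python
def validate(words, recognized, ambiguous, coded):
    """Report, per category, the strings an editor still has to handle:
    strings no TTS list recognizes, and recognized-but-ambiguous
    strings that have not yet been given pronunciation coding."""
    report = {"unrecognized": [], "needs_coding": []}
    for w in words:
        if w not in recognized:
            report["unrecognized"].append(w)
        elif w in ambiguous and w not in coded:
            report["needs_coding"].append(w)
    return report

result = validate(
    words=["read", "a", "Xylofone"],  # tokenized page text (hypothetical)
    recognized={"read", "a"},         # strings most TTS programs know
    ambiguous={"read", "a"},          # strings with ambiguous pronunciations
    coded={"a"},                      # already wrapped in pronunciation markup
)
print(result)  # {'unrecognized': ['Xylofone'], 'needs_coding': ['read']}
```

A page with a nonempty report can’t get a passing grade, but the two buckets are exactly the triage the essay asks for: what must be coded, and what can safely be skipped.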