Uncertain Pronunciations and Coding:
A Critique and Proposal


One problem for Web designers tackling accessibility is pronunciation ambiguity. Handling it can add a daunting volume of work to every page. Should we spend time coding for accessibility or creating content? Our time is limited, and we probably can’t do both without sacrificing one or the other or cutting back on something else.

Pronunciation ambiguities can result from differences in meaning or etymology for the same spelling, such as when one spelling supports two tenses. The ambiguities can also result from dialectal customs, where some people consistently pronounce a given word one way in all contexts while other people have another pronunciation of that word in all contexts.

Consider, for a semantic difference, the word that’s spelled “r e a d”. It can be pronounced /reed/ or /red/. It’s unlikely that text-to-speech (TTS) programs can tell the difference by context, so we have to write code specifically for TTS software that won’t interfere with how sighted users experience the website. Even if TTS software can sometimes parse context to determine a pronunciation, it probably can’t always, or even most of the time. Consider the word “a”; do we pronounce it to rhyme with “say” or with “uh”? The choice is not always based on a rule but sometimes on the speaker’s custom (and dialects and idiolects give more examples). That means we have to find every instance of “a”, of “read”, and of every other word that has a pronunciation ambiguity and write code for every instance: not just once for each word in that collection but every time each of those words appears. Talk about a burden. I try to remember to do it for “r e a d” and probably miss some. I never remember to do it for “a”. It comes up too often.

Other strings make the problem bigger. Suppose you offer the URL www.example.com. One TTS program read the “www” in a URL as “world wide web”, but that is not what the listener should type into a browser address bar, and we didn’t tell them to. A TTS program should say “dot” for “.” in a URL but not at the end of a sentence, unless the dot ends a URL that is also at the end of a sentence, and maybe it should say “period” if the text is discussing punctuation. If you surround a URL with angle brackets, which is a convention, should the angle brackets be silent? Certainly not always. A closing angle bracket is a closing angle bracket in one context, a right-pointing arrowhead in another, a greater-than symbol in yet another (even if there’s no space to its right), and part of ‘not equal to’ in still another.
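To make the URL case concrete, here is a minimal Python sketch of the reading a listener would need in order to type the address back. It is not how any particular TTS program works; the function name `spoken_url` and the `ends_sentence` flag are hypothetical, and the sketch assumes the author already knows whether a trailing dot belongs to the sentence rather than to the URL.

```python
def spoken_url(url, ends_sentence=False):
    """Spell out a URL so that a listener could type it back exactly."""
    # Assumption: when the URL closes a sentence, the final "." is
    # punctuation and should not be read aloud as "dot".
    if ends_sentence and url.endswith("."):
        url = url[:-1]
    parts = []
    for label in url.split("."):
        # Read "www" letter by letter instead of expanding it to words.
        parts.append(" ".join(label) if label == "www" else label)
    return " dot ".join(parts)


print(spoken_url("www.example.com"))
print(spoken_url("www.example.com.", ends_sentence=True))
```

Both calls give the same reading, which is the point: the author, not the TTS program, is the one who knows that the last dot is sentence punctuation.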

Brand names often have unexpected spellings or, occasionally, expected spellings with unexpected pronunciations; new or obscure product brands won’t be known to TTS programs and neither will unexpected pronunciations. Plain English writing has misspellings; should a misspelling be rendered faithfully to the original error, or corrected only when pronounced but not for sighted readers? Sentences usually begin with capital letters and that helps TTS programs determine how to sound out the sentence, but if a sentence begins by naming a computer command that has to be in all lower case, will the TTS program still recognize that it starts a sentence, thus giving the proper sound? What about command arguments that are words that begin with hyphens, either one or two?

For differences between people, dictionaries often give two or sometimes three pronunciations for a given word with no difference stated for when one pronunciation is to be preferred. A dictionary may indicate that they are heard about equally often or that one is much more common than the other but that the less-common one is not an error. A website author might prefer to use the dominant pronunciations or might have their own preferences for some words. If primary dictionaries disagree with each other, probably because the compilers had different datasets, a website author might have a preference as to which dictionary to accept as authoritative for the purpose.

Add to that the fact that you don’t know which TTS program a user is using, or which version of it, and that recognized vocabularies are probably proprietary, so you probably can’t get them from TTS companies or look them up on the Web.

Some of this can be solved by rewriting the original text so everyone gets the same results, but if you’re quoting then rewriting may not be an option and, even without quoting, that method can produce writing that’s bad for other reasons.

So, you have to code defensively, in case a string is not in a TTS user’s recognized set. Coding defensively takes up even more time.

If you’re willing to do that and decide to code the contexts, then you have to choose just how much context to include. Too much and you have to remember to edit your pro-TTS support coding whenever you edit more of the page. Too little and two contexts might be encompassed by the one line of code when they should have different renderings, causing a TTS error.

Proposal

This state of affairs is not helpful. We can fix this.

First, we need a compilation of strings and formats that most modern TTS programs usually recognize, even if the pronunciations are ambiguous. Even if the lists are not released by TTS providers, they can be created by users using test files. Perhaps a website owner can offer to compile a unified list from the separate TTS program providers, for instance by including a string only if at least three providers have it in their lists. Providers could then still maintain some competitive advantage, content editors could save on the work they have to do, and TTS providers could discover what strings should be added to their programs.
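The threshold idea in this first step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the provider names and their recognized strings are invented, the function name `unified_list` is hypothetical, and a real compilation would need much larger lists gathered from providers or from test files.

```python
from collections import Counter


def unified_list(provider_lists, min_providers=3):
    """Return strings that appear in at least `min_providers` lists."""
    counts = Counter()
    for strings in provider_lists.values():
        # Use a set so a string repeated within one list counts once.
        counts.update(set(strings))
    return sorted(s for s, n in counts.items() if n >= min_providers)


# Hypothetical providers and recognized strings, for illustration only.
providers = {
    "provider_a": ["read", "www", "example"],
    "provider_b": ["read", "www"],
    "provider_c": ["read", "example"],
    "provider_d": ["www", "read"],
}

print(unified_list(providers))  # ['read', 'www']
```

Here "example" is left out because only two of the four invented providers list it, so those two keep whatever advantage that entry gives them.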

Second, we need a dictionary of strings that are ambiguous in their pronunciations. An open dictionary would make coding easier than if we have to await validation and discover a dictionary’s contents that way. The dictionary should also concisely state, often through definitions, when one pronunciation is preferred and when another is.

Third, we need a tool, like a validator, where we can input either text or a URL and have it tell us what is missing from a TTS list and have it tell us which items are usually subject to ambiguity when rendered unless we coded them. It could recognize the coding when present, so we can focus on those items needing coding. Ordinary dictionaries of English, at least some at the upper end of dictionary scholarship, already often offer multiple pronunciations for one word. If default pronunciations are known, and are consistent across TTS programs, it can tell us what they are, so we can decide when coding is not necessary for a particular item. In that case, validation won’t be able to give a passing grade, but we’ll know from the results what to do and what to skip to save time.
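A rough Python sketch of what such a tool might report follows. Everything in it is an assumption made for illustration: the recognized-strings list, the ambiguity dictionary, the invented brand name “Zyxo”, and the function name `check_text` are all hypothetical. It also simplifies by treating coding as done per word, where a real tool would have to track each instance and detect the coding in the page itself.

```python
import re

# Hypothetical data, for illustration only: a tiny recognized-strings
# list and a tiny dictionary of spellings with several pronunciations.
RECOGNIZED = {"i", "a", "read", "the", "book", "about", "yesterday"}
AMBIGUOUS = {
    "read": ["/reed/ (present tense)", "/red/ (past tense)"],
    "a": ['rhymes with "say"', 'rhymes with "uh"'],
}


def check_text(text, coded=()):
    """Report unrecognized words and ambiguous words not yet coded.

    `coded` holds lowercase words the author says are already marked up.
    """
    words = re.findall(r"[A-Za-z']+", text.lower())
    coded = set(coded)
    missing = sorted({w for w in words if w not in RECOGNIZED})
    needs_coding = sorted(
        {w for w in words if w in AMBIGUOUS and w not in coded}
    )
    return {"missing": missing, "needs_coding": needs_coding}


report = check_text("I read a book about Zyxo yesterday", coded=["read"])
print(report)  # {'missing': ['zyxo'], 'needs_coding': ['a']}
```

The report separates the two kinds of work: “zyxo” needs attention because no list recognizes it, and “a” needs attention because it is ambiguous and not yet coded, while “read” is skipped because it is already handled.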