More Precise Text-to-Speech in Your Website: A Howto


Problem

How to pronounce a word is not always obvious unless you have its context and a rule. But, even in uncertain cases, once you know how you want it said in a certain context, you can provide that rule without altering what sightful visitors see. You can write whatever you like for all users and provide accurate rendering both for sightful and for blind or visually impaired visitors.

Text-to-speech (TTS) programs convert text on a screen that visitors cannot see into speech they can hear. But English, and likely most widely used natural languages, has many common words with more than one pronunciation; “read,” for example, is said differently in the present and past tenses. TTS programs are generally not smart enough to work out from context how a word should be pronounced.

Bad ideas include letting TTS do whatever it does and modifying what can be seen with normal sight. Leaving TTS to make its own mistakes can lead to severe misunderstandings, especially for short text, such as when past and present tenses get mixed up without alerting the listener. And modifying the visible text, which is what one famous scientist with a medical condition does, can create misunderstandings among people who read visually.

SSML (Speech Synthesis Markup Language) is another bad idea, in my opinion, because it requires writing a parallel set of pages. That is bad enough for a site that doesn’t change but worse if you do even minor maintenance from time to time, and you’ll need to maintain expertise in SSML or refresh yourself at times. I have enough work to do. I don’t do SSML parallel pages.

Solution

Instead, where there’s ambiguity, I go off-page and supply the preferred pronunciation for the whole context. It’s work and I probably miss most strings that need it, but the workload is a separate issue. Fortunately, at least I have a method that provides a support structure.

I apply the Pronunciation Lexicon Specification (PLS) (<http://www.w3.org/TR/pronunciation-lexicon/>). Using PLS, I create a *.pls file, which I like to name pronunciation.pls and which I place in the root level of a website of mine. The pronunciation.pls file gives pronunciation rules that are particular to that website. Keep in mind that each context has to be unique not only within its page but across the whole website, and the *.pls file has no way to refer to a specific page. So, if a string could match text elsewhere on the site, copy more of the surrounding context into the rule until it is unique site-wide.

I write the file in a text editor.

You can make it more readable by adding blank lines and comments.
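As a rough illustration of the file described above, here is a minimal sketch of what a pronunciation.pls might contain. The root element, namespace, and the lexeme, grapheme, and alias elements come from the PLS specification; the two entries themselves are hypothetical examples, not rules from this site, and they assume the reading software matches multi-word grapheme strings, which is what the whole-context approach relies on.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-US">

  <!-- Hypothetical rule: past tense "read", with enough context
       copied from the page to be unique across the whole site. -->
  <lexeme>
    <grapheme>I read the report last week</grapheme>
    <alias>I red the report last week</alias>
  </lexeme>

  <!-- Hypothetical rule: present tense "read" in a different context. -->
  <lexeme>
    <grapheme>Please read the notice below</grapheme>
    <alias>Please reed the notice below</alias>
  </lexeme>

</lexicon>
```

Each alias respells the same context so that it sounds the way it should, while the visible page text stays untouched. PLS also allows a phoneme element in place of alias if you prefer to give the pronunciation in a phonetic alphabet such as IPA.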

You can see the one for this site at http://BrittleBit.org/pronunciation.pls.

Then I link every *.html file to the *.pls file by adding this link element inside the head element of each one:

 

<link rel="pronunciation" href="pronunciation.pls" media="all" />

 

Each website can have its own rules, which makes sense since usually each website has unique text even for sightful readers.

Complex Sites

For large websites, or where you control only part of a website, or where a *.pls file is getting too unwieldy, a likely solution is to apply this system to each directory separately. The HTML link would point to the *.pls file in a specific directory, and the *.pls file would be placed to match that link. That should work, but I haven’t tested it. Since websites are often hosted (served to the public) from a server where many sites share space and each site has its own directories (folders), using PLS on a per-directory basis would almost certainly succeed and might make your work easier.
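A per-directory link might look like the sketch below. The /articles/ directory name is a hypothetical placeholder, and, like the per-directory idea itself, this has not been tested.

```html
<!-- Goes in the head element of every *.html file inside the
     hypothetical /articles/ directory; the href points at that
     directory's own lexicon rather than the one at the site root. -->
<link rel="pronunciation" href="/articles/pronunciation.pls" media="all" />
```

With this arrangement, the rules in each *.pls file would only need contexts that are unique within the pages that link to it.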

Bottom Line

Even with the site-wide system, now I have a place to add site-specific rules wherever I need clarity.