I would like to know how I could measure the difference in pronunciation between two words. The two words are quite similar and differ in only one vowel. I know there are, e.g., the Hamming distance and the Levenshtein distance, but they measure the "general" difference between words. I'm interested in that too, but mainly I would like to know how differently they sound. I think there must be something like this for testing text-to-speech results?

Ideally there would be an online tool where I could just type in the two words.

Ben

1 Answer

There are a handful of tools available for manually comparing pronunciations, though all are limited in some way. Depending on your use case, you might be interested in:

  • Wikspeak: a tool that transcribes (single) words into IPA and generates a pronunciation. A web demo is available, though it’s a bit sensitive about browser versions.
  • espeak-ng: provides a CLI tool that does text-to-speech or text-to-IPA transcription:
# use the --ipa flag to display the inferred IPA transcription
espeak-ng -v en-US --ipa "horse"
# => hˈɔːɹs
espeak-ng -v en-US --ipa "hoarse"
# => hˈoːɹs

If you want a more automated solution, you could look into Python libraries like eng-to-ipa to do IPA transcription (including disambiguation when a word can map to multiple IPA transcriptions). You could then try applying edit distance measurements to the IPA transcriptions to estimate the similarity of the pronunciations.
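As a rough illustration of that last step, here is a minimal sketch that computes a plain Levenshtein distance over two IPA strings. The two transcriptions are copied from the espeak-ng output above; the function names `levenshtein` and `ipa_similarity` are made up for this example and do not come from any library. In practice the strings could come from eng-to-ipa or espeak-ng instead of being hard-coded.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-symbol insertions, deletions, or substitutions needed to turn a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, symbol_a in enumerate(a, start=1):
        current = [i]
        for j, symbol_b in enumerate(b, start=1):
            cost = 0 if symbol_a == symbol_b else 1
            current.append(min(
                previous[j] + 1,         # delete symbol_a
                current[j - 1] + 1,      # insert symbol_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def ipa_similarity(a: str, b: str) -> float:
    """Hypothetical helper: scale the distance to a 0..1 similarity score."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


# Transcriptions taken from the espeak-ng example above.
horse = "hˈɔːɹs"
hoarse = "hˈoːɹs"

print(levenshtein(horse, hoarse))     # one vowel symbol differs
print(ipa_similarity(horse, hoarse))
```

Note that this treats every Unicode code point as one symbol, so stress and length marks count as symbols too, and every substitution costs the same: swapping ɔ for o is penalised exactly as much as swapping ɔ for s. A measure that reflects how different two sounds actually are would need a phonetically informed substitution cost, which this sketch does not attempt.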

vlthr
  • 1
    Thanks a lot! I know I didn't target this in my question, but do you know if the IPA is used in text-to-speech? That is, do machines translate text by making use of the IPA, or are they "just" trained on human speech and the like? – Ben Jul 13 '21 at 10:17
  • 1
    Regarding Wikspeak: it, and one or two others I've seen, cannot produce IPA for fantasy names. I wonder why that is. The IPA should be valid for any word, and e.g. Coca Cola or Pepsi work even though they are fantasy names, but another fantasy word is unknown. Are they all just using a lookup table? I thought the IPA was a kind of system that should be applicable to "any" word. – Ben Jul 13 '21 at 10:20
  • 1
    @Ben About the use of IPA in text-to-speech engines: I’m not sure, but my understanding is that most traditional text-to-speech engines use some smaller language-specific set of phonemes internally (the simplest engines being a lookup table from word to phonemes, then stringing together pre-recorded voice clips for each phoneme). Nowadays it may be possible to train end-to-end on `text -> audio`, but I imagine it’s usually easier to model the problem as `text -> phoneme_sequence -> audio` (e.g. [Mozilla’s TTS](https://github.com/mozilla/TTS/)). – vlthr Jul 14 '21 at 09:30
  • 1
    Mozilla’s TTS seems to use the [phonemizer](https://pypi.org/project/phonemizer/) package for pre-processing text into a phoneme sequence that is given to a deep learning model as input, but at a glance it seems to optionally allow passing a config flag to disable phoneme preprocessing. (See [this](https://github.com/mozilla/TTS/blob/e9e07844b77a43fb0864354791fb4cf72ffded11/TTS/tts/utils/synthesis.py#L13) text-to-model-input preprocessor function and the [text util module](https://github.com/mozilla/TTS/blob/e9e07844b77a43fb0864354791fb4cf72ffded11/TTS/tts/utils/text/__init__.py).) – vlthr Jul 14 '21 at 09:36
  • 1
    @Ben About Wikspeak and others not being able to transcribe fantasy names, you’re correct that they’re just using a lookup table (usually the CMU Pronouncing Dictionary for English). The challenge is that the pronunciation isn’t fully specified by the Latin alphabet. In principle, though, it should be possible to guess a plausible IPA transcription for a fantasy name; [here](https://colab.research.google.com/github/as-ideas/DeepPhonemizer/blob/main/dp/notebooks/Inference_Example.ipynb)’s a deep learning model you could try. – vlthr Jul 14 '21 at 09:54
  • 1
    Thanks a lot again! How should "The challenge is that the pronunciation isn’t fully specified by the Latin alphabet" be understood? For almost the entire vocabulary there is a pronunciation (covered by the pronunciation alphabet), isn't there? – Ben Jul 14 '21 at 10:23
  • 1
    For dictionary words there is indeed an easy “specification”: someone has catalogued all the sounds that native speakers tend to make for a given Latin-alphabet word and simply published that as a lookup table. However, humans are so good at learning pronunciation that it’s easy to forget how much we deviate from the “ideal” pronunciations of each Latin character. E.g. `the` can be "thuh" /ðə/ or "thee" /ði/ based on context, `refuse` can be /ˈɹɛfjuːs/ or /ɹɪˈfjuːz/ based on meaning (_garbage_/_refusal_), and the words `tough`, `cough`, `dough` and `bough` all have different `ough` sounds. – vlthr Jul 15 '21 at 06:18
  • 1
    That’s why a lookup table created from empirical observation of native speakers in a given region is the only fully reliable way to map from a word (plus extra information like tense) to its pronunciation(s). When humans encounter unfamiliar words, we infer plausible sounds based not just on the letters but on intuitions about word evolution and even history, like “does the word look like a Latin or French import?” (leading to weirdness like [hyperforeignisms](https://en.wikipedia.org/wiki/Hyperforeignism)). That’s why fantasy names are hard: how would fantasy Englishmen say Worcestershire? – vlthr Jul 15 '21 at 06:32
  • 1
    The last one is a good one :) Thanks again! – Ben Jul 15 '21 at 06:39