122

I am looking for some easy to install text to speech software for Ubuntu that sounds natural. I've installed Festival, Gespeaker, etc., but nothing sounds very natural. All very synthetic and hard to understand.

Any recommendations out there?

Jorge Castro
  • 71,754

16 Answers

67

SVOX pico2wave

sudo apt install libttspico-utils

A very minimalistic TTS that sounds better than espeak or mbrola (to my mind). Some information here.

I don't understand why pico2wave is, compared to espeak or mbrola, rarely discussed. It's small, but sounds really good (natural). Without modification you'll hear a natural sounding female voice.

And, compared to Mbrola, it recognises units and speaks them the right way!
For example:

  • 2°C → two degrees
  • 2m → two meters
  • 2kg → two kilograms

After installation I use it in a script:

#!/bin/bash
pico2wave -w=/tmp/test.wav "$1"
aplay /tmp/test.wav
rm /tmp/test.wav

Then run it with the desired text:

<scriptname>.sh "hello world"

or read the contents of an entire file:

<scriptname>.sh "$(cat <filename>)"

That's all you need for a lightweight, stable TTS on Ubuntu.
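Several comments below ask how to feed pico2wave from a file or a pipe, since it only takes the text as a command-line argument. A minimal sketch of one possible workaround follows. pico_say is a hypothetical helper name (not part of libttspico-utils), and the 400-character chunk width is an arbitrary assumption, not a documented pico2wave limit.

```shell
#!/bin/bash
# Sketch: speak text from files or stdin with pico2wave, one chunk at a time.
# Assumes pico2wave (libttspico-utils) and aplay (alsa-utils) are installed.
# pico_say is a hypothetical helper name; 400 is an arbitrary chunk width.
pico_say() {
    local wav chunk
    # Keep the .wav suffix used by the script above.
    wav="$(mktemp --suffix=.wav)" || return 1
    # cat reads the named files, or stdin when no file is given.
    # fold -s breaks long lines at spaces so each argument stays short.
    cat -- "$@" | fold -s -w 400 | while IFS= read -r chunk; do
        case "$chunk" in
            *[![:space:]]*) ;;      # line has visible text: speak it
            *) continue ;;          # blank line: skip it
        esac
        pico2wave -w "$wav" "$chunk" && aplay -q "$wav"
    done
    rm -f "$wav"
}
```

Usage would then be either pico_say notes.txt or echo "hello world" | pico_say. Because fold can cut a long line in the middle of a sentence, there may be an audible pause at a chunk boundary.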

jkoop
  • 59
  • 6
user85321
  • 1,415
  • 2
    As far as I can see, it only uses cli parameters as input. Is there any way I can get pico2wave to read text from a filename? – Carlos Eugenio Thompson Pinzón Feb 15 '14 at 17:42
  • 17
    pico2wave is in package libttspico-utils in recent versions of ubuntu. @CarlosEugenioThompsonPinzón cat <filename> | xargs -I foo -0 pico2wave -w blah.wav foo – naught101 Mar 11 '14 at 09:11
  • 3
    @CarlosEugenioThompsonPinzón pico2wave -w a.wav "$(cat input.txt)" =). Agreed that this CLI is badly designed: it is unlike the vast majority of CLIs, and it makes it possible to hit the OS maximum argument length. – Ciro Santilli OurBigBook.com Apr 13 '14 at 09:44
  • i have installed it pls tell me how to use it in espeak i cant get – user49557 Jun 19 '15 at 10:36
  • @user49557: this answer isn't about espeak. for this answer, install the libttspico-utils package, then put pico2wave -w ~/output.wav "the text" – Koen Jun 22 '15 at 09:14
  • @CiroSantilli六四事件法轮功纳米比亚胡海峰 May that CLI interface thingee be the reason that my pico2wave output stucks after 858 words, while the .txt file I provided is 10600 words? – Koen Jun 22 '15 at 09:18
  • 1
    @Koen I don't know! :-) Like any other problem, try to produce a minimal example, e.g. using echo {1..1000} – Ciro Santilli OurBigBook.com Jun 22 '15 at 09:48
  • @koen thanks dude but can i ask you any more questions related to this – user49557 Jun 25 '15 at 10:51
  • 1
    @user49557 We're not supposed to hijack others' questions, so maybe you can create a new question, explaining what exactly you installed, and what it is that went wrong, and then I can always try and help you (no guarantees, though, I'm not an expert :P) – Koen Jun 25 '15 at 12:24
  • @user85321 I don't understand why pico2wave is, compared to espeak or mbrola, rarely discussed. Because with such a name it's really not easy to find but you are perfectly right, I tried it and compare to eSpeak, that's far much better! – 2ndGAB Jan 31 '18 at 12:33
  • @2ndGAB, because the syntax doesn't natively support files, it has to be used in two steps (as explained in the shell script), and the pico2wave command itself doesn't support pipes. It would be nice if someone gave a workaround for the pipes. Otherwise the reading is almost human and is outperformed only by Prestigio's text to speech. – user10089632 Feb 10 '18 at 16:42
  • On Ubuntu 18.04 this sounds very unnatural when I ask it to pronounce English. "Hello world" comes out as "hee-low valud" :( – Martin Eden Oct 19 '18 at 11:05
  • 1
    @MartinEden That's because in this answer, the line in the script which calls pico2wave includes the option -l=de-DE which causes it to use a German voice. Just remove that option and it will default to an English (US) voice. I will propose an edit to this answer as it does not make sense given that both the answer, and the sample text ("hello world") are in English and not German. – Jon Bentley Jan 20 '19 at 19:13
  • This is GREAT but I cannot figure out how we can replace the default ubuntu TTS espeak with this one? – Binod Kalathil Oct 01 '20 at 10:02
35

Pico and espeak are fun and easy to get to work, but they're not all that good. The default Festival voices are also not that good. However, Festival is a scheme-based speech framework, where a number of researchers have built much better plug-in voices. You can easily surpass the pico2wave quality on stock Ubuntu, because one of those voices is available as a ready-made package.

To make Festival sound natural, here's what to do:

sudo apt-get install festival
sudo apt-get install festvox-us-slt-hts
festival -i
festival> (voice_cmu_us_slt_arctic_hts) 
festival> (SayText "Don't hate me, I'm just doing my job!")

You can do it from the command line by using -b (or --batch) and putting each command into single quotes:

festival -b '(voice_cmu_us_slt_arctic_hts)' \
    '(SayText "The temperature is 22 degrees centigrade and there is a slight breeze from the west.")'
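If the machine has no audio output, the same voice can be rendered to a WAV file instead of being played. The following is a minimal sketch, assuming the text2wave script that ships with the festival package and the festvox-us-slt-hts voice installed above. festival_to_wav is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: render text to a WAV file with Festival instead of playing it.
# Assumes text2wave (shipped with festival) and festvox-us-slt-hts are installed.
# festival_to_wav is a hypothetical helper name.
festival_to_wav() {
    local out="$1"
    shift
    # The remaining arguments are the text; text2wave reads it from stdin.
    # -eval selects the voice before synthesis, -o names the output file.
    printf '%s\n' "$*" \
        | text2wave -eval '(voice_cmu_us_slt_arctic_hts)' -o "$out"
}
```

For example, festival_to_wav /tmp/weather.wav "There is a slight breeze from the west." should leave the audio in /tmp/weather.wav, which can be copied elsewhere or played later with aplay.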

You can get other quite good voices from the Nitech repository, but installing them is finicky, and the default paths changed so the file name references in the bundled scheme files may need to be manually edited to work on stock Ubuntu.

Jon Watte
  • 508
  • 4
  • 7
  • 4
    Btw, in Ubuntu 16.04, this package seems to be missing. You can download and install the deb from Debian and it will work fine: https://packages.debian.org/sid/all/festvox-us-slt-hts/download sudo dpkg -i Downloads/festvox-us-slt-hts_0.2010.10.25-2_all.deb – Jon Watte Aug 20 '17 at 02:48
  • Much better than pico2wave, adjustable, better separation between words, better customzability, 1-step. With pico I had to slow the output wav in order to understand it – Berry Tsakala Aug 05 '21 at 18:01
  • @BerryTsakala I just took script above and stuck it in ~/.bashrc as a function instead. I called the function speak and I can just use speak "something to say" and it works perfectly. – WinEunuuchs2Unix Oct 01 '21 at 00:18
  • 1
    OP asked for natural sounding TTS. Festival is still quite robotic. – Nav Mar 27 '22 at 12:55
  • 1
    Small command to read content of clipboard from bash: echo "(SayText \"$(xclip -selection clipboard -o)\")" | festival '(voice_cmu_us_slt_arctic_hts)' --pipe – Olle Härstedt Nov 05 '22 at 23:41
  • Is there no way to get this to go to a wav file? It crashed because my server does not have audio out... – jjxtra Feb 09 '23 at 00:35
  • 1
    @jjxtra The manual page is at https://linux.die.net/man/1/festival and documents the command. You can run in --server mode. Or you can use the festival command language to synthesize an utterance and save it to disk. See also https://www.cstr.ed.ac.uk/projects/festival/manual/festival_7.html – Jon Watte Feb 17 '23 at 17:04
  • 1
    An example on how to read a text file, and being able to pause it, would be nice, too. – Olle Härstedt Nov 03 '23 at 09:10
21

SpeakIt!

I believe I've found the best free TTS software: a Google Chrome extension called "SpeakIt". It only works in the Chrome browser for me on Ubuntu; it doesn't work with Chromium for some reason. SpeakIt comes with two female voices, both of which sound very realistic compared to everything else out there. There are at least four more male and female voices listed as Chrome extensions if you search the Chrome Web Store using "TTS" as your query.

Usage: On a website, highlight the text you want read, then either right-click and choose "SpeakIt" or click the SpeakIt icon docked on the Chrome top bar.


Firefox users also have two options. Within Firefox addons, do a search for TTS and you should find "Click Speak" and also "Text to Voice". The voices are not as good as the Chrome SpeakIt voices, but are definitely usable.

The SpeakIt extension uses iSpeech technology, and for $20 a year the site can convert text to MP3 audio files. You can input text, URLs, RSS feeds, and documents such as TXT, DOC, and PDF, and get MP3 as output. You can make podcasts, embed audio, etc. Here is a link, and a sample of their audio (don't know how long the link will last).

Pablo Bianchi
  • 15,657
  • 4
    Unfortunately none of the browser options work for PDF files. Have you come across one that does? I'd like to be able to select paragraphs to read from a PDF (i.e. not have to paste bits to terminal or other) – James Owers May 07 '16 at 18:05
  • 1
    this extension works for me on chromium 50.0.2661.94 using Debian 8.4 and its great! i especially like the english female voice. my only complaint is that it pauses for too long on commas. – mulllhausen Jun 28 '16 at 21:56
  • It often mispronounces words and also takes time to send the text to a separate server rather then just using your own system. – Goddard Mar 04 '17 at 06:25
  • Link is broken. – 842Mono Feb 28 '21 at 01:33
  • output is terrible compared to voicerss - very mechanical – Michael Nov 12 '21 at 17:10
14

Simple Google™ TTS

Update from project page (2016): This project is currently unmaintained and will remain so for the foreseeable future.


Because of the lack of a better alternative I wrote a bash script that interfaces with a perl script by Michal Fapso to provide TTS via Google Translate. From the project description:

The intention is to provide an easy to use interface to text-to-speech output via Google's speech synthesis system. A fallback option using pico2wave automatically provides TTS synthesis in case no Internet connection is found.

As it stands, the wrapper supports reading from standard input, plain text files and the X selection (highlighted text).

The main features are:

  • online TTS synthesis via Google translate
  • offline TTS synthesis via pico2wave
  • supports a variety of different languages
  • can read from CLI, text files and highlighted text
  • supports reading highlighted text with fixed formatting (e.g. PDF files)

Installation and usage are documented on the project page.

I'd be glad if you gave it a try. Bug reports and any other feedback are welcome!

Pablo Bianchi
  • 15,657
Glutanimate
  • 21,393
13

Piper

A fast, local neural text-to-speech system. See the project site for installation, voice downloads, and usage. For example:

echo 'Welcome to the world of speech synthesis!' | \
  ./piper --model blizzard_lessac-medium.onnx --output_file welcome.wav
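To hear the result straight away rather than keeping the file, the command above can be wrapped so that it writes to a temporary WAV file and plays it. This is only a sketch: it assumes the piper binary and the voice model sit in the current directory, as in the command above, and that aplay (alsa-utils) is installed. piper_say, PIPER_BIN and PIPER_MODEL are hypothetical names.

```shell
#!/bin/bash
# Sketch: synthesize with Piper into a temporary WAV file, then play it.
# Assumes ./piper and the model file are in the current directory and that
# aplay (alsa-utils) is installed. All three names below are hypothetical.
PIPER_BIN="${PIPER_BIN:-./piper}"
PIPER_MODEL="${PIPER_MODEL:-blizzard_lessac-medium.onnx}"

piper_say() {
    local wav
    wav="$(mktemp --suffix=.wav)" || return 1
    # Piper reads the text from stdin, exactly as in the command above.
    printf '%s\n' "$*" \
        | "$PIPER_BIN" --model "$PIPER_MODEL" --output_file "$wav" \
        && aplay -q "$wav"
    rm -f "$wav"
}
```

Usage would be piper_say "Welcome to the world of speech synthesis!". A different voice can be chosen by setting PIPER_MODEL to another downloaded model file.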

gTTS, Google Text-to-Speech

gTTS, a Python library and CLI tool to interface with Google Translate's text-to-speech API. Writes spoken mp3 data to a file, a file-like object (bytestring) for further audio manipulation, or stdout.

Cons: CLI-only. You need to be online, because it sends requests to Google's public endpoint.

sudo -H pip install gTTS  # Install

Usage

gtts-cli 'hello' --output hello.mp3
gtts-cli -l es 'Nadie es patria, todos lo somos' | play -t mp3 -

Documentation and more examples

Others

Some were already mentioned

Pablo Bianchi
  • 15,657
  • I found piper to be the best. I use this script for "speak selected text" feature: https://medium.com/@IanEdington/natural-sounding-speek-selected-text-for-linux-41025874c019 – IanEdington Mar 11 '24 at 02:11
13

I have looked high and low for text to speech for Ubuntu that is high quality. There is none. My vocal cords are paralyzed so I needed TTS to add voice instructions to my Ubuntu videos. You can get commercial high quality Linux text to speech software here. It's just really expensive. I ended up buying Natural Reader for Windows (doesn't work in Ubuntu under Wine) for $40. Maybe later I will get the Linux one.

Pablo Bianchi
  • 15,657
  • dude, there is and I was using it like last week there are at least 5 or 6 and I can't for the life of me find any of them now, gotta love our community – mchid Dec 21 '15 at 10:53
  • 1
    Textaloud has instructions to make their product work under wine. see http://nextup.com/forum/viewtopic.php?t=3349 I believe that cepstral has a linux port too. I have not been able to get my favorite software balabolka to work. I have windows 10 installed mostly for tts processing. MS David is good and similar to cepstral david. The prior one is free if you have windows 10. – Bhikkhu Subhuti Jun 19 '16 at 11:34
8

I have been researching the best-sounding and most easily tuned text-to-speech voices. Below are what I thought were the top 5 products, in order of sound quality. Most of the websites associated with these products have an interactive demo that will let you judge for yourself.

  1. NeoSpeech
  2. iVona
  3. Acapela
  4. AT&T Natural voices
  5. CereProc Voices
Jim
  • 81
  • 1
  • 1
6

Combine SVOX tools (pico) with LibreOffice:

SVOX (pico) tools are easy to install and bring good-quality voices to Ubuntu. Install them:

sudo apt-get install libttspico0 libttspico-utils libttspico-data

You can use LibreOffice in combination with the SVOX (pico) tools by installing the "Read Text" extension, which gives you a "GUI" for this excellent TTS software:

Set up the Read Text extension's options with Tools - Add-ons - Read selection.... Use /usr/bin/python as the external program. Select a command line option that includes the token (PICO_READ_TEXT_PY); you may want to experiment with some of them.

Now you only have to select some text in LO Writer, Calc, Impress or Draw and click on the icon added to the toolbar (a happy face with a balloon).

leoperbo
  • 753
5

I find the Nitech HTS voices on Festival more natural and comforting than any other voices I have heard. See this link on how to set up Nitech and other voices with Festival. I have not found a good GUI for configuring those voices, but setting them via festival.scm still works. That post is very old, so you may want to find the actual installation directory using the "locate festival" command.

razor
  • 398
  • Seems to be very good. Found demos here http://www.cstr.ed.ac.uk/projects/festival/onlinedemo.html – Iacchus Aug 21 '14 at 08:32
  • 3
    Yes, the Nitech voices are head and shoulders above other Festival voices (except the CMU voices, which are also very good.) Too bad they're hard to install. There is one good CMU voice that has a default package in Ubuntu, it's called cmu_us_slt_arctic_hts and comes in the package festvox-us-slt-hts. It is much better than pico or espeak! – Jon Watte Apr 25 '17 at 19:23
4

Here is what I did to get pure natural speech for PDF and other text files (other solutions are not natural, or they're paid services). This is actually a workaround using Chromium or Chrome, but it works fast and easily.

  1. Install SpeakIt! extension on your chrome or chromium.
  2. Install PDF Viewer if you're using Chromium (Chrome already has a PDF viewer for free) and check the 'Allow in incognito' and 'Allow access to file URLs' options in Chromium's extension settings.
  3. Drag and drop your pdf to browser.
  4. Now highlight some text and right click and select SpeakIt! so you can listen to pure natural text-to-speech.

There are also ways to open other files like .doc and .txt in Chrome and do the same. There are other Chrome extensions that view PDF files; check whether one of them suits you better. Besides, you can upload all kinds of text to Google Drive and use SpeakIt! to read it for you. Another extension called 'Speak text' works the same way and has natural speech.

3

When searching for a better tts engine to use with the new firefox 49 narrative mode I found pico tts (svox) - my favorite TTS engine.

sudo apt install espeak libttspico0 libttspico-data libttspico-utils

How to change the default speech synthesis engine system wide?

People at arch linux brought me to the right path:

Uncomment the module you like and make it default in speech-dispatcher settings:

# sudo vim /etc/speech-dispatcher/speechd.conf

[...]
# -----OUTPUT MODULES CONFIGURATION-----
# Each AddModule line loads an output module.
#AddModule "espeak"       "sd_espeak"   "espeak.conf"
AddModule "pico-generic"  "sd_generic"   "pico-generic.conf"

[...]
#DefaultModule espeak
DefaultModule pico-generic

Restart the daemon:

# sudo systemctl restart speech-dispatcher.service
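To see whether the daemon actually picked up the new module before involving Firefox, it can be queried with spd-say. This is a minimal sketch, assuming spd-say (shipped with speech-dispatcher) is installed and that its -O option prints the loaded module names one per line. check_spd_module and SPD_SAY are hypothetical names.

```shell
#!/bin/bash
# Sketch: check whether speech-dispatcher has loaded a given output module,
# and if so speak a test phrase through it.
# Assumes spd-say is installed; SPD_SAY and check_spd_module are hypothetical.
SPD_SAY="${SPD_SAY:-spd-say}"

check_spd_module() {
    local module="${1:-pico-generic}"
    # -O asks the daemon for its list of output modules.
    if "$SPD_SAY" -O | grep -qF -- "$module"; then
        # -o selects the output module explicitly for this one message.
        "$SPD_SAY" -o "$module" "Testing $module"
    else
        echo "speech-dispatcher did not list module $module" >&2
        return 1
    fi
}
```

Running check_spd_module with no argument tests pico-generic; check_spd_module espeak would test the module that was commented out. If the module is listed but nothing is heard, the problem is more likely in the module's own configuration than in speechd.conf.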

BUT, when starting Firefox again, nothing happens. According to the link above (Arch forum posts #10 and #16) it works with festival (did not try), but speech-dispatcher does not list any available voices for pico. It won't run.

Any idea out there would be highly appreciated ;-)

Pablo Bianchi
  • 15,657
apos
  • 529
1

Verbify-TTS

Yes! I encountered the exact same problem you are describing. One year ago I created a custom TTS, which I have been using myself for almost two years now, and I open-sourced it. It works offline and for free, using an AI-based high-quality voice. You can use it everywhere: Firefox, PDF readers, Chrome, LibreOffice, etc. It supports both Ubuntu and Windows.

Feel free to have a look, I just created a video tutorial with installation steps and DEMO: https://youtu.be/hb1ZVwUcPCU

Download link and Project page: https://github.com/MattePalte/Verbify-TTS

Feel free to leave comment/open issue to discuss new ideas, problems or constructive criticism.

Hoping it will help you.

1

My favorite text-to-speech program is called Magic English, but like Natural Reader mentioned by Joe Steiger, it is a Windows program and I'm not sure if it will run under Wine.

AT&T Natural Voices is available online as a demo, but that's more of a work-around than a solution...

1

For that I built Intelligent Speaker, an extension for Google Chrome. It can read pages even without a selection (when text detection is correct).

1

Simple Google™ TTS

Update from project page (2016): This project is currently unmaintained and will remain so for the foreseeable future.


Pico, mbrola, cmu, festival, flite, all SUCK in 2017 (They were amazing in the 90s). AT&T natural speech (which is fantastic) isn't linux compat and it's not free, therefore we use Google

git clone https://github.com/Glutanimate/simple-google-tts.git
sudo apt install xsel libnotify-bin libttspico0 libttspico-utils libttspico-data libwww-perl libwww-mechanize-perl libhtml-tree-perl so$
cd simple-google-tts
sudo ln -s `pwd`/simple_google_tts /usr/local/bin
simple_google_tts en "Text to speech is now installed"
cd -
Pablo Bianchi
  • 15,657
Jonathan
  • 3,904
  • 3
    This is a duplicate of Glutanimate answer (the author of that project). Also: "Status update: This project is currently unmaintained and will remain so for the foreseeable future." He suggests some alternatives – Pablo Bianchi Feb 21 '19 at 17:59
  • This project is currently unmaintained and will remain so for the foreseeable future.

    This script and many others like it rely on an unofficial API that has recently become increasingly difficult to support. As Google continues to lock down access to their TTS interface I see no choice other than to suspend maintaining this script for the time being.

    – erwin Jul 19 '21 at 10:48
  • @PabloBianchi his answer did not have the install code in it – Jonathan Jul 26 '21 at 22:10
0

In Linux systems, you can dump X selection (the text you have selected on your screen with the mouse) to a text file, then read with some TTS (currently I use Google Translate Python script gTTS):

#!/bin/bash
TXT="/tmp/speak.txt"

# save X text selection to a file
xclip -out > "$TXT"

# remove smileys
sed -i 's/ :[pP]/./' "$TXT"
sed -i 's| :/|.|' "$TXT"
sed -i 's/ :D/./' "$TXT"
sed -i 's/ ;D/./' "$TXT"
sed -i 's/ :(/./' "$TXT"

# abbreviations (\b is a GNU sed word boundary, so neighbouring characters are kept)
sed -i 's/\bIPv6\b/I P version 6/gi' "$TXT"
sed -i 's/\bMR\b/merge request/gi' "$TXT"
sed -i 's/\bbtw\b/by the way/gi' "$TXT"
sed -i 's/\bWIP\b/work in progress/gi' "$TXT"
sed -i 's/\bCLI\b/command line/gi' "$TXT"

# Latin (dots escaped so they only match literal dots)
sed -i 's/i\.e\./that is/gi' "$TXT"
sed -i 's/e\.g\./for example/gi' "$TXT"

gtts-cli -f "$TXT" | play -t mp3 -

Bind this script to some key, for example the right menu key, and every time you select some text in any program (Firefox, Thunderbird, LibreOffice Writer, a PDF reader, or even the terminal) and press that key, you will hear the text.

PS. you can also add --slow option to gtts-cli.

Pablo Bianchi
  • 15,657