Convert speech (mp3 audio files) to text

Question

I am looking for a simple converter from mp3 to txt. I have tried Julius and CMU Sphinx, among others, without success: in the past 4 hours I have not found out how to use them (or even how to install them properly).

What I am looking for is something like:

$ converterapp -infile myspeech.mp3 -outfile myspeech.txt

I am also fine with a GUI application, since I only have a few files to convert and can click around.

Edit: With the help of the answer to "Speech-recognition app to convert MP3 to text?" I managed to get it running, but it produces no useful output. Well, actually it produces a couple of blank lines (no words detected)...

$ pocketsphinx_continuous -infile 1.wav -hmm en-us/cmusphinx-en-us-5.2 -lm en-us/en-70k-0.2.lm -logfn /dev/null &>otput.txt is the exact command, as per @NikolayShmyrev's question. I have downloaded the models from https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English/ . — Samo, Oct 17 '16 at 12:09

score 11 · Answer 1 · answered Apr 30 '18 at 18:27

pocketsphinx will do speech to text from an existing audio file. Depending on the initial format of the mp3, you may need two separate commands.

First convert your existing audio file to the input format pocketsphinx expects (16 kHz, mono WAV):

    ffmpeg -i file.mp3 -ar 16000 -ac 1 file.wav

Then run pocketsphinx:

    pocketsphinx_continuous -infile file.wav 2> pocketsphinx.log > myspeech.txt

The created file myspeech.txt will contain the recognized text.

In case you are new to Ubuntu, you can install the above programs with this command:

    sudo apt install pocketsphinx pocketsphinx-en-us ffmpeg
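
If you have several files, the two steps above can be wrapped in a small loop. This is only a sketch, assuming the input names end in .mp3 and the default English model is fine; wav_name, txt_name and transcribe are hypothetical helper names made up for this example, not part of ffmpeg or pocketsphinx:

```shell
#!/bin/sh
# Sketch only: batch version of the ffmpeg + pocketsphinx steps above.
# wav_name, txt_name and transcribe are hypothetical helpers for this example.

# Derive the output names from an input such as "talk.mp3".
wav_name() { printf '%s\n' "${1%.mp3}.wav"; }
txt_name() { printf '%s\n' "${1%.mp3}.txt"; }

transcribe() {
    # Resample to 16 kHz mono WAV, as in the ffmpeg command above.
    ffmpeg -i "$1" -ar 16000 -ac 1 "$(wav_name "$1")" || return 1
    # Recognized text goes to the .txt file; diagnostics are appended to the log.
    pocketsphinx_continuous -infile "$(wav_name "$1")" \
        2>> pocketsphinx.log > "$(txt_name "$1")"
}

# Process every file given on the command line; stop at the first failure.
for f in "$@"; do
    transcribe "$f" || exit 1
done
```

Saved as, say, mp3totxt.sh (a made-up name), it would be run as `sh mp3totxt.sh a.mp3 b.mp3` and should leave a.txt and b.txt next to the inputs.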

Funny how pocketsphinx-en-us isn't a dependency, and how not having it installed doesn't even generate an error when executing. — Adam, Jun 23 '20 at 20:15

score 5 · Answer 2 · MayeulC · 2024-03-27T15:15:34.583

OpenAI's Whisper is a relatively new, free and open-source alternative with pretty good performance in multiple languages.

There are a few ways to install it. One is via pip, Python's package manager:

$ pip install -U openai-whisper

Then run it on your audio file:

$ whisper audio.mp3 --model medium

A comment below points out that pip may refuse to install system-wide and ask for a Python "virtual environment" instead. A virtual environment lets pip install software into a subdirectory, so it does not affect the rest of your system:

$ # Creates a new environment called "newenv" (also creates a subfolder with the same name)
$ python -m venv newenv
$ # Activate the new environment by sourcing the bin/activate script from the new folder
$ source ./newenv/bin/activate
(newenv)$ # pip will now install modules in the venv, and python will use modules from there
(newenv)$ pip install -U openai-whisper
(newenv)$ whisper audio.mp3 --model medium
(newenv)$ deactivate  # exit the venv (once you are done)
$
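
For a handful of files, a loop along these lines could be run inside the activated venv. It is a sketch under a few assumptions: OUT_DIR and transcript_path are names invented for this example, and it relies on the whisper command's --output_format and --output_dir options and on whisper naming each output after the audio file's basename:

```shell
#!/bin/sh
# Sketch only: transcribe several files with Whisper (run inside the venv).
# OUT_DIR and transcript_path are hypothetical names chosen for this example.
OUT_DIR=${OUT_DIR:-transcripts}

# Path where the plain-text transcript for a given audio file should end up:
# the output directory plus the audio file's basename without its extension.
transcript_path() {
    base=$(basename "$1")
    printf '%s/%s.txt\n' "$OUT_DIR" "${base%.*}"
}

# Process every file given on the command line; stop at the first failure.
for f in "$@"; do
    # Write only a .txt transcript and collect all of them in OUT_DIR.
    whisper "$f" --model medium --output_format txt --output_dir "$OUT_DIR" || exit 1
    echo "transcript: $(transcript_path "$f")"
done
```

Smaller models (for example --model small) trade some accuracy for speed, which may matter on a machine without a GPU.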

edited Mar 27 '24 at 15:15

answered May 03 '23 at 08:48

MayeulC


This sounds like a great answer, but I think it can be improved. Typing pip install -U openai-whisper gives some error message about this being an external environment and requiring a virtual environment, blah blah, lots of things that are incomprehensible unless you're an experienced Python developer. – k314159 Mar 26 '24 at 15:30

@k314159 It didn't display this for me last time I tried it, but I added a short tutorial to my answer instead of leaving that in my previous comment (which I am deleting). – MayeulC Mar 27 '24 at 15:16

Thanks, that looks good. Another way to install whisper, which I tried successfully yesterday, is to use pipx. – k314159 Mar 27 '24 at 15:25

score 1 · Answer 3 · answered Jan 05 '20 at 13:34

Mozilla DeepSpeech, an open-source speech-to-text tool, will do. You will need to install the application on your Linux desktop. Alternatively, you can try Transcribear, a browser-based speech-to-text tool that requires no installation, but you need to be online to upload the recording to its server.
