Highest Voted 'speech-recognition' Questions - Artificial Intelligence Stack Exchange

7

votes

2 answers

Term for algorithms that are not trained

Before the advent of neural architectures, many AI domains (e.g. speech recognition and computer vision) used algorithms that consisted of a series of hand-crafted transformations for feature extraction. In speech recognition everything to do with…

asked Mar 22 '23 at 11:09

Mew

181
2

4

votes

1 answer

How do AIs like Siri and Alexa respond to their names being called?

AIs like Siri and Alexa respond to their names being called. How does the system recognize the name while ignoring all the other words that were said before it? For example, "Hey Siri" would trigger Siri to start listening for commands,…

applications speech-recognition siri

asked Jul 26 '20 at 04:39

Sarem Hailemeskel

43
3

4

votes

1 answer

How does the CTC loss work?

I am trying to implement CTC loss in TensorFlow, but their documentation is pretty limited, so I am not sure how to approach the problem. I found a good example in Theano. Are there any other resources that explain the CTC loss? I am also trying to…

reference-request tensorflow recurrent-neural-networks speech-recognition ctc-loss

asked Jul 01 '19 at 14:50

user26787

41
2

3

votes

1 answer

Can transformer be better than RNN for online speech recognition?

Do transformers have the potential to replace RNN end-to-end models for online speech recognition? This mainly depends on accuracy/latency and deployment cost, not training cost. Can transformers support low-latency online use…

recurrent-neural-networks transformer speech-recognition

asked Mar 08 '20 at 03:48

jw_

199
1
5

3

votes

0 answers

Speaker Identification / Recognition for less size audio files

I am working on a speaker identification problem using a GMM (Gaussian Mixture Model). I only have to identify one user present in the given audio, so should noise or silent audio be used as a second class or not, just like in image classification for an object…

generative-model data-science speech-recognition state-of-the-art

asked Jan 04 '20 at 13:40

Posi2

358
2
16

2

votes

2 answers

What is a beam?

For example, faster-whisper's transcribe function takes an argument beam_size: Beam size to use for decoding. What does "beam" mean?
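As a rough illustration of the term, here is a minimal, generic beam search sketch. It is not faster-whisper's implementation; `toy_model` and its probabilities are hypothetical, invented only to show that a beam of size 1 (greedy decoding) can miss a sequence that a wider beam finds.

```python
import math


def toy_model(prefix):
    """Hypothetical next-token probabilities, conditioned on the prefix."""
    table = {
        (): {"a": 0.6, "b": 0.4},
        ("a",): {"x": 0.55, "y": 0.45},
        ("b",): {"x": 0.9, "y": 0.1},
    }
    return table[prefix]


def beam_search(model, steps, beam_size):
    """Keep only the `beam_size` best partial sequences after every step."""
    beams = [((), 0.0)]  # (token sequence, cumulative log-probability)
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            # Extend each surviving sequence with every possible next token.
            for token, prob in model(seq).items():
                candidates.append((seq + (token,), score + math.log(prob)))
        # Prune: the "beam" is the set of hypotheses that survive this cut.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    return beams


greedy = beam_search(toy_model, steps=2, beam_size=1)[0]
wider = beam_search(toy_model, steps=2, beam_size=2)[0]
print(greedy[0], round(math.exp(greedy[1]), 2))
print(wider[0], round(math.exp(wider[1]), 2))
```

With `beam_size=1` the search commits to "a" at the first step and ends with probability 0.33; with `beam_size=2` it also keeps "b" alive and finds the more probable sequence ("b", "x") at 0.36. A larger beam explores more hypotheses at a higher compute cost.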

terminology speech-recognition inference

asked Aug 05 '23 at 18:28

Geremia

163
6

2

votes

2 answers

Open-source vocal cloning (speech-to-speech neural style transfer)

I want to program and train a voice cloner, in part to learn about this area of AI, and in part to use as a prototype of audio for testing and getting feedback from early adopters before recording in a studio with voice actors. For the prototype, I…

neural-networks tensorflow speech-recognition speech-synthesis style-transfer

asked Mar 02 '23 at 16:52

miguelmorin

101
5

2

votes

3 answers

Has there been research done regarding processing speech then building a "speaker profile" based off the processed speech?

Has there been research done regarding processing speech then building a "speaker profile" based off the processed speech? Things like matching the voice with a speaker profile and matching speech patterns and wordage for the speaker profile would…

natural-language-processing reference-request speech-recognition voice-recognition

asked Nov 04 '16 at 15:20

Tory

175
6

2

votes

0 answers

How do I train a multiple-speaker model (speech synthesis) based on Tacotron 2 and espnet?

I'm new to speech synthesis and deep learning. Recently, I got a task as described below: I have a problem training a multi-speaker model, which should be created with Tacotron2. I was told I can get some ideas from espnet, which is an end-to-end…

deep-learning recurrent-neural-networks audio-processing speech-recognition speech-synthesis

asked Feb 06 '20 at 04:13

Envelo Lee

21
1

2

votes

1 answer

How to use AI for language recognition?

Given an audio track, I'm trying to find a way to recognize the spoken language, only within a small set (e.g. English vs Spanish). Is there a simple solution to detect the language of speech?

machine-learning natural-language-processing audio-processing speech-recognition

asked Jan 02 '20 at 16:08

Tina J

973
6
13

2

votes

1 answer

What is the difference between Kaldi and DeepSpeech speech recognition systems in their approach?

I would like to know how the Kaldi and DeepSpeech speech recognition systems differ algorithmically. Which one would be more accurate for continuous speech in time?

machine-learning convolutional-neural-networks long-short-term-memory deep-neural-networks speech-recognition

asked Nov 25 '19 at 06:18

Hanu

31
1
3

2

votes

0 answers

Is there a detailed description or implementation of an end-to-end speech recognition system?

I am currently trying to implement an end-to-end speech recognition system from scratch, that is, without using any of the existing frameworks (like TensorFlow, Keras, etc.). I am building my own library, where I am trying to do a polynomial…

neural-networks natural-language-processing research reference-request speech-recognition

asked Nov 09 '19 at 18:51

Jaswin

121
4

1

vote

0 answers

What is the number of channels of input audio mel spectrogram?

What is the number of channels of an input audio mel spectrogram? For example, in CV we always have 3 input channels for an RGB picture. But what about audio?

neural-networks convolutional-neural-networks computer-vision speech-recognition

asked Apr 15 '23 at 22:24

randomuser228

11
1

1

vote

0 answers

How to align or synchronize YouTube captions with audio accurately

I need to use the automatic captions from YouTube to precisely isolate excerpts from the video aligned to text and generate the dataset to train a model in French. So I've already written the script, but when I compare the audio with the matching…

deep-learning speech-recognition

asked Dec 27 '21 at 07:00

Cara Duf

11
3

1

vote

0 answers

Looking for help on initializing continuous HMM model for word level ASR

I have been studying HMM implementation approaches for ASR for the last couple of weeks. This probabilistic model is very new to me. I am currently using a Python package called Pomegranate to implement an ASR model of my own for the LibriSpeech…

speech-recognition markov-chain hidden-markov-model gaussian-mixture-models

asked Apr 30 '21 at 18:37

Zander

11
1
