Highest Voted 'voice-recognition' Questions - Artificial Intelligence Stack Exchange

28

votes

8 answers

Is there any research on the development of attacks against artificial intelligence systems?

Is there any research on the development of attacks against artificial intelligence systems? For example, is there a way to generate a letter "A", which every human being in this world can recognize but, if it is shown to the state-of-the-art…

asked Oct 09 '19 at 17:45

Lion Lai

423
4
9

5

votes

2 answers

Can variations in microphones used in training set and test set impact the accuracy of speech recognition models?

If I train a speech recognition model using data collected from N different microphones, but deploy it on an unseen (test) microphone - does it impact the accuracy of the model? While I understand that theoretically an accuracy loss is likely, does…

deep-learning voice-recognition

asked Feb 02 '18 at 15:40

baiduguy1

51
1

5

votes

1 answer

How does Wit.ai convert sentences into structured data?

Wit.ai is a Siri-like voice interface that can parse messages and predict the actions to perform. Here is the demo site powered by Wit.ai. How does it understand the spoken sentences and convert them into structured actionable data?…

natural-language-processing voice-recognition structured-data

asked Aug 18 '16 at 14:30

kenorb

10,423
3
43
91

4

votes

1 answer

How to detect when human voice / speech appears in a microphone stream?

I want to build a personal assistant that listens to me continuously. The flow looks like this: continuously record voice, stream it to the Google Speech API, get back the text in real time, then parse it for intent, etc. The problem is, the Google Speech API…

intelligent-agent voice-recognition

asked Nov 11 '17 at 17:39

AIon

149
1
3

4

votes

1 answer

Is music/sound similarity comparison feasible on neural networks?

I am wondering about the following concept: a given neural network gets two audio inputs (preferably music) and outputs a real number between 0 and 1 that describes the "similarity" between the second and the first track. As far as my understanding of neural…

neural-networks deep-learning pattern-recognition voice-recognition similarity

asked Aug 22 '17 at 13:45

Zoltán Schmidt

623
7
14

3

votes

1 answer

Why is the short-time Fourier transform used for preprocessing audio samples?

I've been told this is how I should be preprocessing audio samples, but what information does this method actually give me? What are the alternatives, and why shouldn't I use them?

data-preprocessing voice-recognition fourier-transform

asked Oct 02 '18 at 14:18

0x777C

133
3

3

votes

0 answers

Can computers recognise "grouping" from voice tonality?

In human communication, tonality or tonal language conveys a lot of complex information, including emotions and motives. But excluding such complex aspects, tonality serves a very basic purpose of "grouping" or "taking common" functions, such as: The…

natural-language-processing voice-recognition speech-synthesis

asked Jul 16 '19 at 17:23

Always Confused

171
3

3

votes

1 answer

How to do speech recognition on a single word

I provide a sound signal of about 2-3 seconds to my neural network. I have trained the network on a single word: if I say "Hello", the network should tell whether "Hello" was spoken, but if some other word like "World" is spoken, it will say…

neural-networks deep-learning voice-recognition

asked May 09 '19 at 02:59

Nimit Bhardwaj

133
3

3

votes

1 answer

How closely related are dialect recognition and speech recognition?

In this tutorial, they build a speech recognition model to classify a one-second audio clip as one of ten predefined words. Suppose that we modify this problem as follows: given an Arabic dataset, we aim to build a dialect recognition model…

deep-learning natural-language-processing classification voice-recognition

asked Mar 09 '19 at 03:05

Abdulkader

43
5

2

votes

2 answers

What is easier or more efficient to summarize: voice or text? [DP/RN]

If possible, consider the relationship between implementation difficulty and accuracy in voice examples or simple chat conversations. Also, what are the current directions in algorithms such as deep learning for solving this?

neural-networks deep-learning natural-language-processing voice-recognition text-summarization

asked Feb 24 '18 at 21:55

Eric Saboia

123
2

2

votes

3 answers

Has there been research done regarding processing speech then building a "speaker profile" based off the processed speech?

Has there been research done regarding processing speech then building a "speaker profile" based off the processed speech? Things like matching the voice with a speaker profile and matching speech patterns and wordage for the speaker profile would…

natural-language-processing reference-request speech-recognition voice-recognition

asked Nov 04 '16 at 15:20

Tory

175
6

2

votes

0 answers

State of the art in voice recognition

In the media there's a lot of talk about face recognition, mainly with respect to identifying faces (= assigning to persons). Less attention is paid to the recognition of facially expressed emotions but there's a lot of research done into this…

computer-vision emotional-intelligence facial-recognition voice-recognition audio-processing

asked Jan 22 '20 at 15:11

Hans-Peter Stricker

811
1
8
20

2

votes

1 answer

AI natural voice generator

I want to create a solution that clones my voice. I tried commercial solutions and an implementation of Tacotron. Unfortunately, the results do not sound natural; the generated voice sounds like a robot. Could anybody recommend a good alternative?

deep-learning voice-recognition

asked Nov 12 '19 at 14:35

fuwiak

143
8

1

vote

1 answer

"Vocal captcha" for robots on the phone?

With all the Google I/O stuff coming out, how can I verify that I have an actual human on the phone using only voice? Are there still vocal things humans can do, but robots can't? Conditions: the person on the phone is a stranger (so personal…

human-like voice-recognition computational-linguistics

asked May 19 '18 at 06:46

Overleaf

111
1

1

vote

1 answer

Deep audio fingerprinting for word search

Simply speaking, I'm trying to search an audio clip for a list of words and, if any are found, mark the time stamps. My use case is a profanity check with a list of pre-defined profane words. Are there any successful approaches, samples, tools or…

machine-learning deep-learning natural-language-processing voice-recognition audio-processing

asked Nov 08 '19 at 17:50

Tina J

973
6
13

Questions tagged [voice-recognition]