Questions tagged [voice-recognition]

For questions about the AI/ML algorithms for performing the task of voice recognition in any animal (falls under the broad category of sound recognition, not to be confused with speech recognition).

Voice or speaker recognition is the ability of a machine or program to identify who is speaking from the characteristics of their voice, as opposed to speech recognition, which interprets what is being said.

18 questions
28
votes
8 answers

Is there any research on the development of attacks against artificial intelligence systems?

Is there any research on the development of attacks against artificial intelligence systems? For example, is there a way to generate a letter "A", which every human being in this world can recognize but, if it is shown to the state-of-the-art…
5
votes
2 answers

Can variations in microphones used in training set and test set impact the accuracy of speech recognition models?

If I train a speech recognition model using data collected from N different microphones, but deploy it on an unseen (test) microphone - does it impact the accuracy of the model? While I understand that theoretically an accuracy loss is likely, does…
baiduguy1
5
votes
1 answer

How does Wit.ai convert sentences into structured data?

Wit.ai is a Siri-like voice interface which can parse messages and predict the actions to perform. Here is the demo site powered by Wit.ai. How does it understand the spoken sentences and convert them into structured actionable data?…
kenorb
4
votes
1 answer

How to detect when human voice / speech appears in a microphone stream?

I want to build a personal assistant that listens to me continuously. The flow looks like this: continuously record voice, stream it to the Google Speech API, get back the text in real time, then parse for intent, etc. The problem is, the Google Speech API…
AIon
4
votes
1 answer

Is music/sound similarity comparison feasible on neural networks?

I am wondering about the following concept: a given neural network gets two audio inputs (preferably music) and outputs a real number between 0 and 1 which describes the "similarity" between the second and the first track. As far as my understanding of neural…
3
votes
1 answer

Why is the short-time Fourier transform used for preprocessing audio samples?

I've been told this is how I should be preprocessing audio samples, but what information does this method actually give me? What are the alternatives, and why shouldn't I use them?
3
votes
0 answers

Can computers recognise "grouping" from voice tonality?

In human communication, tonality or tonal language conveys a lot of complex information, including emotions and motives. But excluding such complex aspects, tonality serves a very basic purpose of "grouping" or "taking common" functions, such as: The…
3
votes
1 answer

How to do speech recognition on a single word

I provide a sound signal of about 2-3 seconds to my neural network. I have trained my network on a single word: if I speak "Hello", the network can tell whether "Hello" was spoken or not, but if some other word like "World" is spoken, it will say…
3
votes
1 answer

How closely related are dialect recognition and speech recognition?

In this tutorial, they build a speech recognition model to classify a one-second audio clip as one of ten predefined words. Suppose that we modify this problem as follows: given an Arabic dataset, we aim to build a dialect recognition model…
2
votes
2 answers

Which is easier or more efficient to summarize: voice or text? [DP/RN]

If possible, consider the relationship between implementation difficulty and accuracy for voice examples or simple chat conversations. Also, what are the current directions in algorithms, such as deep learning or others, for solving this?
2
votes
3 answers

Has there been research done regarding processing speech then building a "speaker profile" based off the processed speech?

Has there been research done regarding processing speech then building a "speaker profile" based off the processed speech? Things like matching the voice with a speaker profile and matching speech patterns and wordage for the speaker profile would…
2
votes
0 answers

State of the art in voice recognition

In the media there's a lot of talk about face recognition, mainly with respect to identifying faces (= assigning to persons). Less attention is paid to the recognition of facially expressed emotions but there's a lot of research done into this…
2
votes
1 answer

AI natural voice generator

I want to create a solution that clones my voice. I tried many commercial solutions and an implementation of Tacotron. Unfortunately, the results do not sound natural; the generated voice sounds like a robot. Could anybody recommend a good alternative?
fuwiak
1
vote
1 answer

"Vocal captcha" for robots on the phone?

With all the Google I/O stuff coming out, how can I verify that I have an actual human on the phone using only voice? Are there still vocal things that humans can do but robots can't? Conditions: the person on the phone is a stranger (so personal…
1
vote
1 answer

Deep audio fingerprinting for word search

Simply speaking, I'm trying to somehow search an audio clip for a list of words, and if found, I mark the time stamps. My use-case is profanity check with a list of pre-defined profane words. Are there any successful approaches, samples, tools or…