Highest Voted 'audio-processing' Questions - Artificial Intelligence Stack Exchange

9

votes

1 answer

Is it possible to clean up an audio recording of a lecture using some type of AI system?

Is it possible to clean up an audio recording of a lecture from a smartphone (i.e. remove the background noise) using some type of AI system?

asked Dec 12 '18 at 15:15

Thibault Molleman

99
1
1
3

5

votes

1 answer

How can I find a specific word in an audio file?

I'm trying to train and use a neural network to detect a specific word in an audio file. The input of the neural network is an audio of 2-3 seconds duration, and the neural network must determine whether the input audio (the voice of a person)…

neural-networks machine-learning deep-learning python audio-processing

asked Aug 03 '20 at 09:28

Ali.kavari76

111
6

3

votes

2 answers

Can AI be used to reverse engineer a black box?

A while back I posted on the Reverse Engineering site about an audio DSP system whose designer had passed away and whose manufacturer no longer had source code (but the question was deleted). Basically, the audio filter settings are passed from a…

ai-design audio-processing signal-processing

asked Aug 12 '19 at 06:25

chmedly

131
2

2

votes

1 answer

Can I filter barking sounds on the television?

My dog goes bonkers every time the sound of a barking dog is heard on a television program. I never noticed this before but literally every movie or show with an outdoors setting eventually includes the sound of a barking dog. Is it possible to…

audio-processing

asked Dec 21 '18 at 12:21

AlanD

21
2

2

votes

0 answers

How to prepare audio data for deep learning?

Audio data is typically an array with the waveform represented by values from -1 to 1. There are two issues with that: if all values are inverted, e.g. -1 becomes 1 and 1 becomes -1, the audio doesn't change. But if for example I need to find…

data-preprocessing gradient audio-processing spectral-analysis

asked Feb 07 '23 at 14:10

nikishev.

21
3
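
The polarity point in the excerpt above (flipping the sign of every sample does not change how the audio sounds) can be checked numerically. The following is a minimal sketch, not part of the original question: the helper `dft_magnitudes` and the 64-sample sine test signal are assumptions chosen for illustration. It suggests that the magnitude spectrum, a common starting point for audio features, is unaffected by inversion even though every raw sample value changes.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Return the magnitude of each DFT bin of a real-valued signal (naive O(n^2) DFT)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples)))
        for k in range(n)
    ]


# Hypothetical test signal: 64 samples of a sine with 4 full cycles, values in [-1, 1].
signal = [math.sin(2 * math.pi * 4 * t / 64) for t in range(64)]
inverted = [-x for x in signal]  # polarity inversion: -1 becomes 1 and vice versa

mags = dft_magnitudes(signal)
mags_inverted = dft_magnitudes(inverted)

# Compare the two magnitude spectra bin by bin.
same_spectrum = all(
    math.isclose(a, b, abs_tol=1e-9) for a, b in zip(mags, mags_inverted)
)
print("magnitude spectra match:", same_spectrum)
```

One design consequence, under the same assumptions: a model fed raw waveforms has to learn this invariance from data, whereas one fed magnitude-based features (STFT magnitudes, Mel spectrograms) gets it for free.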

2

votes

2 answers

Is it realistic to train a transformer-based model (e.g. GPT) in a self-supervised way directly on the Mel spectrogram?

In music information retrieval, one usually converts an audio signal into some kind of "sequence of frequency-vectors", such as an STFT or Mel spectrogram. I'm wondering if it is a good idea to use the transformer architecture in a self-supervised manner…

transformer gpt audio-processing embeddings self-supervised-learning

asked May 24 '21 at 21:31

Peter Franek

432
1
4
11

2

votes

0 answers

Model for direct audio-to-audio speech re-encoding

There are many resources available for text-to-audio (or vice versa) synthesis, for example Google's 'Wavenet'. These tools do not allow the finer degree of control that may be required regarding the degree of inflections / tonality retained in…

audio-processing model-request speech-synthesis

asked May 19 '21 at 10:09

NeverWasMyRealName

21
1

2

votes

1 answer

I want to determine how similar a given song is to Queen's songs. Am I headed in the right direction?

I've asked this question before (@ Reddit) and people suggested CNNs on a mel spectrogram more than anything else. This is great. But I'm sort of stuck at: label some music data as "queen" and "not queen" and have this be the training set. Like,…

convolutional-neural-networks audio-processing

asked Apr 03 '21 at 05:24

Mike Johnson Jr

121
1

2

votes

1 answer

How to get more accuracy of the logistic regression model?

I am working on a Baby Crying Detection model using logistic regression. Out of $581$ audios, $222$ are of a baby crying. Each audio is $5$ seconds long. What I have done is convert each audio into numbers, and those numbers go into a .csv file. So…

regression audio-processing binary-classification logistic-regression

asked Mar 27 '21 at 17:54

Muhammad Waqar Anwar

21
1

2

votes

0 answers

How do I train a multiple-speaker model (speech synthesis) based on Tacotron 2 and espnet?

I'm new to Speech Synthesis & Deep Learning. Recently, I got a task as described below: I have a problem training a multi-speaker model, which should be created with Tacotron2. And I was told I can get some ideas from espnet, which is an end-to-end…

deep-learning recurrent-neural-networks audio-processing speech-recognition speech-synthesis

asked Feb 06 '20 at 04:13

Envelo Lee

21
1

2

votes

0 answers

State of the art in voice recognition

In the media there's a lot of talk about face recognition, mainly with respect to identifying faces (= assigning to persons). Less attention is paid to the recognition of facially expressed emotions, but there's a lot of research done into this…

computer-vision emotional-intelligence facial-recognition voice-recognition audio-processing

asked Jan 22 '20 at 15:11

Hans-Peter Stricker

811
1
8
20

2

votes

1 answer

How to use AI for language recognition?

Given an audio track, I'm trying to find a way to recognize the audio language. Only within a small set (e.g. English vs Spanish). Is there a simple solution to detect the language in a speech?

machine-learning natural-language-processing audio-processing speech-recognition

asked Jan 02 '20 at 16:08

Tina J

973
6
13

1

vote

1 answer

How to combine input from different types of data sources?

I have to train a neural network using microphone data (wav files), accelerometer sensor data and light sensor data. Right now the approach I have thought of is to convert all the data into images, combine them into a single image, and train my neural…

neural-networks deep-learning convolutional-neural-networks ai-design audio-processing

asked Oct 26 '18 at 05:35

Aravind

113
1
5

1

vote

1 answer

Difficulty understanding Keras LSTM fitting data

I'm trying to train an RNN with a chunk of audio data, where X and Y are two audio channels loaded into numpy arrays. The objective is to experiment with different NN designs to train them to transform single-channel (mono) audio into a two-channel…

keras long-short-term-memory audio-processing

asked Oct 12 '18 at 06:43

Dmitry

19
2

1

vote

1 answer

What type of neural network architecture allows filtering out of unwanted sounds?

I have a use case where I will be inputting audio to a model, and the output of the model will be the same audio except with certain sounds removed (volume set to zero). The dataset is generated by taking an audio file, duplicating it, and then…

convolutional-neural-networks transformer time-series audio-processing

asked Nov 29 '22 at 00:44

HonestMath

111
3
