Why is the short-time Fourier transform used for preprocessing audio samples?

Question

I've been told this is how I should be preprocessing audio samples, but what information does this method actually give me? What are the alternatives, and why shouldn't I use them?

OK, I just looked it up. https://ccrma.stanford.edu/~jos/sasp/Mathematical_Definition_STFT.html . . . it seems STFT includes a sliding window function. Probably you would implement STFT using FFT at a lower-level. I was just suspicious that something had been "lost in translation" with "Short-Time" having very similar meaning to "Fast" :-) — Neil Slater, Oct 02 '18 at 16:11

score 2 · Accepted Answer · answered Oct 03 '18 at 05:39

The Fourier transform converts audio data into a frequency-domain representation, which exposes more information (features) than the raw waveform does.

For example, raw audio data is usually represented as a one-dimensional array x of length n (the number of samples), where x[i] is the amplitude of the i-th sample.

Using the short-time Fourier transform, your audio data is instead represented as a two-dimensional array. Now x[i] is not a single amplitude value but a list of the frequency components that make up the signal in the i-th frame (a frame is a short run of consecutive samples).

See the image below (from Wikipedia): the red graph is the original signal of n samples before the transform, and the blue graph is the transformed values of one frame.
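To make the change from a one-dimensional to a two-dimensional array concrete, here is a minimal sketch using only NumPy. The function name `stft_frames`, the frame length of 256, the hop of 128, the Hann window, and the synthetic two-tone test signal are all my own assumptions for illustration; they are not part of the answer above, and real libraries differ in details such as padding and scaling.

```python
import numpy as np

def stft_frames(x, frame_len=256, hop=128):
    """Hypothetical helper: window overlapping frames of x and FFT each one."""
    window = np.hanning(frame_len)                # sliding window function
    n_frames = 1 + (len(x) - frame_len) // hop    # only frames that fit entirely
    frames = np.stack([
        x[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # One row per frame, one column per frequency bin (complex values).
    return np.fft.rfft(frames, axis=1)

# Synthetic "audio": 1 second at 8 kHz, 440 Hz first half, 1000 Hz second half.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
x = np.where(t < 0.5,
             np.sin(2 * np.pi * 440 * t),
             np.sin(2 * np.pi * 1000 * t))

spec = stft_frames(x)
magnitudes = np.abs(spec)
freqs = np.fft.rfftfreq(256, d=1 / sample_rate)   # bin centre frequencies in Hz

# Dominant frequency in the first and last frame.
peak_first = freqs[magnitudes[0].argmax()]
peak_last = freqs[magnitudes[-1].argmax()]

print(x.shape, spec.shape)       # 1-D samples vs 2-D (frames, frequency bins)
print(peak_first, peak_last)
```

The raw signal only tells you the amplitude at each instant, whereas each row of `spec` tells you which frequencies are present during that frame, so the change of pitch halfway through shows up as a different peak bin in early and late frames.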

