My dataset consists of about 40,000 200x200px grayscale images of centered blobs bathed in noise and occasional artifacts like stripes other blobs of different shapes and sizes, fuzzy speckles and so on in their neighborhood. They are used in a binary classification problem, with emphasis on recall.
I read that using FFT of image and FFT of the convolutional kernel and multiplying the two, produces a similar result as convolutions would but at a way lower resource expense. This is probably the most straightforward article I found if you need a more detailed description(https://medium.com/analytics-vidhya/fast-cnn-substitution-of-convolution-layers-with-fft-layers-a9ed3bfdc99a)
What I want to do however is simply feed the FFT of images to the standard CNN. The reasoning being, maybe it would be easier for the network to catch on to features that it would miss or tend to weigh less. Or in other words, FFT as a feature engineering technique.
Would this be an idea worth trying to pursue? If so, any suggestion on which FFT components to extract (Amplitude/Phase, Real/Imaginary)?