2D tasks enjoy a vast body of successful models that can be reused.
For convolutions, can one simply replace 2D operations with their 3D counterparts and inherit the benefits? Are there any "extra steps" that improve the transition? I'm not interested in unrolling the 3D input along the channel dimension.
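For concreteness, this is the kind of mechanical swap I mean - a minimal sketch assuming PyTorch, with illustrative module names, channel counts, and shapes:

```python
import torch
import torch.nn as nn

# A typical 2D block...
block2d = nn.Sequential(
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.BatchNorm2d(32),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2),
)

# ...and the naive 3D port: each nn.*2d module is replaced by its nn.*3d
# counterpart, so kernels/strides/padding implicitly become 3-tuples.
block3d = nn.Sequential(
    nn.Conv3d(16, 32, kernel_size=3, padding=1),  # 3x3x3 kernel
    nn.BatchNorm3d(32),
    nn.ReLU(inplace=True),
    nn.MaxPool3d(kernel_size=2),
)

x2d = torch.randn(4, 16, 64, 64)     # (N, C, H, W)
x3d = torch.randn(4, 16, 8, 64, 64)  # (N, C, D, H, W), extra spatial dim
print(block2d(x2d).shape)  # torch.Size([4, 32, 32, 32])
print(block3d(x3d).shape)  # torch.Size([4, 32, 4, 32, 32])
```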
Publication/repository references help.
Details
The input is an STFT-like transform of multi-channel EEG time series, except it is 3D. There is spatial dependency to exploit across all three dimensions. The full transform is 4D, but one of its dimensions is unrolled along the channel axis, so data channels and transform channels are mixed. The transform itself consists of stacked complex convolutions and modulus nonlinearities (the output is real).
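To illustrate what one stage of such a transform might look like (a sketch only, assuming PyTorch; this is not the actual transform, and all sizes are placeholders): a complex convolution can be realized as two real convolutions for the real and imaginary kernel parts, with the modulus taken afterwards so the output is real.

```python
import torch
import torch.nn as nn

class ComplexConv3dModulus(nn.Module):
    """One hypothetical stage: complex 3D convolution + modulus nonlinearity."""
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        # Real and imaginary parts of the complex kernel as two real convs
        self.conv_re = nn.Conv3d(in_ch, out_ch, kernel_size, padding=padding, bias=False)
        self.conv_im = nn.Conv3d(in_ch, out_ch, kernel_size, padding=padding, bias=False)

    def forward(self, x):
        re, im = self.conv_re(x), self.conv_im(x)
        # Modulus of the complex response; epsilon keeps the gradient finite
        return torch.sqrt(re ** 2 + im ** 2 + 1e-8)

x = torch.randn(2, 4, 8, 16, 16)  # (N, C, D, H, W), sizes illustrative
y = ComplexConv3dModulus(4, 8)(x)
print(y.shape)                    # torch.Size([2, 8, 8, 16, 16])
```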
I seek to reuse models like SEResNet, InceptionNet, and EfficientNet. The "benefits" of interest are mainly training viability (convergence speed, amount of tuning required) and generalization (test accuracy) - or, as an alternative to the latter, assurance that the blocks don't interact harmfully by e.g. assuming an inherently 2D structure.
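As an example of the per-block question, here's how I'd naively port a Squeeze-and-Excitation block to 3D (sketch assuming PyTorch; channel counts are placeholders). Only the pooling op needs a 3D variant; the channel-gating MLP carries over unchanged:

```python
import torch
import torch.nn as nn

class SEBlock3d(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        # Squeeze: global average pool (N, C, D, H, W) -> (N, C, 1, 1, 1)
        self.pool = nn.AdaptiveAvgPool3d(1)
        # Excitation: bottleneck MLP producing per-channel gates in (0, 1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        n, c = x.shape[:2]
        w = self.fc(self.pool(x).view(n, c)).view(n, c, 1, 1, 1)
        return x * w  # channel-wise reweighting

x = torch.randn(2, 32, 8, 16, 16)  # sizes illustrative
print(SEBlock3d(32)(x).shape)      # torch.Size([2, 32, 8, 16, 16])
```

The open part for me is whether such ports stay well behaved when, as described above, one axis mixes data channels with transform channels.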