Context
I've been investigating the UNet architecture for a while now. After studying the structure of the UNet as proposed in the original paper, I noticed a recurring pattern of Conv2d->BatchNorm2d->ReLU(->MaxPool2d)->Conv2d->BatchNorm2d->ReLU(->MaxPool2d) in the encoder part, but I have also come across other custom UNet implementations where the order is different, e.g. Conv2d->BatchNorm2d->ReLU(->MaxPool2d)->BatchNorm2d->Conv2d->ReLU(->MaxPool2d), where the operation in parentheses is optional.
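For concreteness, here is a minimal PyTorch sketch of the two block orderings I mean (the class names, channel sizes, and kernel settings are my own placeholders, not taken from either implementation):

```python
import torch
import torch.nn as nn

class EncoderBlockA(nn.Module):
    """Conv2d -> BatchNorm2d -> ReLU (-> MaxPool2d), repeated twice."""
    def __init__(self, in_ch, out_ch, pool=True):
        super().__init__()
        layers = [
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        ]
        if pool:
            layers.append(nn.MaxPool2d(2))
        layers += [
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        ]
        if pool:
            layers.append(nn.MaxPool2d(2))
        self.block = nn.Sequential(*layers)

    def forward(self, x):
        return self.block(x)


class EncoderBlockB(nn.Module):
    """Conv2d -> BatchNorm2d -> ReLU (-> MaxPool2d) -> BatchNorm2d -> Conv2d -> ReLU (-> MaxPool2d)."""
    def __init__(self, in_ch, out_ch, pool=True):
        super().__init__()
        layers = [
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        ]
        if pool:
            layers.append(nn.MaxPool2d(2))
        layers += [
            nn.BatchNorm2d(out_ch),  # BatchNorm comes *before* the second conv here
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        ]
        if pool:
            layers.append(nn.MaxPool2d(2))
        self.block = nn.Sequential(*layers)

    def forward(self, x):
        return self.block(x)


if __name__ == "__main__":
    x = torch.randn(1, 3, 64, 64)
    print(EncoderBlockA(3, 32)(x).shape)  # torch.Size([1, 32, 16, 16])
    print(EncoderBlockB(3, 32)(x).shape)  # torch.Size([1, 32, 16, 16])
```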
I've tried training both variants and they both work reasonably well, but I still want to understand the following:
Question
- Is there a "correct" order for these operations in the encoder/decoder parts of the UNet architecture?
- What is the intuition behind changing the order of these operations in the model?