
Context

I've been investigating the UNet architecture for a little while now. After studying the structure of the UNet as proposed in the original paper, I noticed a recurrent pattern of Conv2d->BatchNorm2d->ReLU(->MaxPool2d)->Conv2d->BatchNorm2d->ReLU(->MaxPool2d) for the encoder part. However, I have also come across other implementations of a custom UNet where this order is different, e.g. Conv2d->BatchNorm2d->ReLU(->MaxPool2d)->BatchNorm2d->Conv2d->ReLU(->MaxPool2d), where the operations in parentheses are considered optional.
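
For concreteness, here is a minimal PyTorch sketch of the two orderings (the variable names and channel sizes are placeholders I picked for illustration, not taken from any specific implementation):

    import torch
    from torch import nn

    # Option 1: Conv2d -> BatchNorm2d -> ReLU (-> MaxPool2d), repeated twice
    option_1 = nn.Sequential(
        nn.Conv2d(3, 64, kernel_size=3, padding=1),
        nn.BatchNorm2d(64),
        nn.ReLU(inplace=True),
        # nn.MaxPool2d(2),  # optional
        nn.Conv2d(64, 64, kernel_size=3, padding=1),
        nn.BatchNorm2d(64),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),  # optional
    )

    # Option 2: Conv2d -> BatchNorm2d -> ReLU (-> MaxPool2d) -> BatchNorm2d -> Conv2d -> ReLU (-> MaxPool2d)
    option_2 = nn.Sequential(
        nn.Conv2d(3, 64, kernel_size=3, padding=1),
        nn.BatchNorm2d(64),
        nn.ReLU(inplace=True),
        # nn.MaxPool2d(2),  # optional
        nn.BatchNorm2d(64),
        nn.Conv2d(64, 64, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),  # optional
    )

    x = torch.randn(1, 3, 128, 128)
    print(option_1(x).shape, option_2(x).shape)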

I've tried training both models and they both work reasonably well, but I still want to understand the following:

Question

  1. Is there a "correct" order for these operations for the encoder/decoder part of the UNet architecture?
  2. What is the intuition behind shifting the order of these operations in the model?
timu vlad

1 Answer


I suggest following the official U-Net implementation.

To me, the second option Conv2d -> BatchNorm2d -> ReLU (-> MaxPool2d) -> BatchNorm2d -> Conv2d -> ReLU (-> MaxPool2d) looks more like a mistake than a deliberate alternative:

  • The part -> BatchNorm2d -> ReLU (-> MaxPool2d) -> BatchNorm2d -> doesn't make much sense: the first BN already normalizes the activations, and max-pool only selects the maximum, which doesn't substantially change the distribution, so in my opinion the second BN is largely unnecessary.
  • The following part, -> Conv2d -> ReLU (-> MaxPool2d), now lacks the BN operation, which should be placed between the Conv2d and the ReLU.

So I suggest you follow the Conv2d -> BN -> ReLU (-> MaxPool) design, as sketched below. At most you can swap BN with the activation (i.e. conv -> relu -> BN): there is some debate about this ordering of BN, with some people finding one better and others finding them equivalent.
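
As a rough sketch of what I mean (the helper name double_conv_block and the channel arguments are mine, not taken from the official code):

    from torch import nn

    def double_conv_block(in_ch: int, out_ch: int, pool: bool = True) -> nn.Sequential:
        """Conv2d -> BN -> ReLU, twice, with an optional max-pool at the end."""
        layers = [
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        ]
        if pool:
            layers.append(nn.MaxPool2d(2))  # downsampling step of the encoder
        return nn.Sequential(*layers)

    # e.g. the first two encoder stages of a U-Net-like model
    enc1 = double_conv_block(3, 64)
    enc2 = double_conv_block(64, 128)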

Is there a "correct" order for these operations for the encoder/decoder part of the UNet architecture?

The correct order is the first option you specified.

What is the intuition behind shifting the order of these operations in the model?

At most it could be motivated by re-normalizing (by means of BN) the activations, or even the max-pool's output. But if that were the intent, two BNs are missing: one after the second ReLU and one after the max-pool.

Luca Anzalone