
I have built a dataset for image segmentation that is composed of datasets from several different sources. Almost all of my models have trouble learning the correct running statistics of the batch normalization layers. The networks are very deep and keeping the batch norm layers helps a lot, but if I put them into evaluation mode (i.e. normalize with the learned running statistics instead of the batch statistics of the input) there is a huge drop in performance.
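
To make concrete what I mean by evaluation mode, here is a minimal sketch (the layer and tensor are just placeholders):

```
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(8)
x = torch.randn(4, 8, 32, 32)

bn.train()    # normalizes with the statistics of the current batch
_ = bn(x)     # and updates the running mean/var estimates

bn.eval()     # normalizes with the learned running mean/var instead
_ = bn(x)     # this is the mode in which my performance collapses
```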

I presume the statistics of the source images from the different datasets are very different. The labels might also contain different amounts of noise. I have tried normalizing the images per channel to a fixed mean and standard deviation, centered around 0 and around 0.5.
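
Roughly, the preprocessing I tried looks like this (a sketch; the statistics below are placeholders, in practice I compute the per-channel mean/std over the combined training images):

```
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.ToTensor(),                          # scales pixel values to [0, 1]
    transforms.Normalize(mean=[0.48, 0.45, 0.41],   # per-channel mean of the training data (placeholder values)
                         std=[0.23, 0.22, 0.23]),   # per-channel std of the training data (placeholder values)
])
```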

What else could I try?

Many thanks in advance for your advice and insight!

Requested snippet:

```
import torch
import torch.nn as nn

class ContractingBlock(nn.Module):
    '''
    ContractingBlock Class
    Performs two convolutions followed by a max pool operation.
    Values:
        input_channels: the number of channels to expect from a given input
    '''
    def __init__(self, input_channels, use_dropout=False, use_bn=True):
        super(ContractingBlock, self).__init__()
        self.conv1 = nn.Conv2d(input_channels, input_channels * 2, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(input_channels * 2, input_channels * 2, kernel_size=3, padding=1)
        self.activation = nn.LeakyReLU(0.2)
        self.maxpool = nn.MaxPool2d(kernel_size=2, stride=2)
        if use_bn:
            self.batchnorm = nn.BatchNorm2d(input_channels * 2)
        self.use_bn = use_bn
        if use_dropout:
            self.dropout = nn.Dropout()
        self.use_dropout = use_dropout

    def forward(self, x):
        '''
        Function for completing a forward pass of ContractingBlock: 
        Given an image tensor, completes a contracting block and returns the transformed tensor.
        Parameters:
            x: image tensor of shape (batch size, channels, height, width)
        '''
        x = self.conv1(x)
        if self.use_bn:
            x = self.batchnorm(x)
        if self.use_dropout:
            x = self.dropout(x)
        x = self.activation(x)
        x = self.conv2(x)
        if self.use_bn:
            x = self.batchnorm(x)
        if self.use_dropout:
            x = self.dropout(x)
        x = self.activation(x)
        x = self.maxpool(x)
        return x

class ExpandingBlock(nn.Module):
    '''
    ExpandingBlock Class:
    Performs an upsampling, a convolution, a concatenation of its two inputs,
    followed by two more convolutions with optional dropout
    Values:
        input_channels: the number of channels to expect from a given input
    '''
    def __init__(self, input_channels, use_dropout=False, use_bn=True):
        super(ExpandingBlock, self).__init__()
        self.upsample = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True)
        self.conv1 = nn.Conv2d(input_channels, input_channels // 2, kernel_size=2)
        self.conv2 = nn.Conv2d(input_channels, input_channels // 2, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(input_channels // 2, input_channels // 2, kernel_size=2, padding=1)
        if use_bn:
            self.batchnorm = nn.BatchNorm2d(input_channels // 2)
        self.use_bn = use_bn
        self.activation = nn.ReLU()
        if use_dropout:
            self.dropout = nn.Dropout()
        self.use_dropout = use_dropout

    def forward(self, x, skip_con_x):
        '''
        Function for completing a forward pass of ExpandingBlock: 
        Given an image tensor, completes an expanding block and returns the transformed tensor.
        Parameters:
            x: image tensor of shape (batch size, channels, height, width)
            skip_con_x: the image tensor from the contracting path (from the opposing block of x)
                    for the skip connection
        '''
        x = self.upsample(x)
        x = self.conv1(x)
        # `crop` is a helper (not shown here) that crops skip_con_x to x's spatial size
        skip_con_x = crop(skip_con_x, x.shape)
        x = torch.cat([x, skip_con_x], axis=1)
        x = self.conv2(x)
        if self.use_bn:
            x = self.batchnorm(x)
        if self.use_dropout:
            x = self.dropout(x)
        x = self.activation(x)
        x = self.conv3(x)
        if self.use_bn:
            x = self.batchnorm(x)
        if self.use_dropout:
            x = self.dropout(x)
        x = self.activation(x)
        return x

```
  • Have you tried a different normalization layer? Also, is the model a standard one (e.g. taken from the literature), or is it your own? – Luca Anzalone Jul 18 '23 at 13:37
  • Thanks for commenting. It's a U-Net architecture I made myself; I've tested it on different datasets for similar segmentation tasks and it works. I've tried layer normalization, which works in the sense that it is consistent, but the performance is much worse than batch normalization in training mode. – user199590 Jul 18 '23 at 17:19
  • Alright. So, can you check the statistics of the train and test sets? – Luca Anzalone Jul 19 '23 at 14:25
  • @LucaAnzalone Yes, I can; they are quite different, so I tried standardizing them. Are there other, better techniques? – user199590 Jul 19 '23 at 18:57
  • I understood you have a dataset made of different datasets, and that standardization is fine. What I mean is that if the overall statistics (i.e., mean and variance over all datasets) of the train set are different from those of the test set, then BN will cause the model to perform poorly. So, if you keep updating the BN's parameters during test, the BN layers will adapt to those statistics, improving performance but introducing an unwanted dependency on the test set. Therefore, do you split each dataset randomly to determine the test and train sets? – Luca Anzalone Jul 20 '23 at 13:31
  • @LucaAnzalone I see what you mean. Yes, this makes sense. So, no: the training dataset is a collection of different datasets, and my test and evaluation sets are small subsets of the small dataset I want to perform on. If I have datasets $D_1, D_2, D_*$, I split $D_*$ into train, test, and eval, create a training set $D_1 + D_2 + D_{*train}$, and evaluate and test on the respective $D_*$ splits. If the BN parameters are fixed from the training set, the model basically predicts all 0's on the eval/test splits of $D_*$; in train mode, i.e. normalizing to the test batch, it works very well. – user199590 Jul 21 '23 at 05:49
  • Is $D_*$ much smaller than $D_1 + D_2$? And also, just to check: when you standardize the data, do you use the overall stats of $D_1+D_2+D_{*train}$, or do you standardize each $D_i$ individually? – Luca Anzalone Jul 22 '23 at 13:10
  • @LucaAnzalone Thanks. I standardize per channel across all the data, either to zero mean and unit standard deviation or to a mean of 0.5. $D_*$ is much, much smaller, by a factor of about 200. – user199590 Jul 22 '23 at 17:34
  • Ok, so it can be an imbalance problem, since $D_*$ is much smaller. Have you tried data augmentation only on $D_*$ and/or oversampling it? – Luca Anzalone Jul 23 '23 at 09:54
  • Thanks @LucaAnzalone. Yes, I'm oversampling it such that $D_*$ is about as large as $D_1 + D_2$; I also tried smaller multiples. I use augmentation as well. – user199590 Jul 23 '23 at 11:21
  • It seems that you're doing things right... maybe there is a bug in your code. Could you post some relevant snippets? – Luca Anzalone Jul 26 '23 at 13:20
  • @LucaAnzalone Sure, I've edited my post to include the snippet. – user199590 Jul 30 '23 at 10:57
  • Thanks. I notice that you use the same BN layer multiple times in the `forward()` functions. Can you try defining multiple BN layers and using each of them only once (see the sketch below this thread)? I think the same layer receives different statistics and may get confused during training. – Luca Anzalone Jul 30 '23 at 14:37
  • Sure, I'll try that. Thanks, this makes sense. – user199590 Jul 31 '23 at 15:03
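
A minimal sketch of the change suggested in the last two comments (hypothetical class name; one `nn.BatchNorm2d` per convolution instead of reusing a single layer):

```
import torch.nn as nn

class ContractingBlockSeparateBN(nn.Module):
    '''Variant of ContractingBlock with a dedicated BatchNorm2d per convolution,
    so each layer tracks its own running statistics.'''
    def __init__(self, input_channels, use_dropout=False, use_bn=True):
        super().__init__()
        self.conv1 = nn.Conv2d(input_channels, input_channels * 2, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(input_channels * 2, input_channels * 2, kernel_size=3, padding=1)
        self.activation = nn.LeakyReLU(0.2)
        self.maxpool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.use_bn = use_bn
        if use_bn:
            self.batchnorm1 = nn.BatchNorm2d(input_channels * 2)
            self.batchnorm2 = nn.BatchNorm2d(input_channels * 2)
        self.use_dropout = use_dropout
        if use_dropout:
            self.dropout = nn.Dropout()

    def forward(self, x):
        x = self.conv1(x)
        if self.use_bn:
            x = self.batchnorm1(x)   # BN dedicated to conv1's output
        if self.use_dropout:
            x = self.dropout(x)
        x = self.activation(x)
        x = self.conv2(x)
        if self.use_bn:
            x = self.batchnorm2(x)   # separate BN for conv2's output
        if self.use_dropout:
            x = self.dropout(x)
        x = self.activation(x)
        return self.maxpool(x)
```

The same change would apply to `ExpandingBlock`, which also reuses its single `batchnorm` after both `conv2` and `conv3`.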

0 Answers