I modified the ResNet-50 architecture to build a regression network: I simply added a BatchNorm1d layer and a ReLU layer just before the fully connected layer. During training, the output of the BatchNorm1d layer is around 3, and this gives good training results. However, during inference the output of the same BatchNorm1d layer is around 30, which leads to very poor test accuracy. In other words, the BatchNorm1d layer produces very different normalized outputs in training mode versus evaluation mode.
What causes this behavior, and how can I fix it? I am using PyTorch.