
I have been reading this TensorFlow tutorial on transfer learning, in which they unfreeze the whole model and then say:

When you unfreeze a model that contains BatchNormalization layers in order to do fine-tuning, you should keep the BatchNormalization layers in inference mode by passing training=False when calling the base model. Otherwise the updates applied to the non-trainable weights will suddenly destroy what the model has learned.

My question is: why? The model's weights are adapting to the new data, so why do we keep the old mean and variance, which were calculated on ImageNet? This is very confusing.
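
For context, the fine-tuning setup I am referring to looks roughly like this (my own minimal sketch in the spirit of the Keras transfer learning guide; the MobileNetV2 base, input size, and classification head are just placeholders):

import tensorflow as tf

# Load a pretrained base model without its classification head.
base_model = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3),
    include_top=False,
    weights="imagenet",
)

# Unfreeze the base model so its weights are updated during fine-tuning.
base_model.trainable = True

inputs = tf.keras.Input(shape=(160, 160, 3))
# training=False keeps the BatchNormalization layers in inference mode:
# they use their stored moving mean/variance instead of batch statistics,
# and those moving statistics are not updated during training.
x = base_model(inputs, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
model = tf.keras.Model(inputs, outputs)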

