
Can vanishing gradients be detected by the change in distribution (or lack thereof) of my convolution's kernel weights throughout the training epochs? And if so how?

For example, if only 25% of my kernel's weights ever change throughout the epochs, does that imply an issue with vanishing gradients?

Here are my histograms and distributions, is it possible to tell whether my model suffers from vanishing gradients from these images? (some middle hidden layers omitted for brevity)

[Four images: histograms and distributions of the kernel weights across training epochs, per layer]


1 Answer


Vanishing gradients can be detected from the kernel weight distributions. What you want to look for is whether the weight updates are dying down to zero, i.e., whether the distributions stop moving from epoch to epoch.

If only 25% of your kernel weights are changing, that does not by itself imply vanishing gradients. It might be a contributing factor, but there can be a variety of other reasons, such as poor data, the choice of loss function, the optimizer, and so on. Kernel weights not changing only indicates that the model is not learning well. One way to quantify this rather than eyeballing histograms is sketched below.
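As a minimal sketch, assuming a `tf.keras` model (the callback name and the tolerance `tol` are illustrative, not from the question), you could track what fraction of the entries in each weight tensor actually move between epochs:

```python
import numpy as np
import tensorflow as tf

class WeightChangeTracker(tf.keras.callbacks.Callback):
    """Report what fraction of each weight tensor changed by more
    than `tol` since the previous epoch."""

    def __init__(self, tol=1e-6):
        super().__init__()
        self.tol = tol
        self.prev = None  # weights from the previous epoch

    def on_epoch_end(self, epoch, logs=None):
        current = [w.copy() for w in self.model.get_weights()]
        if self.prev is not None:
            for i, (old, new) in enumerate(zip(self.prev, current)):
                changed = np.mean(np.abs(new - old) > self.tol)
                print(f"epoch {epoch}, weight tensor {i}: "
                      f"{changed:.1%} of entries changed")
        self.prev = current
```

You would pass it to training as `model.fit(x_train, y_train, epochs=..., callbacks=[WeightChangeTracker()])`; consistently low percentages in the early layers are the pattern the question is asking about.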

From the histograms, only the conv_2d_2 layer shows any sign of vanishing gradients, since its values are quite small, and even that layer seems to be picking up after 600 epochs. Usually, a good indicator is that the mean of the weights in a layer stays close to 0 and the standard deviation close to 1. If this is maintained to a reasonable extent during training, you're good to go.
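If you want to check for vanishing gradients directly instead of inferring them from the weights, you can also inspect the gradient magnitudes on a sample batch. A rough sketch, again assuming TensorFlow/Keras (the function name and the `loss_fn` argument are placeholders):

```python
import tensorflow as tf

def gradient_report(model, x_batch, y_batch, loss_fn):
    """Print the mean absolute gradient for each trainable variable
    on one batch; values collapsing toward 0 in the early layers
    suggest vanishing gradients."""
    with tf.GradientTape() as tape:
        preds = model(x_batch, training=True)
        loss = loss_fn(y_batch, preds)
    grads = tape.gradient(loss, model.trainable_variables)
    for var, grad in zip(model.trainable_variables, grads):
        if grad is not None:
            mean_abs = tf.reduce_mean(tf.abs(grad)).numpy()
            print(f"{var.name}: mean |grad| = {mean_abs:.3e}")
```

Running this every few epochs gives a per-layer picture of the gradient flow, which is more direct evidence than the weight histograms alone.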
