
Can vanishing gradients be detected by the change in distribution (or lack thereof) of my convolution's kernel weights throughout the training epochs? And if so how?

For example, if only 25% of my kernel's weights ever change throughout the epochs, does that imply an issue with vanishing gradients?

Here are my histograms and distributions, is it possible to tell whether my model suffers from vanishing gradients from these images? (some middle hidden layers omitted for brevity)

[Four images: histograms and distributions of the kernel weights across training epochs, per layer]


1 Answer


Vanishing gradients can be detected from the kernel weight distributions. What you want to look for is whether the weight updates are dying down to zero, i.e., whether the distributions stop moving from epoch to epoch.

If only 25% of your kernel weights are changing, that does not by itself imply vanishing gradients. It might be a contributing factor, but there can be a variety of other reasons, such as poor data, the choice of loss function, the optimizer, and so on. Kernel weights not changing only indicates that the model is not learning well. One way to quantify this rather than eyeballing histograms is sketched below.
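As a minimal sketch, assuming a `tf.keras` model (the callback name and the tolerance `tol` are illustrative, not from the question), you could track what fraction of the entries in each weight tensor actually move between epochs:

```python
import numpy as np
import tensorflow as tf

class WeightChangeTracker(tf.keras.callbacks.Callback):
    """Report what fraction of each weight tensor changed by more
    than `tol` since the previous epoch."""

    def __init__(self, tol=1e-6):
        super().__init__()
        self.tol = tol
        self.prev = None  # weights from the previous epoch

    def on_epoch_end(self, epoch, logs=None):
        current = [w.copy() for w in self.model.get_weights()]
        if self.prev is not None:
            for i, (old, new) in enumerate(zip(self.prev, current)):
                changed = np.mean(np.abs(new - old) > self.tol)
                print(f"epoch {epoch}, weight tensor {i}: "
                      f"{changed:.1%} of entries changed")
        self.prev = current
```

You would pass it to training as `model.fit(x_train, y_train, epochs=..., callbacks=[WeightChangeTracker()])`; consistently low percentages in the early layers are the pattern the question is asking about.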

From the histograms, only the conv_2d_2 layer shows any sign of vanishing gradients, since its values are quite small, and even that layer seems to be picking up after 600 epochs. Usually, a good indicator is that the mean of the weights in a layer stays close to 0 and the standard deviation close to 1. If this is maintained to a reasonable extent during training, you're good to go.
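If you want to check for vanishing gradients directly instead of inferring them from the weights, you can also inspect the gradient magnitudes on a sample batch. A rough sketch, again assuming TensorFlow/Keras (the function name and the `loss_fn` argument are placeholders):

```python
import tensorflow as tf

def gradient_report(model, x_batch, y_batch, loss_fn):
    """Print the mean absolute gradient for each trainable variable
    on one batch; values collapsing toward 0 in the early layers
    suggest vanishing gradients."""
    with tf.GradientTape() as tape:
        preds = model(x_batch, training=True)
        loss = loss_fn(y_batch, preds)
    grads = tape.gradient(loss, model.trainable_variables)
    for var, grad in zip(model.trainable_variables, grads):
        if grad is not None:
            mean_abs = tf.reduce_mean(tf.abs(grad)).numpy()
            print(f"{var.name}: mean |grad| = {mean_abs:.3e}")
```

Running this every few epochs gives a per-layer picture of the gradient flow, which is more direct evidence than the weight histograms alone.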
