It may already be obvious that I am a practitioner and still a beginner at Deep Learning; I am still figuring out a lot of the "why"s and "how"s of DL.
So, for example, suppose I train a feed-forward neural network, an image classifier with CNNs, or an OCR model with GRUs, using something like Keras, and it performs very poorly or takes much longer to train than it should. That may be because the gradients are vanishing or exploding, or because of some other problem.
But if it is due to the gradients becoming very small or very large during training, how do I figure that out? What would I have to do, or look at, to infer that something has gone wrong because of the gradient values?
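To make my question concrete, here is roughly how I imagine one could watch the phenomenon happen. This is just a toy sketch in plain NumPy, not my real model: a deep stack of small sigmoid layers with hand-written backprop, printing the gradient norm at each layer. My (possibly naive) assumption is that a monotone collapse of these norms toward the early layers is the kind of signature I should be looking for:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A hypothetical toy network: many small sigmoid layers
depth, width = 20, 16
weights = [rng.normal(0, 0.5, size=(width, width)) for _ in range(depth)]

# Forward pass, keeping activations for the backward pass
a = rng.normal(size=(width,))
activations = [a]
for W in weights:
    a = sigmoid(W @ a)
    activations.append(a)

# Backward pass: push an all-ones upstream gradient through the
# stack and record the gradient norm seen at each layer
grad = np.ones(width)
norms = []
for W, a_out in zip(reversed(weights), reversed(activations[1:])):
    grad = W.T @ (grad * a_out * (1.0 - a_out))  # sigmoid'(z) = a(1-a)
    norms.append(np.linalg.norm(grad))

# The norms shrink layer by layer as we move toward the input
for i, n in enumerate(norms):
    print(f"layer {depth - i:2d} grad norm: {n:.3e}")
```

Is inspecting per-layer gradient norms like this (presumably via TensorBoard or a callback in a real Keras setup) actually the right way to diagnose the problem, or is there a more standard signal people watch?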
Also, what precautions should I take to avoid it from the start (since training DL models on accelerated hardware costs money), and if it has already happened, how do I fix it?
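For the "fix" part, the one remedy I have read about so far is gradient clipping. My understanding of what it does, written out in plain NumPy (the function name and threshold here are my own placeholders, not any library's API), is just rescaling a gradient whose norm exceeds a threshold:

```python
import numpy as np

def clip_by_norm(grad, max_norm=1.0):
    # Rescale grad so its L2 norm never exceeds max_norm;
    # gradients already within the threshold pass through unchanged
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

g = np.array([3.0, 4.0])              # norm 5
print(clip_by_norm(g, max_norm=1.0))  # -> [0.6 0.8], norm 1
```

But as far as I can tell this only addresses exploding gradients, not vanishing ones, so I would like to know what the full toolbox of precautions looks like.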
This question may sound like a duplicate of How to decide if gradients are vanishing?, but it is not: that question focuses on CNNs, whereas I am asking about gradient problems in all kinds of deep learning models.