
It may already be obvious that I am just a practitioner and a beginner in Deep Learning. I am still figuring out many of the "why"s and "how"s of DL.

So, for example, suppose I train a feed-forward neural network, an image classifier with CNNs, or an OCR model with GRUs, using something like Keras, and it performs very poorly or takes longer to train than it should. This may be because the gradients are vanishing or exploding, or because of some other problem.

But, if it is due to the gradients becoming very small or very large during training, how do I figure that out? What should I do to be able to infer that something has gone wrong because of the gradient values?
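For context, here is a minimal sketch of the kind of check I imagine (the toy model, data, and shapes are placeholders I made up, not code from an actual project), using TensorFlow's `GradientTape` to print per-layer gradient norms:

```python
import tensorflow as tf

# Toy model and random data, purely as placeholders.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])
loss_fn = tf.keras.losses.MeanSquaredError()
x = tf.random.normal((32, 10))
y = tf.random.normal((32, 1))

# One forward/backward pass under a GradientTape so the
# gradients can be inspected directly.
with tf.GradientTape() as tape:
    loss = loss_fn(y, model(x, training=True))
grads = tape.gradient(loss, model.trainable_variables)

# Norms near zero across many layers suggest vanishing gradients;
# very large (or NaN) norms suggest exploding gradients.
for var, grad in zip(model.trainable_variables, grads):
    tf.print(var.name, tf.norm(grad))
```

Is logging gradient norms like this the right way to diagnose the problem, or is there a more standard approach?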

And what precautions should I take to avoid it from the beginning (since training DL models on accelerated hardware costs money)? And if it has already happened, how do I fix it?
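For example, I have come across precautions like gradient clipping and variance-preserving weight initialization. Here is a minimal Keras sketch of those two (the model and hyperparameters are again placeholders of my own, not a recommendation):

```python
import tensorflow as tf

# he_normal keeps activation variance roughly stable through
# ReLU layers, which is said to help against vanishing gradients.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(64, activation="relu",
                          kernel_initializer="he_normal"),
    tf.keras.layers.Dense(1),
])

# clipnorm rescales any gradient whose L2 norm exceeds 1.0,
# which guards against exploding gradients.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3, clipnorm=1.0)
model.compile(optimizer=optimizer, loss="mse")
```

Are these the right kind of precautions, and are there others I should know about?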


This question may sound like a duplicate of How to decide if gradients are vanishing?, but it is not, since that question focuses on CNNs, while I am asking about gradient problems in all kinds of deep learning architectures.

  • You're asking 2 distinct questions here: 1. "How do I decide if a gradient is vanishing?" (which seems to be a duplicate of the linked question) and 2. "What are the precautions I should take to avoid it from the beginning?" Can you clarify what your main question is and why you say it's not a duplicate of the linked question? I suppose it's because your main question is question 2. Can you clarify this? – nbro Dec 13 '20 at 12:57
  • The question I linked is more focused on CNNs, and I wanted to learn about debugging this from the perspective of deep learning as a whole. You are right that there are two distinct questions here; my intention was, if we are talking about a problem, why not also talk about how it can be solved? So, if you can help me figure out those things, please continue. – Naveen Reddy Marthala Dec 13 '20 at 13:01
  • If you think this approach of asking two questions is not correct, tell me and I will make changes accordingly. – Naveen Reddy Marthala Dec 13 '20 at 13:33
  • Your questions are valid ones. However, we already have two questions about vanishing gradients (the one you linked and [this one](https://ai.stackexchange.com/q/18132/2444)), so it seems that you're **partially** asking the same thing here. However, as you said, you're also asking about exploding gradients and how to solve them in general. These questions are actually very important and good answers to them would be very useful, but I'm not sure if this post is too broad. – nbro Dec 13 '20 at 13:37

0 Answers