
I am using a fully connected neural network with a sigmoid activation function. If we feed it a large enough input, the sigmoid eventually saturates at 1 or 0. Is there any way to avoid this?

Will this lead to the classical sigmoid problems of vanishing or exploding gradients?
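
For concreteness, here is a minimal NumPy sketch of what I mean (the input values are arbitrary): once the input gets large, the output rounds to exactly 1 and the gradient to 0.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # derivative of the sigmoid

# For large inputs the output rounds to exactly 1.0 and the gradient to 0.0
for x in (0.0, 5.0, 10.0, 20.0, 40.0):
    print(f"x={x:5.1f}  sigmoid={sigmoid(x):.12f}  grad={sigmoid_grad(x):.2e}")
```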

ou2105
  • Increase the precision to 256 bits if you are really desperate to use the sigmoid. Otherwise, use a different activation. – DuttaA Sep 11 '19 at 14:54
  • I think I am stuck with the sigmoid (since I am looking for a probability), but I wonder how to increase the precision to 256 bits. What do you mean by that? – ou2105 Sep 11 '19 at 15:15
  • Normal libraries use 64-bit precision for calculations, but if required you can increase the precision, avoiding a gradient of exactly zero, at the cost of memory and speed. If you need a probability, use softmax at the output instead, and use ReLU in the intermediate layers. – DuttaA Sep 11 '19 at 15:55
  • The other problem with using the sigmoid in an FNN with more than one hidden layer is that backpropagation degrades: the weights in the first layers are only updated by small values that may not have a significant impact. ReLU fixes this problem. I agree with @DuttaA: use softmax if you want a probability. – SandMan Sep 11 '19 at 17:01
  • Can you elaborate on the task you're intending to solve with your network? As others have already said, ReLUs are the best option in intermediate layers, and you could probably work with the logit (pre-sigmoid) outputs instead of going through the sigmoid for backprop purposes. – David Sep 11 '19 at 17:43
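
To make the precision point from the comments concrete, here is a small sketch (assuming NumPy; the exact thresholds depend on the platform and library) of where the sigmoid rounds to exactly 1.0 at different float widths. Higher precision only pushes the saturation point further out; it does not remove it, which is why the ReLU/softmax suggestion is usually the better fix.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# float32 already rounds sigmoid(20) to exactly 1.0; float64 holds out until x ~ 37
for dtype in (np.float32, np.float64):
    x = np.array([10.0, 20.0, 40.0], dtype=dtype)
    print(dtype.__name__, sigmoid(x), sigmoid(x) == 1.0)
```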

1 Answer


In general, it's better not to use the sigmoid function in any hidden layer. There are many better options, such as ReLU and ELU. However, if for some reason you have to use a sigmoid-like function, go with tanh: at least it is approximately zero-centered.
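
As a rough sketch of what that looks like in practice (assuming PyTorch, with made-up layer sizes), keep ReLU in the hidden layers and the sigmoid only at the output if you need a probability:

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 20 input features, two hidden layers, one probability output
model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),         # ReLU (or nn.ELU()) in the hidden layers avoids saturation
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Linear(64, 1),
    nn.Sigmoid(),      # sigmoid only at the output, to produce a probability
)

x = torch.randn(8, 20)  # dummy batch
probs = model(x)        # values in (0, 1)
```

If that probability feeds a binary cross-entropy loss, it is usually more numerically stable to drop the final nn.Sigmoid() and train on the raw logits with nn.BCEWithLogitsLoss, which is what the comment about logit (pre-sigmoid) outputs is getting at.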

pedrum