
I am using a fully connected neural network with a sigmoid activation function. If we feed it a large enough input, the sigmoid eventually saturates at 1 or 0. Is there any way to avoid this?

Will this lead to the classical sigmoid problems of vanishing or exploding gradients?
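
For concreteness, here is a minimal NumPy sketch of what I mean (the input values are arbitrary): once the input gets large, the output rounds to exactly 1 and the gradient to 0.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # derivative of the sigmoid

# For large inputs the output rounds to exactly 1.0 and the gradient to 0.0
for x in (0.0, 5.0, 10.0, 20.0, 40.0):
    print(f"x={x:5.1f}  sigmoid={sigmoid(x):.12f}  grad={sigmoid_grad(x):.2e}")
```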

ou2105
  • Increase the precision to 256 bits if you are really desperate to use the sigmoid. Otherwise, use a different activation. – DuttaA Sep 11 '19 at 14:54
  • I think I am stuck with the sigmoid (since I am looking for a probability), but I wonder how to increase the precision to 256 bits. What do you mean by that? – ou2105 Sep 11 '19 at 15:15
  • Normal libraries use 64-bit precision for calculations, but if required you can increase the precision, avoiding a gradient of exactly zero, at the cost of memory and speed. If you need a probability, use softmax at the output instead, and use ReLU in the intermediate layers. – DuttaA Sep 11 '19 at 15:55
  • The other problem with using the sigmoid in an FNN with more than one hidden layer is that backpropagation degrades: the weights in the first layers are only updated by small values that may not have a significant impact. ReLU fixes this problem. I agree with @DuttaA: use softmax if you want a probability. – SandMan Sep 11 '19 at 17:01
  • Can you elaborate on the task you're intending to solve with your network? As others have already said, ReLUs are the best option in intermediate layers, and you could probably work with the logit (pre-sigmoid) outputs instead of going through the sigmoid for backprop purposes. – David Sep 11 '19 at 17:43
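
To make the precision point from the comments concrete, here is a small sketch (assuming NumPy; the exact thresholds depend on the platform and library) of where the sigmoid rounds to exactly 1.0 at different float widths. Higher precision only pushes the saturation point further out; it does not remove it, which is why the ReLU/softmax suggestion is usually the better fix.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# float32 already rounds sigmoid(20) to exactly 1.0; float64 holds out until x ~ 37
for dtype in (np.float32, np.float64):
    x = np.array([10.0, 20.0, 40.0], dtype=dtype)
    print(dtype.__name__, sigmoid(x), sigmoid(x) == 1.0)
```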

1 Answer


In general, it's better not to use the sigmoid function in any hidden layer. There are many better options, such as ReLU and ELU. However, if for some reason you have to use a sigmoid-like function, go with tanh: at least it is approximately zero-centered.
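
As a rough sketch of what that looks like in practice (assuming PyTorch, with made-up layer sizes), keep ReLU in the hidden layers and the sigmoid only at the output if you need a probability:

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 20 input features, two hidden layers, one probability output
model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),         # ReLU (or nn.ELU()) in the hidden layers avoids saturation
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Linear(64, 1),
    nn.Sigmoid(),      # sigmoid only at the output, to produce a probability
)

x = torch.randn(8, 20)  # dummy batch
probs = model(x)        # values in (0, 1)
```

If that probability feeds a binary cross-entropy loss, it is usually more numerically stable to drop the final nn.Sigmoid() and train on the raw logits with nn.BCEWithLogitsLoss, which is what the comment about logit (pre-sigmoid) outputs is getting at.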

pedrum