I am using a fully connected neural network with a sigmoid activation function. If the input is large enough, the sigmoid saturates at 1 or 0 (see the snippet below). Is there any way to avoid this?
Will this lead to the classical sigmoid problems of vanishing or exploding gradients?
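To illustrate what I mean, here is a minimal sketch in plain NumPy (not my actual network): once the pre-activation gets large, the sigmoid output pins to 1 and its gradient collapses toward zero.

```python
import numpy as np

def sigmoid(x):
    """Standard logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    """Derivative of the sigmoid: s * (1 - s)."""
    s = sigmoid(x)
    return s * (1.0 - s)

# Example pre-activation values, just for illustration.
for x in [1.0, 5.0, 10.0, 20.0, 40.0]:
    print(f"x={x:5.1f}  sigmoid={sigmoid(x):.10f}  gradient={sigmoid_grad(x):.2e}")
```

For x = 40 the output is already exactly 1.0 in double precision and the gradient is 0, which is what I mean by the activation "becoming" 1 or 0.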