
I have read this post: "How to choose an activation function?"

There is plenty of literature about activation functions, but when should I use a linear activation instead of ReLU?

What does the author mean by "ReLU when I'm dealing with positive values, and a linear function when I'm dealing with general values"?

Is there a more detailed answer to this?

jennifer ruurs

2 Answers

2

The activation function you choose depends on the application you are building and the data you have to work with. It is hard to recommend one over the other without taking this into account.

Here is a short summary of the advantages and disadvantages of some common activation functions: https://missinglink.ai/guides/neural-network-concepts/7-types-neural-network-activation-functions-right/

"What does the author mean by 'ReLU when I'm dealing with positive values, and a linear function when I'm dealing with general values'?"

ReLU is a good choice when your inputs are mostly positive, since ReLU(x) = max(0, x): for any input < 0 the output is 0, and the gradient there is 0 as well, which can kill the neuron (it stops updating during training).

To remedy this, you could look into using a Leaky ReLU instead, which avoids killing the neuron by returning a small non-zero value (and therefore a non-zero gradient) when the input <= 0.
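
To make the difference concrete, here is a minimal NumPy sketch of the three activations being discussed (my own illustration, not from the linked guide; the 0.01 slope for the Leaky ReLU is just a common default):

```python
import numpy as np

def relu(x):
    # Outputs 0 for every negative input; the gradient there is 0 as well,
    # which is what can "kill" a neuron during training.
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Keeps a small slope (alpha) for negative inputs, so the gradient
    # never becomes exactly 0 and the neuron can keep updating.
    return np.where(x > 0, x, alpha * x)

def linear(x):
    # Identity activation: passes positive and negative values through unchanged.
    return x

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))        # [ 0.    0.    0.    0.5   2.  ]
print(leaky_relu(x))  # [-0.02  -0.005  0.    0.5   2.  ]
print(linear(x))      # [-2.   -0.5   0.    0.5   2.  ]
```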

Krrrl
  • This solved a big part of my question, but would there be a case where I would specifically use a linear activation instead of ReLU? – jennifer ruurs Oct 22 '19 at 06:24
  • That depends again on your specific application. There are two major drawbacks of linear activation functions: 1. You can't use back-propagation in training (since the derivative is a constant, it does not convey which weight influenced the output the most). 2. Linear activation functions are only applicable to shallow networks, since the derivative of a linear function is a constant (multiple layers of linear functions are just another linear function; see the sketch after these comments). So, in the case that you have a shallow network that does not rely on backpropagation, you can use a linear activation function. – Krrrl Oct 22 '19 at 13:38
  • I think it is fair to say that you can think of the linear activation function as an artifact of the earlier stages of neural-network development, when work was done on single or a few perceptrons rather than larger layered networks. – Krrrl Oct 22 '19 at 13:39
1

Nothing is written in stone here, but, as a rule of thumb, a linear activation is not very common. A linear activation in a hidden layer adds nothing, because consecutive linear layers collapse into a single linear layer. A linear activation can be used in the last layer when the outputs are not scaled to a fixed range. (This is the most common use I have seen.)
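
As an illustration of that last point, here is a minimal Keras sketch (my own example, not from the answer; the layer sizes and optimizer are arbitrary choices): ReLU in the hidden layers, and a linear activation in the output layer so an unbounded regression target is not squashed:

```python
import tensorflow as tf

# Hypothetical regression model: 10 input features, 1 unbounded output value.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(64, activation="relu"),    # hidden layers: ReLU
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="linear"),   # output layer: linear, no squashing
])

model.compile(optimizer="adam", loss="mse")
model.summary()
```

If the targets were bounded (say, probabilities), a sigmoid or softmax output would be the usual choice instead.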