
We often encounter statements like the following:

The input vector is first fed into a fully connected layer...

Since a linear activation function, such as the identity function, can also be considered an activation function, a fully connected layer that uses a linear activation function is just an affine transformation.

So, in theory, "fully connected layer" can refer to either of the following (see the sketch after the list):

  1. Just an affine transformation
  2. Affine transformation followed by a nonlinear activation function
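For concreteness, here is a minimal NumPy sketch (my own addition, not part of any standard definition) of the two cases, assuming an input vector `x`, a weight matrix `W`, and a bias vector `b`:

```python
import numpy as np

x = np.array([1.0, -2.0, 0.5])   # input vector (3 features)
W = np.random.randn(4, 3)        # weight matrix of the layer
b = np.random.randn(4)           # bias vector

# Case 1: just the affine transformation
z = W @ x + b

# Case 2: affine transformation followed by a nonlinear activation (ReLU here)
a = np.maximum(z, 0.0)
```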

Do authors generally choose to use "fully connected layer" for case 2 only or for both cases 1 and 2?

hanugm
Affine transformations have the property that the composition of multiple affine transformations is equivalent to a single affine transformation, so you gain nothing by stacking such layers. – user253751 Sep 16 '21 at 08:06
  • @user253751 consider a single one. – hanugm Sep 16 '21 at 08:23
In my experience, it refers to the second case. When you're talking about neural networks, it's implicitly assumed that you use a non-linear activation function at the end of each layer. – SpiderRico Sep 16 '21 at 20:22
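As a quick check of the first comment, composing two purely affine layers (with no nonlinearity in between) does collapse into a single affine map:

$$W_2 (W_1 x + b_1) + b_2 = (W_2 W_1)\, x + (W_2 b_1 + b_2),$$

which is again of the form $W x + b$.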

1 Answer


Yes, typically, a fully connected layer is an affine transformation, which may or may not be followed by a non-linear activation function. In many (if not most) cases, it is followed by a non-linearity, such as ReLU, sigmoid, or tanh (an exception is when you do regression). This non-linearity is what makes the neural network able to approximate non-linear/complicated functions.
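As an illustrative sketch (assuming PyTorch, which the answer does not mention), `nn.Linear` implements only the affine part, and the nonlinearity is applied as a separate step:

```python
import torch
import torch.nn as nn

fc = nn.Linear(in_features=3, out_features=4)  # affine transformation: Wx + b
relu = nn.ReLU()                               # non-linear activation

x = torch.randn(3)
z = fc(x)       # case 1: affine output only
a = relu(z)     # case 2: affine transformation followed by ReLU
```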

nbro