We often encounter statements like the following:
> The input vector is first fed into a fully connected layer......
Since linear functions, such as the identity function, can also be considered activation functions, a fully connected layer that uses a linear activation function is just an affine transformation.
So, in theory, a fully connected layer can refer to either of the following:
- Just an affine transformation
- Affine transformation followed by a nonlinear activation function
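
To make the two cases concrete, here is a minimal sketch in plain Python. The helper names (`affine`, `relu`, `fully_connected`) and the numeric values are hypothetical, chosen only for illustration, and ReLU stands in for any nonlinear activation.

```python
# Illustrative sketch only: helper names and values are assumptions, not from any library.

def affine(W, b, x):
    """Compute W x + b, where W is a list of rows, b a bias list, x an input list."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

def relu(v):
    """Elementwise ReLU, used here as one example of a nonlinear activation."""
    return [max(0.0, v_i) for v_i in v]

def fully_connected(W, b, x, activation=None):
    """Case 1 when activation is None (identity), case 2 otherwise."""
    z = affine(W, b, x)
    return z if activation is None else activation(z)

W = [[1.0, -2.0], [0.5, 1.0]]  # made-up weights
b = [0.0, -1.0]                # made-up biases
x = [1.0, 2.0]                 # made-up input vector

case_1 = fully_connected(W, b, x)        # just the affine transformation
case_2 = fully_connected(W, b, x, relu)  # affine transformation, then ReLU

print(case_1)  # [-3.0, 1.5]
print(case_2)  # [0.0, 1.5]
```

The two outputs differ only where the nonlinearity changes a value, which is why the terminology matters when a paper does not state the activation.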
Do authors generally use the term "fully connected layer" only for the second case (an affine transformation followed by a nonlinear activation function), or for both cases?