Do authors generally use "fully connected layer" instead of "affine transformation"?

Question

We often encounter statements like the following:

The input vector is first fed into a fully connected layer......

Since a linear activation function, such as the identity function, can also be considered an activation function, a fully connected layer that uses a linear activation function is just an affine transformation.

So, in theory, a fully connected layer can refer to either of the following:

1. Just an affine transformation
2. An affine transformation followed by a nonlinear activation function

Do authors generally choose to use "fully connected layer" for case 2 only or for both cases 1 and 2?

Affine transformations have the property that a composition of several affine transformations is equivalent to a single affine transformation, so you gain nothing by stacking such layers. — user253751, Sep 16 '21 at 08:06
In my experience, it refers to the 2nd case. When you're talking about neural networks, it's implicitly assumed that you use a non-linear activation function at the end of each layer. — SpiderRico, Sep 16 '21 at 20:22
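
The composition property mentioned in the first comment can be checked directly. As a sketch, take two stacked layers with identity activation, using the symbols $W_1, b_1, W_2, b_2$ for their weights and biases (these names are assumptions for illustration, not from the original post):

```latex
\begin{aligned}
W_2\,(W_1 x + b_1) + b_2
  &= (W_2 W_1)\,x + (W_2 b_1 + b_2) \\
  &= W x + b,
  \qquad W = W_2 W_1,\quad b = W_2 b_1 + b_2 .
\end{aligned}
```

The result has the same form $Wx + b$, so the two layers together compute nothing that a single affine layer could not.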

score 1 · Accepted Answer · answered Sep 20 '21 at 00:20

Yes. Typically, a fully connected layer is an affine transformation that may or may not be followed by a non-linear activation function. In many (if not most) cases, it is followed by a non-linearity, such as ReLU, sigmoid, or tanh; an exception is regression, where the output layer is typically left linear. The non-linearity is what makes the neural network able to approximate non-linear, complicated functions.
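
To make the two readings concrete, here is a minimal sketch in plain Python. The helper names (`affine`, `relu`, `fully_connected`) and the example weights are hypothetical and chosen only for illustration; they do not come from any particular library. With `activation=None` the layer is just an affine transformation (case 1), and with an activation it is an affine transformation followed by a non-linearity (case 2).

```python
def affine(W, b, x):
    """Return W x + b, where W is a list of rows, b a bias list, x an input list."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]


def relu(v):
    """Element-wise ReLU: replace negative entries with 0."""
    return [max(0.0, v_i) for v_i in v]


def fully_connected(W, b, x, activation=None):
    """Affine transformation, optionally followed by an activation function."""
    z = affine(W, b, x)
    return z if activation is None else activation(z)


# Illustrative values only (assumed for this example).
W = [[1.0, -2.0], [0.5, 1.0]]
b = [0.0, 1.0]
x = [1.0, 2.0]

linear_out = fully_connected(W, b, x)                 # case 1: affine only
relu_out = fully_connected(W, b, x, activation=relu)  # case 2: affine + ReLU

print(linear_out)  # [-3.0, 3.5]
print(relu_out)    # [0.0, 3.5]
```

Both calls share the same affine part; only the optional activation differs, which is why the term "fully connected layer" can plausibly cover either case.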
