10

I'm not very experienced with deep learning. I've been looking at research code (mostly PyTorch) for deep neural networks, specifically GANs, and in many cases I see the authors setting bias=False in some layers without much justification. This usually isn't done in a long stack of layers that serve a similar purpose, but rather in unique layers, such as the initial linear layer after a conditioning vector, or certain layers in an attention architecture.

I imagined there must be a strategy to this, but most articles online seem to confirm my initial perception that bias is a good thing to have available for training in pretty much every layer.

Is there a specific optimization / theoretical reason to turn off biases in specific layers in a network? How can I choose when to do it when designing my own architecture?

Nikos Tsakas
  • Hello. To have more context, I suggest that you include links to 1-2 examples of models that do not use the bias. – nbro May 10 '21 at 11:45

1 Answer

5

The most common case of bias=False is in layers immediately before/after Batch Normalization, with no activation function in between. The BatchNorm layer will re-center the data anyway, cancelling the bias and leaving it as a useless trainable parameter. Quoting the original BatchNorm paper:

Note that, since we normalize $Wu+b$, the bias $b$ can be ignored since its effect will be canceled by the subsequent mean subtraction
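A minimal PyTorch sketch of this pattern (the layer sizes and input shapes are arbitrary, just for illustration): it shows the usual Conv + BatchNorm block with bias=False, and then checks numerically that a bias added before BatchNorm has no effect on the output, because the per-channel mean subtraction removes it.

```python
import torch
import torch.nn as nn

# Typical pattern: the conv's bias would be cancelled by BatchNorm's
# mean subtraction, so it is turned off to save a redundant parameter.
block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(16),
    nn.ReLU(),
)

# Quick numerical check that a per-channel bias is indeed cancelled.
conv = nn.Conv2d(3, 16, kernel_size=3, padding=1, bias=True)
bn = nn.BatchNorm2d(16, affine=False)
bn.train()  # normalize with the current batch statistics

x = torch.randn(8, 3, 32, 32)

with torch.no_grad():
    conv.bias.uniform_(-5.0, 5.0)  # give the bias a large, arbitrary value
out_with_bias = bn(conv(x))

with torch.no_grad():
    conv.bias.zero_()              # now remove the bias entirely
out_without_bias = bn(conv(x))

# The two outputs match (up to numerical noise): the bias had no effect.
print(torch.allclose(out_with_bias, out_without_bias, atol=1e-4))
```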

Something similar happens with the transformers' LayerNormalization and (as far as I understand how conditioning works) in GANs' conditioning layers - the data gets re-centered, effectively cancelling the bias.

In my experience, that's the most frequent reason to see bias=False, but one can imagine other reasons to remove the bias. As a rule of thumb, I'd say you should not include a bias if you want the layer to "transform zeros to zeros" - things like learned rotations are an example of such a (rather exotic) application.
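As a toy illustration of the "zeros to zeros" point (a hypothetical example, not taken from any particular paper): without a bias, a linear layer is a pure matrix multiplication, so the zero vector is always mapped to the zero vector.

```python
import torch
import torch.nn as nn

# A bias-free linear layer behaves like a learned matrix (e.g. a rotation
# or projection): it always maps the zero vector to the zero vector.
rotation_like = nn.Linear(3, 3, bias=False)

zeros = torch.zeros(1, 3)
print(rotation_like(zeros))  # tensor([[0., 0., 0.]])
```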

Kostya