10

I'm not very experienced with deep learning. I've been looking at research code (mostly PyTorch) for deep neural networks, specifically GANs, and in many cases I see the authors setting bias=False in some layers without much justification. This usually isn't done in a long stack of layers that serve a similar purpose, but rather in unique layers, such as the initial linear layer after a conditioning vector, or certain layers in an attention architecture.

I imagined there must be a strategy to this, but most articles online seem to confirm my initial perception that bias is a good thing to have available for training in pretty much every layer.

Is there a specific optimization / theoretical reason to turn off biases in specific layers in a network? How can I choose when to do it when designing my own architecture?

Nikos Tsakas
  • Hello. To have more context, I suggest that you include links to 1-2 examples of models that do not use the bias. – nbro May 10 '21 at 11:45

1 Answer

5

The most common case of bias=False is in layers immediately before/after Batch Normalization, with no activation function in between. The BatchNorm layer will re-center the data anyway, cancelling the bias and leaving it as a useless trainable parameter. Quoting the original BatchNorm paper:

Note that, since we normalize $Wu+b$, the bias $b$ can be ignored since its effect will be canceled by the subsequent mean subtraction
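A minimal PyTorch sketch of this pattern (the layer sizes and input shapes are arbitrary, just for illustration): it shows the usual Conv + BatchNorm block with bias=False, and then checks numerically that a bias added before BatchNorm has no effect on the output, because the per-channel mean subtraction removes it.

```python
import torch
import torch.nn as nn

# Typical pattern: the conv's bias would be cancelled by BatchNorm's
# mean subtraction, so it is turned off to save a redundant parameter.
block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(16),
    nn.ReLU(),
)

# Quick numerical check that a per-channel bias is indeed cancelled.
conv = nn.Conv2d(3, 16, kernel_size=3, padding=1, bias=True)
bn = nn.BatchNorm2d(16, affine=False)
bn.train()  # normalize with the current batch statistics

x = torch.randn(8, 3, 32, 32)

with torch.no_grad():
    conv.bias.uniform_(-5.0, 5.0)  # give the bias a large, arbitrary value
out_with_bias = bn(conv(x))

with torch.no_grad():
    conv.bias.zero_()              # now remove the bias entirely
out_without_bias = bn(conv(x))

# The two outputs match (up to numerical noise): the bias had no effect.
print(torch.allclose(out_with_bias, out_without_bias, atol=1e-4))
```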

Something similar happens with the transformers' LayerNormalization and (as far as I understand how conditioning works) in GANs' conditioning layers - the data gets re-centered, effectively cancelling the bias.

In my experience, that's the most frequent reason to see bias=False, but one can imagine other reasons to remove the bias. As a rule of thumb, I'd say you should not include a bias if you want the layer to "transform zeros to zeros" - things like learned rotations are an example of such a (rather exotic) application.
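As a toy illustration of the "zeros to zeros" point (a hypothetical example, not taken from any particular paper): without a bias, a linear layer is a pure matrix multiplication, so the zero vector is always mapped to the zero vector.

```python
import torch
import torch.nn as nn

# A bias-free linear layer behaves like a learned matrix (e.g. a rotation
# or projection): it always maps the zero vector to the zero vector.
rotation_like = nn.Linear(3, 3, bias=False)

zeros = torch.zeros(1, 3)
print(rotation_like(zeros))  # tensor([[0., 0., 0.]])
```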

Kostya