I'm not very experienced with deep learning. I've been reading research code (mostly PyTorch) for deep neural networks, specifically GANs, and in many cases I see the authors setting bias=False in some layers without much justification. This usually isn't done across a long stack of layers that all serve a similar purpose, but rather in particular layers, such as the initial linear layer that projects a conditioning vector, or certain projections in an attention block.
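For example, here is a simplified sketch of the kind of pattern I mean (made up for illustration, not taken from any particular repo; the class name, layer names, and dimensions are just placeholders):

```python
import torch
import torch.nn as nn

class ConditionalGeneratorBlock(nn.Module):
    """Toy example of the pattern I keep seeing (all names are made up)."""

    def __init__(self, cond_dim, feat_dim):
        super().__init__()
        # Projection of the conditioning vector: bias=False here,
        # usually with no comment explaining why.
        self.cond_proj = nn.Linear(cond_dim, feat_dim, bias=False)

        # Attention-style projections: q/k/v often get bias=False too...
        self.qkv = nn.Linear(feat_dim, 3 * feat_dim, bias=False)
        # ...while the output projection keeps its bias.
        self.attn_out = nn.Linear(feat_dim, feat_dim)

        # Meanwhile the "ordinary" stacked layers keep the default bias=True.
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, feat_dim * 4),
            nn.GELU(),
            nn.Linear(feat_dim * 4, feat_dim),
        )

    def forward(self, x, cond):
        # x: (batch, tokens, feat_dim), cond: (batch, cond_dim)
        h = x + self.cond_proj(cond).unsqueeze(1)
        q, k, v = self.qkv(h).chunk(3, dim=-1)
        attn = torch.softmax(q @ k.transpose(-2, -1) / (h.shape[-1] ** 0.5), dim=-1)
        h = h + self.attn_out(attn @ v)
        return h + self.mlp(h)
```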
I assumed there must be some reasoning behind this, but most articles I find online just confirm my initial impression that a bias term is generally a good thing to have available for training in pretty much every layer.
Is there a specific optimization or theoretical reason to turn off biases in particular layers of a network? And how should I decide when to do it when designing my own architecture?