
I am training a CNN with a batch size of 128, but the validation loss fluctuates by more than one. I want to increase the batch size to 150 or 200, but, in the code examples I have come across, the batch size is always something like 32, 64, 128, or 256. Is this a rule? Can I use other values?


1 Answer


Data whose size is a power of $2$ (i.e., $2^n$ for some integer $n$) allows for easier memory management, because the data can be laid out contiguously (without gaps or padding). This allows for faster memory reads and thus faster iteration in general. From a computational point of view this matters because the compiler and hardware can take advantage of such alignment to speed up iteration loops. This is why batch sizes are usually chosen as powers of $2$ in practice. However, this does not necessarily imply better training results.
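If you do want to stay with powers of $2$, here is a minimal sketch that snaps a desired batch size to the closest one (the helper name `nearest_power_of_two` is just for illustration, not a library function):

```python
def nearest_power_of_two(n: int) -> int:
    """Return the power of 2 closest to n (ties round up)."""
    lower = 1 << (n.bit_length() - 1)  # largest power of 2 <= n
    upper = lower << 1                 # smallest power of 2 > n
    return lower if (n - lower) < (upper - n) else upper

print(nearest_power_of_two(150))  # 128
print(nearest_power_of_two(200))  # 256
```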

Regarding your question, "Can I use other values for the batch size?":

Yes, you can use other values, and, for the most part, you probably won't notice a difference in computational performance, given the speed of modern training APIs. So, unless you are training a large model with a lot of compute, where this optimization has a larger impact, feel free to experiment :)
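To make that concrete, here is a minimal PyTorch sketch (using a random stand-in dataset; the tensor shapes are just assumptions for illustration) showing that a `DataLoader` accepts any positive batch size, including 150:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset: 1,000 random "images" and labels.
images = torch.randn(1000, 1, 28, 28)
labels = torch.randint(0, 10, (1000,))
dataset = TensorDataset(images, labels)

# Nothing stops you from using a non-power-of-2 batch size.
loader = DataLoader(dataset, batch_size=150, shuffle=True)

for batch_images, batch_labels in loader:
    print(batch_images.shape)  # torch.Size([150, 1, 28, 28]); the last batch may be smaller
    break
```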
