
In the original R-CNN paper (https://arxiv.org/pdf/1311.2524.pdf), and carried forward in later papers such as Faster R-CNN (https://arxiv.org/pdf/1506.01497.pdf), the learned offsets relative to the anchor boxes are parameterized in a scale-invariant way. For example, the learned x-center offset $d_{x}$:

$d_{x} = (x-x_{a})/w_{a}$

is meant to parameterize the difference between the x-center of the anchor box and the x-center of the predicted box. However, it is also divided by the anchor width $w_{a}$ to make it scale-invariant. Why is it important that this target is scale-invariant? Does it just make the regression easier for the neural network to learn, similar to something like batch normalization?
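For concreteness, here is a minimal NumPy sketch of that encoding (the function name `encode_dx` is just for illustration). It shows that a shift of half an anchor width yields the same regression target whether the anchor is small or large:

```python
import numpy as np

def encode_dx(x, x_a, w_a):
    """R-CNN style x-center target: d_x = (x - x_a) / w_a."""
    return (x - x_a) / w_a

# A 10-pixel-wide anchor shifted by 5 px and a 100-pixel-wide anchor
# shifted by 50 px both encode to the same target, 0.5:
print(encode_dx(x=15.0,  x_a=10.0,  w_a=10.0))   # 0.5
print(encode_dx(x=150.0, x_a=100.0, w_a=100.0))  # 0.5
```

Without the division by $w_{a}$, the raw pixel offsets (5 and 50 here) would differ by an order of magnitude even though both describe the same relative displacement.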

phil
