In the original R-CNN paper (https://arxiv.org/pdf/1311.2524.pdf), and in later papers in the family such as Faster R-CNN (https://arxiv.org/pdf/1506.01497.pdf), the learned offsets of the anchor boxes are scale-invariant. For example, the learned x-center offset $d_{x}$:
$$d_{x} = \frac{x - x_{a}}{w_{a}}$$
parameterizes the difference between the x-center of the anchor box, $x_{a}$, and the x-center of the predicted box, $x$. However, it is also divided by the anchor width $w_{a}$ to make it scale-invariant. Why is it important that this offset is scale-invariant? Does it just make the regression target easier for the neural network to learn, similar to something like batchnorm?
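To make the question concrete, here is a minimal sketch of the encode/decode step as I understand it (not code from either paper; the function names and numbers are my own):

```python
def encode_dx(x, x_a, w_a):
    """Normalized x-center offset between a matched box and its anchor."""
    return (x - x_a) / w_a

def decode_x(d_x, x_a, w_a):
    """Recover the predicted x-center from the normalized offset."""
    return x_a + d_x * w_a

# Two anchors whose matched boxes are each shifted by 10% of the anchor width:
small = encode_dx(x=55.0, x_a=50.0, w_a=50.0)    # 5 px shift on a 50 px anchor
large = encode_dx(x=520.0, x_a=500.0, w_a=200.0) # 20 px shift on a 200 px anchor
print(small, large)  # both 0.1 -> the regression target is the same at either scale
```

So the network sees the same target (0.1) whether the object is small or large, and the decode step converts it back into a pixel shift proportional to the anchor's size. Is that proportionality the whole point, or is there a deeper reason for this parameterization?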