In the original R-CNN paper (https://arxiv.org/pdf/1311.2524.pdf), and in later papers in the family such as Faster R-CNN (https://arxiv.org/pdf/1506.01497.pdf), the learned offsets of the anchor boxes are scale-invariant. For example, the learned x-center offset $d_{x}$:
$$d_{x} = \frac{x - x_{a}}{w_{a}}$$
parameterizes the difference between the x-center of the anchor box, $x_{a}$, and the x-center of the predicted box, $x$. However, it is also divided by the anchor width $w_{a}$ to make it scale-invariant. Why is it important that this offset is scale-invariant? Does it just make the regression target easier for the neural network to learn, similar to something like batchnorm?
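To make the question concrete, here is a minimal sketch of the encode/decode step as I understand it (not code from either paper; the function names and numbers are my own):

```python
def encode_dx(x, x_a, w_a):
    """Normalized x-center offset between a matched box and its anchor."""
    return (x - x_a) / w_a

def decode_x(d_x, x_a, w_a):
    """Recover the predicted x-center from the normalized offset."""
    return x_a + d_x * w_a

# Two anchors whose matched boxes are each shifted by 10% of the anchor width:
small = encode_dx(x=55.0, x_a=50.0, w_a=50.0)    # 5 px shift on a 50 px anchor
large = encode_dx(x=520.0, x_a=500.0, w_a=200.0) # 20 px shift on a 200 px anchor
print(small, large)  # both 0.1 -> the regression target is the same at either scale
```

So the network sees the same target (0.1) whether the object is small or large, and the decode step converts it back into a pixel shift proportional to the anchor's size. Is that proportionality the whole point, or is there a deeper reason for this parameterization?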