I just learnt about the properties of equivariance and invariance to translation and other transformations. Being invariant to translation is clearly an advantage: even if the input gets shifted, the network will still pick up the same features and work fine. But how is equivariance useful?
I think your question is mostly about what the difference between invariance and equivariance is. If invariance is clearly an advantage, imo equivariance is clear too. Can you explain what isn't clear to you? Invariance means the output does NOT change when the input is transformed. Equivariance means the output changes in the same way as the input. Depending on your application (image classification versus image segmentation) you will want one or the other. – Frank Bryce May 14 '22 at 01:33
2 Answers
Equivariance is useful because the neural network can learn to detect common image components - edges, corners, curves in specific orientations - in a general way that is then applied across a whole image evenly. These components typically do exist and can appear in multiple places within an image, and may be parts of larger-scale features in turn. Identifying all the edges in an image can be useful before any kind of pooling that adds invariance is applied.
Without equivariance, an edge oriented in one way in one part of the image would look, to the neural network, completely different from the same kind of edge elsewhere. A filter for it would only be learned and used if some training example happened to have an important edge in that exact place.
It is hard to separate this usefulness of equivariance from the associated reduction in the number of free parameters that comes from using small, locally connected convolutional filters instead of fully connected layers. CNNs are equivariant by construction, whilst invariance requires a little more effort (pooling and/or strided convolutions).
The fact that CNNs are so successful using this approach probably says something about natural images. It should be possible to construct non-natural images where equivariance would be of limited use, e.g. where local features don't exist, or where they vary over the image in a way that makes detecting them in more than a few places pointless.
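
To make the equivariance-then-invariance point concrete, here is a minimal sketch. It assumes PyTorch, a circular shift and circular padding (so boundary effects don't spoil the equalities); none of those specifics come from the answer above. It checks that a convolution commutes with a translation of its input, and that a global pooling layer on top then makes the result invariant to that translation.

```python
# Minimal sketch: convolution is translation-equivariant; global pooling adds invariance.
import torch
import torch.nn as nn

torch.manual_seed(0)

conv = nn.Conv2d(1, 4, kernel_size=3, padding=1, padding_mode='circular', bias=False)
pool = nn.AdaptiveAvgPool2d(1)  # global average pooling

x = torch.randn(1, 1, 32, 32)                           # a random "image"
x_shifted = torch.roll(x, shifts=(5, 7), dims=(2, 3))   # translate the input (circularly)

# Equivariance: conv(shift(x)) == shift(conv(x))
feat_of_shift = conv(x_shifted)
shift_of_feat = torch.roll(conv(x), shifts=(5, 7), dims=(2, 3))
print(torch.allclose(feat_of_shift, shift_of_feat, atol=1e-5))   # True

# Invariance: global pooling discards location, so the pooled descriptor
# is the same for the original and the shifted input.
print(torch.allclose(pool(conv(x)), pool(conv(x_shifted)), atol=1e-5))  # True
```

Both checks print True. Replacing the circular padding with zero padding breaks the first equality near the image border, and dropping the pooling removes the invariance, which is exactly the division of labour described above.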

As explained here, both properties are useful depending on your application and expected result.
- For an image classifier, you'll expect an invariant (in-variance = no change) result, meaning the output stays the same no matter how you translate the image.
- For image segmentation or object detection, on the other hand, you'll expect the output to shift together with the input. In other words, equivariance (equi-variance = same change) is expected; see the sketch below.
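
A rough sketch of this contrast, assuming PyTorch; the layer names and shapes here are invented for illustration and are not from the answer. The per-pixel output of a fully convolutional "segmentation head" shifts along with the input, while the pooled output of a "classifier head" stays the same.

```python
# Sketch: segmentation output is equivariant to a shift, classifier output is invariant.
import torch
import torch.nn as nn

torch.manual_seed(0)

backbone = nn.Conv2d(3, 8, kernel_size=3, padding=1, padding_mode='circular', bias=False)
seg_head = nn.Conv2d(8, 2, kernel_size=1, bias=False)       # per-pixel class scores
cls_head = nn.Sequential(nn.AdaptiveAvgPool2d(1),            # pool the location away ...
                         nn.Flatten(),
                         nn.Linear(8, 2, bias=False))        # ... then classify

x = torch.randn(1, 3, 16, 16)
x_shift = torch.roll(x, shifts=(3, 4), dims=(2, 3))

# Segmentation: the predicted mask moves together with the input (equivariance).
mask = seg_head(backbone(x))
mask_shift = seg_head(backbone(x_shift))
print(torch.allclose(torch.roll(mask, shifts=(3, 4), dims=(2, 3)), mask_shift, atol=1e-5))  # True

# Classification: the logits do not change under the shift (invariance).
print(torch.allclose(cls_head(backbone(x)), cls_head(backbone(x_shift)), atol=1e-5))  # True
```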
