Why do instance segmentation architectures use mask reconstruction rather than regression?

Question

I'm wondering why many segmentation CNN architectures reconstruct a binary mask instead of regressing the coordinates of a mask polygon. Many object detectors, after all, use regression to predict the coordinates of bounding boxes.

If I understand correctly, the reason is probably that segmentation polygons vary in shape and complexity. You don't know how many points you need per polygon. Bounding boxes, however, are always defined by 4 coordinates. — Chillston, Jul 13 '22 at 11:17
@Chillston This is the correct answer, and should be posted as an answer so it can be upvoted and the OP can accept it. — Kroshtan, Jul 14 '22 at 13:25

score 1 · Accepted Answer · answered Jul 14 '22 at 14:28

The reason is probably that segmentation polygons vary in shape and complexity. You don't know in advance how many points you need per polygon, so defining a network output that specifies the polygons is not straightforward. In contrast, a bounding box is always defined by 4 numbers: the two corner points (for example lower-left and upper-right) are already enough.
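To make the point above concrete, here is a minimal, hypothetical sketch in standard-library Python. It assumes masks are nested lists of 0/1 pixels and uses the traced pixel outline of each mask as a stand-in for a polygon annotation; `mask_to_bbox` and `polygon_vertex_count` are illustrative names, not functions from any library. Every shape yields a bounding box of exactly 4 numbers, while the number of polygon vertices depends on the shape.

```python
from typing import List, Tuple

Mask = List[List[int]]  # assumption: rows of 0/1 pixel values


def mask_to_bbox(mask: Mask) -> Tuple[int, int, int, int]:
    """Return (x_min, y_min, x_max, y_max): always exactly 4 numbers.

    Assumes the mask contains at least one foreground pixel.
    """
    ys = [y for y, row in enumerate(mask) for v in row if v]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    return min(xs), min(ys), max(xs), max(ys)


def polygon_vertex_count(mask: Mask) -> int:
    """Count the vertices of the pixel-boundary outline of a binary mask.

    Each grid point between pixels is inspected via its 2x2 neighbourhood:
    1 or 3 foreground pixels -> one corner; 2 diagonal pixels -> two corners
    (the outline touches itself there); otherwise no corner.
    """
    h, w = len(mask), len(mask[0])

    def px(y: int, x: int) -> int:
        # Pixels outside the image count as background.
        return mask[y][x] if 0 <= y < h and 0 <= x < w else 0

    corners = 0
    for y in range(h + 1):
        for x in range(w + 1):
            a, b = px(y - 1, x - 1), px(y - 1, x)
            c, d = px(y, x - 1), px(y, x)
            n = a + b + c + d
            if n in (1, 3):
                corners += 1
            elif n == 2 and a == d:  # diagonal pair (a,d) or (b,c)
                corners += 2
    return corners


square = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
l_shape = [[1, 0, 0], [1, 0, 0], [1, 1, 1]]
plus = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]

for name, m in [("square", square), ("L-shape", l_shape), ("plus", plus)]:
    print(name, mask_to_bbox(m), polygon_vertex_count(m))
# square (1, 1, 2, 2) 4
# L-shape (0, 0, 2, 2) 6
# plus (0, 0, 2, 2) 12
```

The L-shape and the plus sign share the same 4-number box, yet their outlines need 6 and 12 vertices. A per-pixel mask, by contrast, has a fixed output size (one value per grid cell) no matter how complex the outline is, whereas a polygon regressor would have to commit to a vertex count in advance.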
