Why do many model architectures for segmentation CNNs reconstruct a binary mask instead of regressing the coordinates of the mask polygon? Many object detectors already use regression to find the coordinates of bounding boxes.

Kroshtan

Dmitry Sokolov
- If I understand correctly, the reason is probably that the segmentation polygons have various shapes and complexities. You don't know how many points you need per polygon. Bounding boxes however are always defined by 4 coordinates. – Chillston Jul 13 '22 at 11:17
- @Chillston This is the correct answer, and should be posted as an answer so it can be upvoted and the OP can accept it. – Kroshtan Jul 14 '22 at 13:25
- You are right, thank you - I put it as an answer – Chillston Jul 14 '22 at 14:28
1 Answer
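The reason is probably that segmentation polygons vary in shape and complexity. You don't know in advance how many points each polygon needs, so defining a fixed-size network output that specifies the polygon is not straightforward. In contrast, an axis-aligned bounding box always has the same description: four numbers, namely the x and y coordinates of two opposite corners (for example the lower-left and upper-right), which already determine all four corners. A binary mask avoids the problem because its size depends only on the image resolution, not on the shape of the object.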
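To make the size argument concrete, here is a minimal pure-Python sketch. The two example shapes, the 8x8 grid, and the helper names (`bounding_box`, `point_in_polygon`, `rasterize`) are my own illustrative assumptions and are not taken from any particular segmentation library. It only compares the sizes of the three target representations; it is not how any specific model builds its targets.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def bounding_box(polygon: List[Point]) -> Tuple[float, float, float, float]:
    """Axis-aligned box as (x_min, y_min, x_max, y_max): always 4 numbers."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), min(ys), max(xs), max(ys))


def point_in_polygon(px: float, py: float, polygon: List[Point]) -> bool:
    """Even-odd ray casting: count edge crossings to the right of the point."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line y = py can be crossed.
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def rasterize(polygon: List[Point], height: int, width: int) -> List[List[int]]:
    """Binary mask of fixed shape (height, width), tested at pixel centres."""
    return [
        [int(point_in_polygon(col + 0.5, row + 0.5, polygon)) for col in range(width)]
        for row in range(height)
    ]


# Hypothetical example shapes with different numbers of vertices.
triangle: List[Point] = [(1.0, 1.0), (7.0, 1.0), (4.0, 6.0)]
l_shape: List[Point] = [(1.0, 1.0), (6.0, 1.0), (6.0, 3.0),
                        (3.0, 3.0), (3.0, 7.0), (1.0, 7.0)]

for name, poly in [("triangle", triangle), ("l_shape", l_shape)]:
    mask = rasterize(poly, height=8, width=8)
    print(
        name,
        "| polygon values:", 2 * len(poly),       # varies with the shape
        "| box values:", len(bounding_box(poly)),  # always 4
        "| mask shape:", (len(mask), len(mask[0])),  # fixed by the grid
    )
```

The polygon target changes length from one object to the next, while the box and the mask keep the same size for every object, which is what a fixed-size output layer can produce directly.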

Chillston