How is ground truth provided to each pyramid level in RetinaNet or YOLOv3? How are feature pyramid levels mapped to ground truth?

Asked Jan 31 '21 at 11:14


So YOLOv3 and RetinaNet both use feature pyramids, which look something like this (except b and e, which have a single output).
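To make the multi-output picture concrete: in RetinaNet each pyramid level is a source of anchors. The anchors of all levels can be generated level by level and then concatenated into one list, and a single set of ground-truth boxes is matched against that list. The following is a minimal pure-Python sketch that assumes the settings described in the RetinaNet paper (levels P3 to P7 with stride 2^l, anchor areas from 32² on P3 to 512² on P7, and 3 scales × 3 aspect ratios = 9 anchors per location). The function names and the ceil-based grid size are illustrative assumptions, not any library's API.

```python
import math

# Assumed RetinaNet-paper settings: levels P3..P7, stride 2**l,
# base anchor size 32 on P3 up to 512 on P7, 3 scales x 3 aspect ratios.
LEVELS = [3, 4, 5, 6, 7]
SCALES = [2 ** 0, 2 ** (1 / 3), 2 ** (2 / 3)]
RATIOS = [0.5, 1.0, 2.0]  # height / width


def level_anchors(level, image_size):
    """(x1, y1, x2, y2) anchors for one pyramid level, 9 per grid cell."""
    stride = 2 ** level
    base = 4 * stride                        # 32 on P3 ... 512 on P7
    grid = math.ceil(image_size / stride)    # assumed feature-map size
    anchors = []
    for row in range(grid):
        for col in range(grid):
            cx, cy = (col + 0.5) * stride, (row + 0.5) * stride
            for scale in SCALES:
                for ratio in RATIOS:
                    area = (base * scale) ** 2
                    w = math.sqrt(area / ratio)  # keeps w * h == area
                    h = w * ratio
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


def all_anchors(image_size):
    """Anchors of every level concatenated in level order, plus per-level counts."""
    flat, counts = [], {}
    for level in LEVELS:
        anchors = level_anchors(level, image_size)
        counts[level] = len(anchors)
        flat.extend(anchors)
    return flat, counts


flat, counts = all_anchors(512)
print(counts)      # {3: 36864, 4: 9216, 5: 2304, 6: 576, 7: 144}
print(len(flat))   # 49104
```

The per-level counts are what let you split one flat list of targets back into per-level pieces or, equivalently, reshape and concatenate the per-level predictions in the same order so that they line up with a single target list.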

I'm just confused about how prediction and training are done. Do we have to give EACH feature map a different Y label? If yes, how is that possible? In my opinion we would need N different ground truths. (Also, there would be 3 different losses, I think?)

If not, then how are these done at once?

There is a lot of confusion around these networks because I can't get my head around how y-labels are provided, trained on, and predicted in YOLOv3 and RetinaNet. Everything about the loss, the multiple outputs and the rest will make sense once I know this one thing.
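For YOLOv3, one way to picture it: you still supply one list of boxes per image, and a target-building step turns that list into one (mostly empty) target per output scale. The YOLOv3 paper assigns each ground-truth box to the single bounding-box prior (anchor) that overlaps it best out of all 9, and that anchor's scale decides which output grid receives the box. The sketch below is hypothetical: it assumes the paper's 9 COCO anchors split three per scale over strides 32, 16 and 8, matches on width and height only, and stores targets sparsely in dicts instead of tensors.

```python
import math

# Hypothetical YOLOv3-style target builder. Assumptions: the 9 COCO anchors
# (pixels) listed in the YOLOv3 paper, three per scale, largest on stride 32.
ANCHORS = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
           (59, 119), (116, 90), (156, 198), (373, 326)]
SCALES = [(32, [6, 7, 8]), (16, [3, 4, 5]), (8, [0, 1, 2])]  # (stride, anchor ids)


def shape_iou(w1, h1, w2, h2):
    """IoU of two boxes with a shared center, so only width/height matter."""
    inter = min(w1, w2) * min(h1, h2)
    return inter / (w1 * h1 + w2 * h2 - inter)


def build_targets(boxes):
    """boxes: list of (cx, cy, w, h, class_id) in input-image pixels.
    Returns one sparse target per output scale: {(row, col, anchor_slot): values}.
    A later box landing on the same cell and anchor overwrites an earlier one."""
    targets = [{} for _ in SCALES]
    for cx, cy, w, h, cls in boxes:
        # The best of ALL 9 anchors decides which single scale gets this box.
        best = max(range(len(ANCHORS)), key=lambda i: shape_iou(w, h, *ANCHORS[i]))
        for s, (stride, ids) in enumerate(SCALES):
            if best not in ids:
                continue  # this scale gets no positive for this box
            col, row = int(cx // stride), int(cy // stride)
            aw, ah = ANCHORS[best]
            targets[s][(row, col, ids.index(best))] = {
                "dx": cx / stride - col,   # target for sigmoid(t_x): offset in cell
                "dy": cy / stride - row,   # target for sigmoid(t_y)
                "tw": math.log(w / aw),    # since b_w = p_w * exp(t_w)
                "th": math.log(h / ah),
                "obj": 1.0,
                "cls": cls,
            }
    return targets


targets = build_targets([(200, 200, 120, 100, 0), (50, 60, 12, 14, 1)])
for (stride, _), t in zip(SCALES, targets):
    print(stride, sorted(t))
# 32 [(6, 6, 0)]
# 16 []
# 8 [(7, 6, 0)]
```

Each dict plays the role of that scale's Y label. The loss is computed for each of the three outputs against its own target, and the three terms are summed, which is where "3 different losses" comes from. The paper additionally ignores (gives no objectness loss to) non-best anchors whose overlap with a ground-truth box exceeds 0.5; that detail is left out here.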


Deshwal

Great question, man. In short, they do a heck of a lot of complicated things to map boxes to anchors and then to tensors. Moreover, each approach uses a different strategy to map anchors, so the answer to your question is not short. – JVGD Feb 06 '21 at 10:06
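For RetinaNet, the "boxes to anchors" step can be sketched as IoU matching against the anchors of all pyramid levels at once. The thresholds below (IoU ≥ 0.5 is foreground, < 0.4 is background, anything in between is ignored, and each anchor gets at most one box) follow the RetinaNet paper. The function names, the `BACKGROUND`/`IGNORE` label codes, and the assumption that `anchors` is already the concatenation of every level are illustrative choices.

```python
# Hypothetical RetinaNet-style anchor matcher (thresholds from the RetinaNet paper).
FG_IOU, BG_IOU = 0.5, 0.4
BACKGROUND, IGNORE = -1, -2   # illustrative label codes; foreground uses the class id


def box_area(box):
    """Area of an (x1, y1, x2, y2) box."""
    return (box[2] - box[0]) * (box[3] - box[1])


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    return inter / (box_area(a) + box_area(b) - inter)


def match(anchors, gt_boxes, gt_classes):
    """anchors: every pyramid level's anchors, concatenated in level order.
    Returns one label per anchor (class id, BACKGROUND or IGNORE) and the index
    of the matched ground-truth box (None unless the anchor is foreground)."""
    labels, matched = [], []
    for anchor in anchors:
        ious = [iou(anchor, gt) for gt in gt_boxes]
        best = max(range(len(ious)), key=ious.__getitem__, default=None)
        best_iou = ious[best] if best is not None else 0.0
        if best_iou >= FG_IOU:
            labels.append(gt_classes[best])
            matched.append(best)
        else:
            # Below 0.4 counts as background; 0.4 <= IoU < 0.5 is ignored in the loss.
            labels.append(BACKGROUND if best_iou < BG_IOU else IGNORE)
            matched.append(None)
    return labels, matched


anchors = [(0, 0, 32, 32), (32, 0, 64, 32), (0, 0, 64, 64), (0, 0, 90, 90)]
labels, matched = match(anchors, [(0, 0, 60, 60)], [2])
print(labels)    # [-1, -1, 2, -2]
print(matched)   # [None, None, 0, None]
```

Because the labels line up one-to-one with the concatenated anchors, the per-level classification and box outputs can be flattened and concatenated in the same level order, and one focal loss is computed over all anchors (the paper normalizes it by the number of foreground anchors). Some implementations also force each ground-truth box's best-overlapping anchor to be foreground even below 0.5; this sketch does not.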

Long would do too ;) – Deshwal Feb 09 '21 at 07:24

