
I am trying to get a better grasp of how object detection works. I (almost) completely understand the concept behind RPNs. However, I am a bit confused by the selective search algorithm. As far as I understand, this algorithm does not really learn anything.

So, for example, when I have an image containing people (even though my network does not need to classify these), will the selective search still propose these people to my network?

Of course, my CNN has not learned to classify a human, so it will output a very low probability for every class it did learn. As a result, the human will not end up with a bounding box.
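To make this concrete, here is a minimal sketch of the filtering step as I picture it. The function name, the threshold of 0.5, and the scores are all made up for illustration and are not taken from the R-CNN paper. I also believe real detectors add an explicit background class, which this sketch leaves out.

```python
import numpy as np

def keep_confident_proposals(class_probs, score_threshold=0.5):
    """Return (proposal_index, class_index) pairs whose best score passes the threshold.

    class_probs: array of shape (num_proposals, num_classes) with the
    probabilities for the classes the network was actually trained on.
    score_threshold is an assumed value chosen only for this example.
    """
    class_probs = np.asarray(class_probs, dtype=float)
    best_class = class_probs.argmax(axis=1)   # most likely learned class per proposal
    best_score = class_probs.max(axis=1)      # its probability
    kept = np.flatnonzero(best_score >= score_threshold)
    return [(int(i), int(best_class[i])) for i in kept]

# Hypothetical scores for three proposals over two learned classes (cat, dog).
# The second row stands for a proposal that tightly surrounds a person.
probs = [[0.90, 0.05],
         [0.10, 0.08],
         [0.20, 0.70]]
detections = keep_confident_proposals(probs)
```

In this toy case the proposal around the person is still produced by the proposal stage, but it is dropped afterwards because none of the learned classes scores high enough.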

Also, in further iterations of the R-CNN model, they proposed using regressors to improve the bounding box.

Does this mean that this part of the model receives the CNN's feature maps and, based on these, learns to output a bounding box (so that smaller instances of a detected object get a smaller bounding box)?
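My current understanding is that such a regressor predicts offsets relative to the proposal rather than absolute coordinates. The sketch below assumes the centre/size parameterisation usually described for R-CNN-style regressors (shift the centre in units of the proposal size, scale width and height through an exponential). The function name and the numbers are hypothetical.

```python
import math

def apply_box_deltas(proposal, deltas):
    """Refine a proposal with predicted offsets.

    proposal: (cx, cy, w, h) centre coordinates, width and height.
    deltas:   (dx, dy, dw, dh) as a regressor might predict them.
    """
    cx, cy, w, h = proposal
    dx, dy, dw, dh = deltas
    return (
        cx + w * dx,        # centre shift is scaled by the proposal width
        cy + h * dy,        # and by the proposal height
        w * math.exp(dw),   # exp keeps the refined width positive
        h * math.exp(dh),   # same for the height
    )

# A proposal that is too wide: shift slightly right and halve the width.
refined = apply_box_deltas((50.0, 50.0, 20.0, 40.0), (0.1, 0.0, math.log(0.5), 0.0))
```

Under this assumption, a smaller instance of an object would simply lead to negative `dw` and `dh` predictions, which shrink the proposal.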

So, in the first iteration of R-CNN, they presumably did not need bounding boxes in the training data (since the model had no way to learn the size of the bounding boxes, there would have been no need for a loss function for this)?

Lastly, I understand that the selective search algorithm is an improvement on the sliding window algorithm. It aims for high recall, so false positives are acceptable as long as all the true positives are included. Again, I do not understand HOW this algorithm knows when it has found the object it needs without really learning. Any intuitive explanation or visual (I am primarily a visual learner) of how this algorithm works would be greatly appreciated.
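My rough mental model so far is the toy sketch below: start from many small regions, repeatedly merge the two most similar ones, and record the bounding box of every region that ever exists. This is a heavy simplification and may well be wrong. As far as I know, the real algorithm starts from a graph-based oversegmentation, only merges neighbouring regions, and combines colour, texture, size, and fill similarities. Here I use a single made-up "color" value per region and ignore adjacency.

```python
from itertools import combinations

def merge_boxes(a, b):
    """Smallest box (x1, y1, x2, y2) that encloses both input boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def similarity(r1, r2):
    # Toy measure: regions with closer mean colour count as more similar.
    return -abs(r1["color"] - r2["color"])

def toy_selective_search(regions):
    """Greedy bottom-up grouping; returns the box of every region ever formed."""
    regions = [dict(r) for r in regions]
    proposals = [r["box"] for r in regions]
    while len(regions) > 1:
        # Pick the most similar pair among all remaining regions.
        i, j = max(combinations(range(len(regions)), 2),
                   key=lambda p: similarity(regions[p[0]], regions[p[1]]))
        a, b = regions[i], regions[j]
        size = a["size"] + b["size"]
        merged = {
            "box": merge_boxes(a["box"], b["box"]),
            # Size-weighted mean colour of the merged region.
            "color": (a["color"] * a["size"] + b["color"] * b["size"]) / size,
            "size": size,
        }
        regions = [r for k, r in enumerate(regions) if k not in (i, j)] + [merged]
        proposals.append(merged["box"])
    return proposals

# Three hypothetical starting regions: two similar dark patches and one bright patch.
initial = [
    {"box": (0, 0, 10, 10), "color": 0.10, "size": 100},
    {"box": (10, 0, 20, 10), "color": 0.12, "size": 100},
    {"box": (30, 30, 40, 40), "color": 0.90, "size": 100},
]
proposals = toy_selective_search(initial)
```

If this picture is right, the algorithm never "knows" which box holds the object. It just outputs boxes at every scale of the merge hierarchy and relies on the classifier to reject the useless ones, which would explain the emphasis on recall.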

Tibo Geysen
  • The selective search algorithm is described in the paper [Selective Search for Object Recognition](https://ivi.fnwi.uva.nl/isis/publications/2013/UijlingsIJCV2013/UijlingsIJCV2013.pdf), so you can read it and then attempt to provide an answer to your own question below. – nbro Jan 19 '21 at 17:27
