I’m building a neural network in Keras for object detection, specifically to identify hamburgers. My data set annotates every image with an array of bounding boxes; each image contains between 1 and 5 hamburgers, and all of them are annotated.

I understand that a network that outputs a single bounding box has 4 regression outputs: it predicts the box’s x and y coordinates plus its width and height.
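To make that concrete, here is a minimal sketch of the 4-value target I have in mind for one box. The helper name `corners_to_xywh`, the choice of box center for x and y, and the normalization by image size are my own assumptions for illustration, not something the data set prescribes.

```python
def corners_to_xywh(box, img_w, img_h):
    """Convert a pixel box (x_min, y_min, x_max, y_max) into the
    4 regression targets (x_center, y_center, width, height),
    each normalized to [0, 1] by the image size (an assumption)."""
    x_min, y_min, x_max, y_max = box
    w = x_max - x_min
    h = y_max - y_min
    return (
        (x_min + w / 2) / img_w,  # normalized x of the box center
        (y_min + h / 2) / img_h,  # normalized y of the box center
        w / img_w,                # normalized width
        h / img_h,                # normalized height
    )


# One hamburger in a hypothetical 400x400 image.
target = corners_to_xywh((50, 100, 150, 200), img_w=400, img_h=400)
print(target)  # (0.25, 0.375, 0.25, 0.25)
```

With a single object per image, this tuple would be the entire regression target for that image.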

My question is how to handle more than one object of the same class in an image. What should the input and the output look like when the number of objects varies from image to image (and can be greater than one)?
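To illustrate the shape problem: the per-image target lists have different lengths, while a network output has a fixed size. One workaround I have considered, shown here only as a sketch and not as something I know to be the right approach, is to pad every image's targets to a fixed number of slots and add a "present" flag per slot. The names `MAX_BOXES` and `pad_boxes` are hypothetical.

```python
MAX_BOXES = 5  # the data set has at most 5 hamburgers per image


def pad_boxes(boxes, max_boxes=MAX_BOXES):
    """Return exactly max_boxes rows of [present, x, y, w, h].

    Real boxes get present=1.0; unused slots are all zeros.
    """
    if len(boxes) > max_boxes:
        raise ValueError("more boxes than available slots")
    rows = [[1.0, *box] for box in boxes]
    # Build each padding row separately so the rows do not share one list.
    rows += [[0.0] * 5 for _ in range(max_boxes - len(boxes))]
    return rows


# An image with two annotated hamburgers (normalized x, y, w, h).
padded = pad_boxes([(0.25, 0.375, 0.25, 0.25), (0.7, 0.6, 0.2, 0.3)])
for row in padded:
    print(row)
```

Every image then yields a 5 x 5 target regardless of how many hamburgers it contains, but I am unsure how the loss should treat the empty slots or the ordering of the boxes.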