Questions tagged [r-cnn]

For questions related to the family of models known as R-CNN (such as the original R-CNN model, Fast R-CNN, Faster R-CNN, and Mask R-CNN).

14 questions
6
votes
1 answer

How does the region proposal method work in Fast R-CNN?

I have read many articles and the Fast R-CNN paper, but I'm still confused about how the region proposal method works in Fast R-CNN. As you can see in the image below, they say they used a proposal method, but it is not specified how it works. What…
2
votes
1 answer

Is intersection of labels acceptable in computer vision?

I have a dataset, where objects are very close to each other. So, the question is: what is the best approach to label them? There are two possible options: mark objects so that they will not intersect (it is difficult, surroundings are not included…
2
votes
1 answer

Is it possible to pre-train a CNN in a self-supervised way so that it can later be used to solve an instance segmentation task?

I would like to use self-supervised learning (SSL) to learn features from images (the dataset consists of similar images with small differences), then use the resulting trained model to bootstrap an instance segmentation task. I am thinking about…
2
votes
0 answers

How is the data labelled in order to train a region proposal network?

I don't get how the training of the RPN works. From the forward propagation, I have $W \times H \times k$ outputs from the RPN. How is the training data labeled such that I can use the loss function and update the weights through backpropagation? …
2
votes
0 answers

Inaccurate masks with Mask-RCNN: Stairs effect and sudden stops

I've been using Matterport's Mask R-CNN to train on a custom dataset. However, there seem to be some parameters that I failed to define correctly, because on practically all of the images, the bottom or top of the object's mask is cut off: As you…
Nawra C
2
votes
1 answer

Why are RNNs used in some computer vision problems?

I am learning computer vision. When I was going through implementations of various computer vision projects, some OCR problems used GRU or LSTM, while some did not. I understand that RNNs are used only in problems where input data is a sequence,…
1
vote
0 answers

Why are the learned offsets of anchor boxes in the RCNN object detection models scale invariant?

In the original RCNN paper (https://arxiv.org/pdf/1311.2524.pdf) and continued in later RCNN papers such as faster RCNN (https://arxiv.org/pdf/1506.01497.pdf) the learned offsets of the anchor boxes are scale-invariant. For example the learned…
phil
1
vote
1 answer

What to do when the ROIs are smaller than $227 \times 227$ in R-CNN?

As English is not my native language, I have a hard time understanding the following sentence: Regardless of the size or aspect ratio of the candidate region, we warp all pixels in a tight bounding box around it to the required size. Prior to…
Valentin
1
vote
0 answers

Does the selective search algorithm in object detection learn?

I am trying to get a better grasp of how object detection works. I (almost) completely understand the concept behind RPNs. However, I am a little bit confused with the selective search algorithm part. This algorithm does not really learn anything,…
1
vote
1 answer

In Fast R-CNN, how are input RoIs mapped to the respective RoIs in the feature map before RoI pooling?

I've been reading the Fast R-CNN paper. My understanding is that the input to one forward pass is the whole input image plus a list of RoIs (generated by selective search or another region proposal method). Then I understand that on the last…
1
vote
1 answer

In Faster R-CNN, how can I get the predicted bounding box given the neural network's output?

The RPN loss in the Faster R-CNN paper is $$ L(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_{i} L_{cls}(p_i,p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*) $$ For regression problems, we have the following parametrization $$t_x=\frac{x -…
0
votes
0 answers

How to handle multiple object instances in object detection?

I’m constructing a neural net with Keras for object detection for identifying hamburgers. I have a data set with the objects and each image has an array of bounding boxes (there are between 1 and 5 hamburgers in each image, all annotated with…
0
votes
1 answer

How are OCR training datasets constructed?

For the sake of concreteness, let's suppose that the word "OCR" refers to any OCR system built on an R-CNN architecture. Similarly, for simplicity, let's declare that we are interested in reading digits between 0 and 100. Question: How should…
0
votes
1 answer

Darknet as a part of Yolo v3

I am pretty new to ML and my question may look strange, especially the last part of it. 1) As far as I understand, Darknet53 is an integral part of YOLO, just as ResNet50 is a part of R-CNN. Am I right? 2) On the other hand, I understand that the R-CNN…
Igor