For questions related to the family of models known as R-CNN (such as the original R-CNN model, Fast R-CNN, Faster R-CNN, and Mask R-CNN).
Questions tagged [r-cnn]
14 questions
6
votes
1 answer
How does the region proposal method work in Fast R-CNN?
I have read many articles and the Fast R-CNN paper, but I'm still confused about how the region proposal method works in Fast R-CNN.
As you can see in the image below, they say they used a proposal method, but it is not specified how it works.
What…

ozoubia
2
votes
1 answer
Is intersection of labels acceptable in computer vision?
I have a dataset where objects are very close to each other. So the question is: what is the best approach to label them?
There are two possible options:
mark objects so that they will not intersect (it is difficult, surroundings are not included…

Valery Noname
2
votes
1 answer
Is it possible to pre-train a CNN in a self-supervised way so that it can later be used to solve an instance segmentation task?
I would like to use self-supervised learning (SSL) to learn features from images (the dataset consists of similar images with small differences), then use the resulting trained model to bootstrap an instance segmentation task.
I am thinking about…

Timco Vanco
2
votes
0 answers
How is the data labelled in order to train a region proposal network?
I don't understand how the RPN is trained. From the forward propagation, I have $W \times H \times k$ outputs from the RPN.
How is the training data labeled so that I can use the loss function and update the weights through backpropagation? …

Abd El-Rahman Akram
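For the RPN labeling question, the Faster R-CNN paper describes the following rule. Each anchor is compared by IoU with every ground-truth box. An anchor is positive if it has the highest IoU with some ground-truth box, or if its IoU with any ground-truth box is above 0.7. A non-positive anchor is negative if its IoU is below 0.3 for all ground-truth boxes. All other anchors are ignored by the loss.

The code below is a minimal sketch of that rule and is not any library's implementation. It assumes boxes given as `(x1, y1, x2, y2)` tuples and at least one ground-truth box. The names `iou` and `label_anchors` are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Return (labels, matches): 1 = object, 0 = background, -1 = ignored.

    matches[i] is the ground-truth index anchor i is paired with; a positive
    anchor's regression target would be computed against that box.
    Assumes gt_boxes is non-empty.
    """
    overlaps = [[iou(a, g) for g in gt_boxes] for a in anchors]
    labels, matches = [], []
    for row in overlaps:
        best = max(range(len(gt_boxes)), key=lambda j: row[j])
        matches.append(best)
        if row[best] > pos_thresh:
            labels.append(1)
        elif row[best] < neg_thresh:
            labels.append(0)
        else:
            labels.append(-1)
    # The anchor(s) with the highest IoU for each ground-truth box are positive
    # even below pos_thresh (assumed guard: skip boxes no anchor overlaps).
    for j in range(len(gt_boxes)):
        col_best = max(row[j] for row in overlaps)
        for i, row in enumerate(overlaps):
            if col_best > 0 and row[j] == col_best:
                labels[i], matches[i] = 1, j
    return labels, matches


anchors = [(0, 0, 10, 10), (0, 0, 20, 20), (50, 50, 60, 60), (2, 0, 12, 10)]
labels, matches = label_anchors(anchors, [(0, 0, 10, 10)])
print(labels)  # [1, 0, 0, -1]
```

The paper then samples a minibatch of positive and negative anchors from these labels. Anchors labeled -1 contribute to neither loss term.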
2
votes
0 answers
Inaccurate masks with Mask R-CNN: Stairs effect and sudden stops
I've been using matterport's Mask R-CNN to train on a custom dataset. However, there seem to be some parameters that I failed to define correctly, because on practically all of the images, the bottom or top of the object's mask is cut off:
As you…

Nawra C
2
votes
1 answer
Why are RNNs used in some computer vision problems?
I am learning computer vision. When I was going through implementations of various computer vision projects, some OCR problems used GRU or LSTM, while some did not. I understand that RNNs are used only in problems where input data is a sequence,…

Naveen Reddy Marthala
1
vote
0 answers
Why are the learned offsets of anchor boxes in the R-CNN object detection models scale-invariant?
In the original R-CNN paper (https://arxiv.org/pdf/1311.2524.pdf), and continuing in later R-CNN papers such as Faster R-CNN (https://arxiv.org/pdf/1506.01497.pdf), the learned offsets of the anchor boxes are scale-invariant. For example, the learned…

phil
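The scale invariance comes from the form of the parametrization. The sketch below assumes the anchor-relative parametrization from the Faster R-CNN paper:

- t_x = (x - x_a) / w_a
- t_y = (y - y_a) / h_a
- t_w = log(w / w_a)
- t_h = log(h / h_a)

Here the boxes are given as center and size. Center offsets are divided by the anchor size, and size offsets are log-ratios. As a result, scaling both the box and the anchor by the same factor leaves the targets unchanged.

This is a minimal illustration, not a reference implementation. The helpers `encode` and `scale` are hypothetical names.

```python
import math


def encode(box, anchor):
    """Offsets (t_x, t_y, t_w, t_h) of `box` relative to `anchor`.

    Both are (center_x, center_y, width, height).
    """
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return ((x - xa) / wa, (y - ya) / ha, math.log(w / wa), math.log(h / ha))


def scale(b, s):
    """Scale a (cx, cy, w, h) box about the image origin by a factor s."""
    return tuple(v * s for v in b)


anchor = (50.0, 40.0, 32.0, 16.0)
box = (58.0, 36.0, 64.0, 16.0)
small = encode(box, anchor)
large = encode(scale(box, 4.0), scale(anchor, 4.0))  # same scene, 4x larger
print(small == large)  # True
```

A translation of both boxes also cancels out in the differences. The regressor therefore sees the same target for the "same" relative displacement at any object size.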
1
vote
1 answer
What to do when the ROIs are smaller than $227 \times 227$ in R-CNN?
As English is not my native language, I have a hard time understanding the following sentence:
Regardless of the size or aspect ratio of the candidate region, we warp all pixels in a tight bounding box around it to the required size. Prior to…

Valentin
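The quoted sentence says the crop is resized to the required input size whatever its original size or aspect ratio. So a region smaller than 227 × 227 is simply upsampled, and its aspect ratio is not preserved.

This is a minimal sketch under a few assumptions:

- The image is single-channel and stored as a list of rows.
- Boxes are `(x1, y1, x2, y2)` with exclusive right and bottom edges.
- Sampling is nearest-neighbour, chosen for simplicity. The excerpt does not state which interpolation the paper uses.

`warp_region` is a hypothetical name.

```python
def warp_region(image, box, out_size=227):
    """Crop `box` from `image` and resize it to out_size x out_size.

    Uses nearest-neighbour sampling and ignores aspect ratio: regions smaller
    than out_size are upsampled (pixels repeated), larger ones downsampled.
    """
    x1, y1, x2, y2 = box
    crop_w, crop_h = x2 - x1, y2 - y1
    warped = []
    for r in range(out_size):
        src_y = y1 + (r * crop_h) // out_size  # source row for output row r
        warped.append([image[src_y][x1 + (c * crop_w) // out_size]
                       for c in range(out_size)])
    return warped


tiny = [[1, 2], [3, 4]]
print(warp_region(tiny, (0, 0, 2, 2), out_size=4))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

A real pipeline would usually use a library resize with interpolation. The point here is only that small and large regions go through the same warp.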
1
vote
0 answers
Does the selective search algorithm in object detection learn?
I am trying to get a better grasp of how object detection works. I (almost) completely understand the concept behind RPNs. However, I am a little confused about the selective search algorithm part. This algorithm does not really learn anything,…

Tibo Geysen
1
vote
1 answer
In Fast R-CNN, how are input RoIs mapped to the respective RoIs in the feature map before RoI pooling?
I've been reading the Fast R-CNN paper.
My understanding is that the input to one forward pass is the whole input image plus a list of RoIs (generated by selective search or another region proposal method). Then I understand that on the last…

Alexander Soare
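In Fast R-CNN, RoI max pooling divides the projected h × w RoI window into an H × W grid of sub-windows of approximate size h/H × w/W, then max-pools each sub-window. The projection from image coordinates to feature-map coordinates amounts to scaling by the backbone's total stride.

This is a minimal sketch, not any framework's implementation. It assumes:

- a single-channel feature map stored as a list of rows;
- a stride of 16 (`spatial_scale = 1/16`, as for VGG16's conv5 layer);
- a 7 × 7 output grid.

The round-half-up projection and the floor/ceil bin edges are assumed choices. `project_roi` and `roi_max_pool` are hypothetical names.

```python
import math


def project_roi(roi, spatial_scale=1 / 16):
    """Map an image-space RoI (x1, y1, x2, y2) to feature-map cell indices.

    Assumed rule: multiply by the stride factor and round half up.
    """
    return tuple(math.floor(v * spatial_scale + 0.5) for v in roi)


def roi_max_pool(feature, roi, out_h=7, out_w=7, spatial_scale=1 / 16):
    """Max-pool the projected RoI of a 2-D feature map into an out_h x out_w grid."""
    height, width = len(feature), len(feature[0])
    x1, y1, x2, y2 = project_roi(roi, spatial_scale)
    roi_w = max(x2 - x1 + 1, 1)  # inclusive cell range, at least one cell
    roi_h = max(y2 - y1 + 1, 1)
    pooled = []
    for i in range(out_h):
        # Row range of bin i, clamped to the feature map.
        ys = min(max(y1 + math.floor(i * roi_h / out_h), 0), height)
        ye = min(max(y1 + math.ceil((i + 1) * roi_h / out_h), 0), height)
        row = []
        for j in range(out_w):
            xs = min(max(x1 + math.floor(j * roi_w / out_w), 0), width)
            xe = min(max(x1 + math.ceil((j + 1) * roi_w / out_w), 0), width)
            if ys >= ye or xs >= xe:
                row.append(0.0)  # bin lies entirely outside the feature map
            else:
                row.append(max(feature[y][x]
                               for y in range(ys, ye) for x in range(xs, xe)))
        pooled.append(row)
    return pooled


feature = [[y * 14 + x for x in range(14)] for y in range(14)]  # 224x224 image / 16
pooled = roi_max_pool(feature, (0, 0, 223, 223))
print(len(pooled), len(pooled[0]))  # 7 7
```

Every RoI, whatever its size, comes out as the same fixed grid. That is what lets the fully connected layers that follow accept it.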
1
vote
1 answer
In Faster R-CNN, how can I get the predicted bounding box given the neural network's output?
The RPN loss in the Faster R-CNN paper is
$$
L(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_{i} L_{cls}(p_i,p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)
$$
For the regression, we have the following parametrization:
$$t_x=\frac{x -…

user31844
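To get a box from the network's output, invert the regression parametrization for each anchor. The sketch below assumes the standard anchor-relative parametrization from the Faster R-CNN paper:

- t_x = (x - x_a) / w_a
- t_y = (y - y_a) / h_a
- t_w = log(w / w_a)
- t_h = log(h / h_a)

Anchors are given as center and size. `decode` is a hypothetical helper, not a library function.

```python
import math


def decode(offsets, anchor):
    """Invert the anchor-relative box parametrization.

    offsets: predicted (t_x, t_y, t_w, t_h) for one anchor.
    anchor:  (x_a, y_a, w_a, h_a), center and size.
    Returns the predicted box as corners (x1, y1, x2, y2).
    """
    tx, ty, tw, th = offsets
    xa, ya, wa, ha = anchor
    x = tx * wa + xa       # from t_x = (x - x_a) / w_a
    y = ty * ha + ya       # from t_y = (y - y_a) / h_a
    w = wa * math.exp(tw)  # from t_w = log(w / w_a)
    h = ha * math.exp(th)  # from t_h = log(h / h_a)
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)


box = decode((0.25, -0.25, math.log(2.0), 0.0), (50.0, 40.0, 32.0, 16.0))
print(box)  # approximately (26.0, 28.0, 90.0, 44.0)
```

In a full pipeline, this is applied to every anchor's predicted offsets. The decoded boxes are then typically clipped to the image and filtered by score with non-maximum suppression.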
0
votes
0 answers
How to handle multiple object instances in object detection?
I’m building a neural network with Keras for object detection, to identify hamburgers. I have a dataset with the objects, and each image has an array of bounding boxes (there are between 1 and 5 hamburgers in each image, all annotated with…
0
votes
1 answer
How are OCR training datasets constructed?
For the sake of concreteness, let's suppose that the word "OCR" refers to any OCR system built on an R-CNN architecture. Similarly, for simplicity, let's say that we are interested in reading digits between 0 and 100.
Question: How should…

Ramiro Hum-Sah
0
votes
1 answer
Darknet as a part of YOLOv3
I am pretty new to ML, and my question may look strange, especially the last part of it.
1) As far as I understand, Darknet53 is an integral part of YOLO, just as ResNet50 is a part of R-CNN. Am I right?
2) On the other hand, I understand that the R-CNN…

Igor