For questions related to the concept of a bounding box in object detection or other computer vision tasks.
Questions tagged [bounding-box]
18 questions
7
votes
2 answers
What's the role of bounding boxes in object detection?
I'm quite new to the field of computer vision and was wondering what purpose bounding boxes serve in object detection.
Obviously, it shows where the detected object is, and using a classifier can only classify one object per image,…

Cody Chung
- 173
- 5
4
votes
4 answers
Can bounding boxes further improve the performance of a CNN classifier?
Suppose I have a standard image classification problem (i.e. CNN is shown a single image and predicts a single classification for it). If I were to use bounding boxes to surround the target image (i.e. convert this into an object detection problem),…

user4779
- 193
- 5
3
votes
1 answer
YOLO - are the anchor boxes used only in training?
Another question about YOLO.
I've read about how YOLO adjusts anchor boxes by offsets to create the final bounding boxes.
What I do not understand is when YOLO does it.
Is it being done only during the training process, or also during the common use of…

Igor
- 181
- 10
2
votes
2 answers
How to architect a network to find bounding boxes in simple images?
I have an application where I want to find the locations of objects on a simple, relatively constant background (fixed camera angle, etc). For investigative purposes, I've created a test dataset that displays many characteristics of the actual…

Daniel
- 191
- 3
2
votes
1 answer
How are IOUs for ground truth boxes in YOLO calculated?
I know how IOU works during detection. However, while preparing targets from ground-truth for training, how is the IOU between a given object and all anchor boxes calculated?
Is the ground truth bounding box aligned with an anchor box such that they…

nivter
- 73
- 3
2
votes
1 answer
How does a bounding box detection network "know" about absolute position?
I've always found bounding box regression a bit weird. There's no positional encoding like in vision transformers, so how does the network "know" the absolute position when producing bounding box coordinates? It gets even weirder when we are dealing…

Alexander Soare
- 1,319
- 2
- 11
- 26
1
vote
0 answers
How do transformers compare to CNNs in terms of compute budget (and computing time) during inference?
Transformers are data and GPU hungry during training. Is this also true at inference time? How do transformers compare to feedforward CNNs e.g., during bounding box generation at inference time? I haven't found a good comparison of computing time…

Mariusmarten
- 383
- 1
- 16
1
vote
1 answer
Is there a state-of-the-art deep learning paper that uses center point regression instead of bounding box regression, for object tracking?
Almost all deep learning based object tracking methods perform bounding box regression. Siamese-based networks which are very popular for object tracking also perform bounding box regression most of the time, although SiamFC type exceptions exist.…

Code Of Duty
- 11
- 1
1
vote
1 answer
How does YOLO detect the object when the object is in multiple grid cells?
I have been reading various articles and watching videos on YouTube, but I can't seem to understand one thing.
How does YOLO make a bounding box for an object if it is in multiple grid cells? For example, in the picture given below, how does it…

Sharjeel M.
- 13
- 3
1
vote
1 answer
Why is it called "area of union" when calculating the Intersection over Union?
When calculating the Intersection over Union, the following explanation is widely used.
(Source: A Survey on Performance Metrics for Object-Detection Algorithms, by Padilla et al. 2020)
The image and name suggest that the denominator (the area of…

Skid
- 13
- 3
1
vote
0 answers
Different equations for YOLOv3 in courses/articles and Darknet GitHub code?
I am confused by the equations for bounding boxes I find online. Some articles say that
box_width = anchor_width * exp(residual_value_of_box_width)
and the coordinates have a sigmoid function.
Eg:…

shikhar97gupta
- 11
- 2
1
vote
1 answer
Why do the object detection networks produce multiple anchor boxes per location?
In various neural network detection pipelines, the detection works as follows:
One processes the input image through the pretrained backbone
Some additional convolutional layers
The detection head, where each pixel on the given feature map…

spiridon_the_sun_rotator
- 2,454
- 8
- 16
0
votes
0 answers
How to calculate CIoU or DIoU loss only for certain unmasked boxes in tensor and ignore the masked values?
# bbox loss
bbox_labels = batch['bbox'][:, 1:]
bbox_masks = batch['bbox_mask'][:, 1:]
masked_bbox_preds = bbox_preds*bbox_masks
masked_bbox_labels = bbox_labels*bbox_masks
if self.config.bbox_loss == "smoothl1":
    box_loss =…

Amish Agrawal
- 1
- 1
0
votes
0 answers
Clustering bounding boxes to reduce image cutout overlap
Given an image and bounding boxes identified within this image, the objective is to group these bounding boxes in such a way that we can define a bigger bounding box of size NxN that will encompass as few bounding boxes as possible while…

JulioHC
- 1
0
votes
0 answers
How to make use of the NuScenes dataset to create a distance prediction CNN?
I am trying to make a CNN that can predict the distance of objects in a scene by training it on datasets like KITTI/NuScenes.
I understand the basic process of what it would involve but I am unable to find references on what all annotations or labels I…