Questions tagged [computer-vision]

For questions related to computer vision, an interdisciplinary scientific field (which may draw on image processing techniques) that deals with how computers can be made to gain a high-level understanding of digital images or videos. For example, image recognition (that is, identifying the types of objects in an image) is a computer vision problem.

For more info, see e.g. https://en.wikipedia.org/wiki/Computer_vision.

511 questions
85 votes, 9 answers

How is it possible that deep neural networks are so easily fooled?

The following page/study demonstrates that deep neural networks are easily fooled: they give high-confidence predictions for unrecognisable images. How is this possible? Can you please explain, ideally in plain English?
17 votes, 1 answer

Are information processing rules from Gestalt psychology still used in computer vision today?

Decades ago there were (and still are) books on machine vision which, by implementing various information-processing rules from Gestalt psychology, got impressive results with little code or special hardware in image identification and visual…
17 votes, 1 answer

What is a fully convolutional network?

I was surveying some literature related to fully convolutional networks and came across the following phrase: A fully convolutional network is achieved by replacing the parameter-rich fully connected layers in standard CNN architectures by…
13 votes, 3 answers

Is it possible to train a neural network to estimate a vehicle's length?

I have a large dataset (over 100k samples) of vehicles with the ground truth of their lengths. Is it possible to train a deep network to measure/estimate vehicle length? I haven't seen any papers related to estimating object size using a deep neural…
11 votes, 3 answers

Is it difficult to learn the rotated bounding box for a (rotated) object?

I have checked out many methods and papers, like YOLO, SSD, etc., with good results in detecting a rectangular box around an object. However, I could not find any paper that shows a method that learns a rotated bounding box. Is it difficult to learn…
11 votes, 1 answer

In Computer Vision, what is the difference between a transformer and attention?

Having studied computer vision for a while, I still cannot understand what the difference between a transformer and attention is.
11 votes, 2 answers

Do deep learning algorithms represent ensemble-based methods?

According to the Wikipedia article on deep learning: Deep learning is a branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data by using a deep graph with multiple processing layers, composed of…
11 votes, 0 answers

Extending FaceNet’s triplet loss to object recognition

FaceNet uses a novel loss metric (triplet loss) to train a model to output embeddings (128-D from the paper), such that any two faces of the same identity will have a small Euclidean distance, and such that any two faces of different identities will…
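A minimal sketch of the triplet loss described in this excerpt, assuming the usual hinge form: for an anchor, a positive (same identity) and a negative (different identity), the loss is zero once the negative is farther from the anchor than the positive by at least a margin. The margin value of 0.2, the 2-D toy embeddings (instead of 128-D), and the helper names `squared_distance`, `l2_normalize` and `triplet_loss` are illustrative assumptions, not taken from the question.

```python
import math


def squared_distance(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def l2_normalize(v):
    """Scale v to unit length so embeddings lie on a hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss for one (anchor, positive, negative) triple.

    Assumption: margin=0.2 is illustrative only. The loss is zero when
    d(anchor, negative) >= d(anchor, positive) + margin (squared distances).
    """
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)


if __name__ == "__main__":
    anchor = l2_normalize([1.0, 0.0])
    positive = l2_normalize([0.8, 0.6])   # same identity: close to the anchor
    negative = l2_normalize([-1.0, 0.0])  # different identity: far from the anchor
    print(triplet_loss(anchor, positive, negative))  # 0.0, constraint already satisfied
    print(triplet_loss(anchor, negative, positive))  # > 0, roles swapped so it is penalised
```

Extending this idea to object recognition would, under the same assumptions, only change what counts as a "positive" (same object class or instance instead of same face identity).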
9 votes, 1 answer

Why does nobody use decision trees for visual question answering?

I'm starting a project that will involve computer vision, visual question answering, and explainability. I am currently choosing what type of algorithm to use for my classifier: a neural network or a decision tree. It would seem to me that, because…
9 votes, 1 answer

In YOLO, what exactly do the values associated with each anchor box represent?

I'm going through Andrew Ng's course, which talks about YOLO, but he doesn't go into the implementation details of anchor boxes. Having looked through the code, I see that each anchor box is represented by two values, but what exactly are these values…
9 votes, 1 answer

What are sim2sim, sim2real and real2real?

Recently, I keep hearing the terms sim2sim, sim2real and real2real. Could anyone explain the meaning and motivation of these terms (in the DL/RL research community)? What are the challenges in this research area? Anything intuitive would be appreciated!
8 votes, 3 answers

What are the state-of-the-art approaches for detecting the most important "visual attention" area of an image?

I'm trying to detect the visual attention area in a given image and crop the image to that area. For instance, given an image of any size and a rectangle of, say, $L \times W$ dimensions as an input, I would like to crop the image to the most…
8 votes, 3 answers

Is it okay to use publicly available Instagram videos to train an AI?

Since I haven't found any good training data for my university project, I want to use pictures and videos from public Instagram profiles. Am I allowed to do that?
8 votes, 2 answers

What are the main algorithms used in computer vision?

Nowadays, CV has achieved great performance in many different areas. However, it is not clear to me what a CV algorithm is. What are some examples of CV algorithms that are commonly used nowadays and have achieved state-of-the-art performance?
7 votes, 2 answers

Term for algorithms that are not trained

Before the advent of neural architectures, many AI domains (e.g. speech recognition and computer vision) used algorithms that consisted of a series of hand-crafted transformations for feature extraction. In speech recognition everything to do with…