Formal definition of the Object Detection problem

Question

For many problems in computer science, there is a formal, mathematical problem definition.
Something like: Given ..., the problem is to ...

How can the Object Detection problem (i.e. detecting objects on an image) be formally defined?

Given a set of pixels, the task is to decide

which pixels belong to an object at all,
which pixels belong to the same object.

How can this be put into a formula?

For the first question, a typical way to formalize it might be defining a *characteristic function*: $$\chi_o (p) = \begin{cases} 1, & \text{if pixel $p$ belongs to object $o$,}\\ 0, & \text{else } \end{cases}$$ — JavAlex, Sep 25 '20 at 15:25
That formulation would be ok for _image (or object) segmentation_ (where you need to classify individual pixels), but, in an _object detection_ problem (which is a different problem than image segmentation), you do **not** need to classify individual pixels. You only need to find if the image contains an object of class $c$, where is that object (i.e. locate it), and maybe draw a bounding box around it. Maybe you are interested in _image (or instance) segmentation_. If that's the case, please, edit your post to say that. — nbro, Sep 29 '20 at 15:54
Thank you, @nbro. I am interested in *object detection*, not *image segmentation*. I just thought that using pixel-based information could be a workaround to find a formula for object detection. Actually, image segmentation could be the first step of the object detection task: 1. classify single pixels and assign probabilities to them of belonging to a specific object, 2. merge pixels to objects. — JavAlex, Sep 30 '20 at 12:27
The problem with that approach is that you need labelling information for all pixels, which may be expensive to acquire. That's why pixel-level classification to perform object detection may be overkill, but, of course, it's possible. In fact, image/instance segmentation can be thought of as a form of object detection (but a fine-grained one, let's say). — nbro, Sep 30 '20 at 12:30
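
A possible way to write down the distinction drawn in these comments, offered as a sketch rather than a standard definition. All symbols here ($I$, $\mathcal{C}$, $f$, $K$, $b_k$) are assumptions introduced for illustration and do not come from the thread. Let $I \in \mathbb{R}^{H \times W \times 3}$ be an image and $\mathcal{C}$ a finite set of object classes. Object detection could then be described as finding a map

$$f(I) = \{(c_k, b_k)\}_{k=1}^{K}, \qquad c_k \in \mathcal{C}, \quad b_k = (x_k, y_k, w_k, h_k),$$

where $b_k$ is the bounding box of the $k$-th detected object and $K$ may differ from image to image ($K = 0$ if the image contains no object). Segmentation, by contrast, would assign a label to every pixel, as the characteristic function $\chi_o(p)$ in the first comment does.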

score 5 · Accepted Answer · answered Sep 29 '20 at 02:34


This is just an idea

Given a set of pixels, the task is to decide:

Which pixel is the center of an object?
What is the size of the bounding box centered on the pixel from the first question?

As a formula: consider a 2D image, let $(x,y)$ be the horizontal and vertical coordinates, and let $(w_i,h_i)$ be the size of the bounding box of object $i$:

$\text{For }m \in[x,x+w_i] \text{ and } n\in[y,y+h_i]$

$c_i(m,n) = \begin{cases} 1, & \text{if the pixel at position $(m,n)$ belongs to object $i$,}\\ 0, & \text{otherwise.} \end{cases}$
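
The definition above can be tried out on a toy example. The following is a minimal sketch, not a detection method: it assumes a per-pixel label map `labels` is already available (a hypothetical input, which is exactly the expensive annotation discussed in the comments), it treats $(x,y)$ as the top-left corner of the box because the intervals $[x, x+w_i]$ and $[y, y+h_i]$ suggest that reading, and it uses integer pixel coordinates. The function name `c_i` simply mirrors the symbol in the formula.

```python
def c_i(labels, i, x, y, w, h):
    """Evaluate c_i(m, n) for every m in [x, x+w] and n in [y, y+h].

    labels[n][m] holds the id of the object that pixel (m, n) belongs to
    (0 = background). This label map is an assumed input.
    Returns a nested list of 0/1 values, one row per n.
    """
    return [
        [1 if labels[n][m] == i else 0 for m in range(x, x + w + 1)]
        for n in range(y, y + h + 1)
    ]


# Toy 6x8 image: object 1 occupies rows 2-3 and columns 3-5.
labels = [[0] * 8 for _ in range(6)]
for n in range(2, 4):
    for m in range(3, 6):
        labels[n][m] = 1

# Box with corner (x, y) = (2, 1) and size (w, h) = (4, 3).
patch = c_i(labels, i=1, x=2, y=1, w=4, h=3)
for row in patch:
    print(row)
```

Because the intervals are closed, the box covers $(w_i + 1) \times (h_i + 1)$ pixels; a half-open convention would work equally well as long as it is used consistently.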


CuCaRot


It's important to note that classifying pixels is known as _image_ (or _instance_) _segmentation_, which is a different task than _object detection_ (finding objects in images, maybe by providing the bounding box around the found objects). So, you do **not** need to classify individual pixels in the _object detection_ task. You just need to find where the objects are (e.g. the center of the bounding box around them). – nbro Sep 29 '20 at 15:49
I know what you mean, but @JavAlex's question asked for a formula, so I came up with that equation based on his comment. – CuCaRot Sep 30 '20 at 04:03

Thank you very much @Toby. That sounds like a good first idea – JavAlex Sep 30 '20 at 12:28
