6

For many problems in computer science, there is a formal, mathematical problem definition.
Something like: Given ..., the problem is to ...

How can the object detection problem (i.e. detecting objects in an image) be formally defined?

Given a set of pixels, the task is to decide

  1. which pixels belong to an object at all,
  2. which pixels belong to the same object.

How can this be put into a formula?

JavAlex
    For the first question, a typical way to formalize it might be defining a *characteristic function*: $$\chi_o (p) = \begin{cases} 1, & \text{if pixel $p$ belongs to object $o$,}\\ 0, & \text{else } \end{cases}$$ – JavAlex Sep 25 '20 at 15:25
  • That formulation would be OK for _image (or object) segmentation_ (where you need to classify individual pixels), but _object detection_ is a different problem from image segmentation, and there you do **not** need to classify individual pixels. You only need to determine whether the image contains an object of class $c$, where that object is (i.e. locate it), and maybe draw a bounding box around it. Maybe you are interested in _image (or instance) segmentation_. If that's the case, please edit your post to say that. – nbro Sep 29 '20 at 15:54
  • Thank you, @nbro. I am interested in *object detection*, not *image segmentation*. I just thought that using pixel-based information could be a workaround to find a formula for object detection. Actually, image segmentation could be the first step of the object detection task: 1. classify single pixels and assign each a probability of belonging to a specific object, 2. merge pixels into objects. – JavAlex Sep 30 '20 at 12:27
  • The problem with that approach is that you need labelling information for all pixels, which may be expensive to acquire. That's why pixel-level classification to perform object detection may be overkill, but, of course, it's possible. In fact, image/instance segmentation can be thought of as a form of object detection (but a fine-grained one, let's say). – nbro Sep 30 '20 at 12:30

1 Answer

5

This is just an idea.

Given a set of pixels, the task is to decide:

  1. Which pixel is the center of an object?
  2. What is the size of the bounding box whose center is the pixel from part 1?

As a formula: consider a 2D image, let $(x,y)$ be horizontal and vertical pixel coordinates, and let $(w_i,h_i)$ be the size of the bounding box of object $i$:

$\text{For }m \in[x,x+w_i] \text{ and } n\in[y,y+h_i]$

$c_i(m,n) = \begin{cases} 1, & \text{if the pixel at position $(m,n)$ belongs to object $i$,}\\ 0, & \text{else} \end{cases}$
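
To make the indicator $c_i(m,n)$ concrete, here is a minimal Python sketch. It is only an illustration under some assumptions that are not part of the idea above: a ground-truth label map `labels` is available (with `labels[n][m]` holding the id of the object at column `m`, row `n`, and 0 for background), $(x,y)$ is treated as the corner of the box so that the ranges $[x,x+w_i]$ and $[y,y+h_i]$ apply as written, and the function name `indicator_in_box` is made up for this example.

```python
def indicator_in_box(labels, obj_id, x, y, w, h):
    """Evaluate c_i(m, n) for m in [x, x+w] and n in [y, y+h].

    labels  -- hypothetical ground-truth label map, labels[n][m] = object id
               at column m, row n (0 means background)
    obj_id  -- the object index i
    Returns a list of rows containing 1 where the pixel belongs to the
    object and 0 elsewhere.
    """
    # Both ranges are inclusive, matching the closed intervals in the formula.
    return [
        [1 if labels[n][m] == obj_id else 0 for m in range(x, x + w + 1)]
        for n in range(y, y + h + 1)
    ]


# Toy 5x6 image: object 1 fills rows 1..3 and columns 2..4,
# except for one background pixel in the middle of the box.
labels = [[0] * 6 for _ in range(5)]
for n in range(1, 4):
    for m in range(2, 5):
        labels[n][m] = 1
labels[2][3] = 0

c_1 = indicator_in_box(labels, obj_id=1, x=2, y=1, w=2, h=2)
for row in c_1:
    print(row)
```

The background pixel inside the box shows that the bounding box and the indicator are not the same thing: the box only bounds the region in which $c_i$ is evaluated, while $c_i$ itself is a per-pixel (segmentation-style) label.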

CuCaRot
  • It's important to note that classifying pixels is known as _image_ (or _instance_) _segmentation_, which is a different task than _object detection_ (finding objects in images, maybe by providing the bounding box around the found objects). So, you do **not** need to classify individual pixels in the _object detection_ task. You just need to find where the objects are (e.g. the center of the bounding box around them). – nbro Sep 29 '20 at 15:49
  • I know what you mean, but @JavAlex's question asked for a formula, so I came up with that equation based on his comment. – CuCaRot Sep 30 '20 at 04:03
  • 1
    Thank you very much @Toby. That sounds like a good first idea – JavAlex Sep 30 '20 at 12:28