
I've been looking at various bounding box algorithms, like the three versions of R-CNN, SSD, and YOLO, and I have noticed that not even the original papers include pseudocode for their algorithms. I have built a CNN classifier and am attempting to incorporate bounding box regression, but I am having difficulty with the implementation. Can anyone write some pseudocode for a bounding box regressor, or link to some (my own search was unsuccessful), to aid my endeavor?

Note: I do know that there are many pre-built and pre-trained versions of these object detectors that I can download from various sources; I am interested in building one myself.

DukeZhou
MasterYoda

2 Answers


The minimal algorithm for convolution in $\mathbb{R}^2$ is a four dimensional iteration.

for all vertical kernel positions
  for all horizontal kernel positions
    initialize the value at the output position to the bias
    for all vertical positions in the kernel
      for all horizontal positions in the kernel
        add the product of the input value and the kernel weight to the output value

In $\mathbb{R}^n$ it is a $2n$ dimensional iteration following this pattern.
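The four-loop pattern above can be written out directly. Here is a minimal sketch in Python with NumPy; the function name and the choice of "valid" (no padding) output are my own, and, as in most deep learning frameworks, this is strictly a cross-correlation (the kernel is not flipped).

```python
import numpy as np

def conv2d(image, kernel, bias=0.0):
    """Direct 2-D 'valid' convolution following the four-loop pseudocode."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh, ow = ih - kh + 1, iw - kw + 1        # output size for 'valid' mode
    out = np.empty((oh, ow))
    for y in range(oh):                      # all vertical kernel positions
        for x in range(ow):                  # all horizontal kernel positions
            acc = bias                       # initialize output to the bias
            for ky in range(kh):             # all vertical positions in the kernel
                for kx in range(kw):         # all horizontal positions in the kernel
                    acc += image[y + ky, x + kx] * kernel[ky, kx]
            out[y, x] = acc
    return out
```

Real implementations replace the inner loops with vectorized or GPU kernels, but the iteration structure is the same.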

The minimal algorithm for regression of bounding boxes orthogonal with respect to the image grid (no tilting) is this.

until number of boxes reaches max
  make first guess of two coordinates
  until number of guesses reaches max or matching criterion is met
    evaluate guess
    remember guess and guess results
    improve on guess based on evaluation results and
          possibly injected randomness,
          excluding locations already covered
    if some intermediate criterion is met
      change the nature of the guessing, evaluation, and improving
            as is appropriate for the criteria match
            (this covers approaches that have multiple phases)
  if no guess matched criteria
    break
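The loop above can be sketched in code. In this hedged sketch, `evaluate` and `improve` are placeholders for the algebra a specific method (R-CNN, SSD, YOLO) plugs in; the function name, the `(x1, y1, x2, y2)` box encoding, and the score threshold are all assumptions of mine, not part of any published algorithm.

```python
import random

def fit_boxes(evaluate, improve, max_boxes=5, max_guesses=100, threshold=0.9):
    """Generic guess-evaluate-improve loop for axis-aligned bounding boxes.
    evaluate(box) -> score in [0, 1]; improve(box, history, found) -> new box."""
    boxes = []
    while len(boxes) < max_boxes:
        guess = (random.random(), random.random(),
                 random.random(), random.random())   # first guess of coordinates
        history = []
        matched = False
        for _ in range(max_guesses):
            score = evaluate(guess)                  # evaluate guess
            history.append((guess, score))           # remember guess and result
            if score >= threshold:                   # matching criterion
                matched = True
                break
            guess = improve(guess, history, boxes)   # improve, excluding covered locations
        if not matched:
            break                                    # no guess matched criterion
        boxes.append(guess)
    return boxes
```

The multi-phase branch from the pseudocode would appear here as swapping out `evaluate` and `improve` when the intermediate criterion fires.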

That's approaching the concepts from the top down. When approaching from the other direction, reverse engineer the best code. In the case of RCNN, it is inadvisable to start from implementations of the first paper describing the approach. Reading the first paper may be helpful to get the gist of the approach, but reverse engineer the best version, which, in this case, may be Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun, 2016. Study the implementation they pushed to GitHub at https://github.com/rbgirshick/py-faster-rcnn/tree/master/. The algorithm is in lib/fast_rcnn.

The reason this algorithm isn't spelled out in their paper, or in any paper in the lineage from the first one down to theirs, is simple.

  • The pseudo-code above is universal across all convolutions and all bounding box regressions, so that doesn't need to be restated with each approach.
  • The main features of an approach like RCNN, SSD, or YOLO are not algorithmic. They are algebraic expressions of the guess, the evaluation, the improvement upon the guess, and the test for the criteria.
  • The use of objects and functional programming makes the implementation more readable, so it can be easier to read the implementation than read a huge chunk of the above pseudo-code with all the algebra and test branches plugged in.
  • For the above reasons, it is rare that pseudo-code would be used prior to the implementation when the paper is written.
  • The return on investment of reverse engineering from code to pseudo-code is sufficient motivation only if one intends to improve the algorithm and write another paper, and in that case the new paper and the new code tend to get finished before the prior paper's pseudo-code ever would be.

Since the author of this question seems interested in writing their own code, it may be reasonable to assume they are also interested in thinking their own thoughts, so I'll add this.

None of these algorithms are object recognition. Recognition has to do with cognition, and these approaches do not even touch upon cognitive processing, another branch of AI not related to convolution and probably not closely related to formal regression either. Additionally, bounding boxes are not the way animal vision systems work. Early gestalt experiments in vision indicate a complete independence of human vision from rectilinear formalities. In lay terms, humans and other organisms with vision systems don't have any conception of Cartesian coordinates. We can still read books if tilted slightly relative to the plane passing through our eyes. We don't zoom or tilt in Cartesian coordinates.

These facts may not be necessary to comprehend to create an automated vehicle driving system that produces a better safety record than average human drivers, but that is only because humans don't set that bar very high and because cars roll in the plane of the road. These facts are indeed necessary in aeronautic systems used in military applications, where nothing is particularly Cartesian and the meaning of horizontal and vertical is ambiguous. For that reason, it is unlikely that bounding boxes will be the edge of vision technology for very long.

If one wishes to transcend current mediocrity, consider bounding circles with fuzzy boundaries, which would be more like the systems that evolved over millions of biological iterations. If the computer hardware is poorly fit to radial processing, design new hardware in which radial processing is native and in which Cartesian coordinates may be foreign and cumbersome.
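As a toy illustration of a bounding circle with a fuzzy boundary, one could replace the hard in/out test of a box with a smooth membership function. Everything here is illustrative: the function name, the logistic ramp, and the `softness` parameter are my own choices, not an established method.

```python
import math

def fuzzy_circle_membership(point, center, radius, softness=0.1):
    """Membership falls smoothly from ~1 inside the radius to ~0 outside,
    via a logistic ramp whose width scales with softness * radius."""
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    dist = math.hypot(dx, dy)
    return 1.0 / (1.0 + math.exp((dist - radius) / (softness * radius)))
```

A detector built on such a representation would regress a center, a radius, and a boundary softness rather than four box corners.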

Regarding the classifier, classifier papers generally do include the algorithm, so it can be found with an academic search for the original paper describing the classifier in use.

Douglas Daseeco

In General

Each of those projects has open source code on GitHub that you can look at. With some quick googling you'll find implementations of these basic regressors for different deep learning frameworks. Usually these types of projects only include pseudocode for the custom or complicated layers involved in the detector. There isn't much need to include pseudocode for a simple convolution because it's a well-established operation. I've included a few links here, but if you look around you'll find plenty of implementations in different frameworks.

SSD

https://github.com/amdegroot/ssd.pytorch
https://github.com/weiliu89/caffe/tree/ssd

Faster R-CNN (improved version of RCNN)

https://github.com/tensorpack/tensorpack/tree/master/examples/FasterRCNN

YOLO

V3 -- https://itnext.io/implementing-yolo-v3-in-tensorflow-tf-slim-c3c55ff59dbe
V1 -- https://github.com/hizhangp/yolo_tensorflow

Aside

As an aside, unless you're trying to do novel research or fit a very specific function, these detectors usually work pretty well out of the box. Running a base-level detector and then modifying it for your application is usually a safer route unless you have a deep understanding of how these networks work.

juicedatom