
I'm interested in using ResNet-50 to classify images of objects across roughly 1000 unique classes. I'm wondering if there is any way to estimate how many unique angles I need in my training set to classify images that can be taken from any angle. For example, if for a given object I had 500 training images taken directly from the front and 500 training images taken directly from the top, I'd have 2 unique angles.

A model trained with only those 2 unique angles probably wouldn't be able to classify the same object if it was given a photo from the top right looking down.

Is there any way to figure out how many unique angles I would need in my training set to classify images that could be taken from any angle? If I had 12 unique angles (top, bottom, front, back, left, right, front-left, front-right, front-top, front-bottom, back-left, back-right, back-top, back-bottom), would I then be able to classify images taken from any arbitrary angle?

To clarify: if I had 12 unique angles, I would have many photos from each of the 12 angles, but each angle itself would be exactly the same with no variation. I.e., "top" would be exactly a 90-degree angle toward the object on the Z-axis and 0-degree angles on the X and Y axes, for every photo.


2 Answers


[I wanted it to be a comment but it's too long :)]

I don't think it's a good approach to split the viewpoints into a group of 12 angles. The main purpose of using a neural network is to have a model that is able to generalize: the perfect model would recognize the object in any orientation, and your task is to build a model that comes as close to that as possible. In my opinion, you should try to make your dataset as diverse as possible, with not only different viewpoints but also different lighting, backgrounds, etc.

According to the ResNet paper, the model was evaluated on the ImageNet 2012 dataset, which has 1.28 million images for 1000 classes, i.e. roughly 1280 images per class. I guess that's a good starting point for you. During training you'll be able to see whether that is enough, or whether you need to get more data or use some data augmentation techniques.
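As a rough illustration of the kind of augmentation meant here, the sketch below fine-tunes ResNet-50 with perspective, rotation, and lighting jitter. It assumes PyTorch/torchvision and an ImageFolder-style layout; the dataset path and hyperparameters are placeholders, not something from this answer.

```python
# Sketch only: assumes PyTorch + torchvision and an ImageFolder-style dataset.
# "data/train" and the hyperparameters below are placeholders.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

# Augmentations that vary viewpoint-like factors (perspective, rotation),
# lighting (color jitter), and framing (random resized crop).
train_transforms = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.6, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomPerspective(distortion_scale=0.3, p=0.5),
    transforms.RandomRotation(degrees=15),
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

train_set = datasets.ImageFolder("data/train", transform=train_transforms)
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=64, shuffle=True, num_workers=4
)

# Pretrained ResNet-50 with a fresh classification head sized to your classes.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, len(train_set.classes))

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# One pass over the training data (loop over epochs in practice).
model.train()
for images, labels in train_loader:
    images, labels = images.to(device), labels.to(device)
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```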

MASTER OF CODE
  • In an ideal world I would want as many angles as possible. That's not what I'm asking about, though. I'm asking about a situation where there are limitations in the training data. – Tyler Hilbert Nov 17 '21 at 16:15
  • Again, you can use about 1280 images per class, so ~107 images per angle, and try to train on that. If you don't have that many images, train on the data you've got, and if training turns out to be relatively hard, try data augmentation. – MASTER OF CODE Nov 17 '21 at 18:09

The viewing angle is only one of many variables that influence whether a model can classify an object correctly. Other factors might be: Is the object occluded? How close are we to the object? Is the image noisy? How large does the object appear in the image? What are the lighting conditions? What color is the surrounding? But let's assume we keep all of those variables perfectly constant.

Then neural radiance fields (NeRFs) might give you an intuition for how much information is needed to extrapolate to the viewpoints where you have less information. A NeRF representation is 'richer' than what you need for a classification task, but the interpolation to 'unseen' angles is conceptually similar.

The more images from different angles you have, the better your classification accuracy will become. There is probably a different 'number of angles'-to-accuracy mapping for every conceivable object, but similar objects will require a similar number of images to reach the same classification accuracy.
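One way to probe that mapping empirically, if your images are labeled by capture angle, is a hypothetical hold-out experiment: keep some angles entirely out of training, train on an increasing number of the remaining angles, and record accuracy on the unseen ones. The sketch below assumes a user-supplied `train_and_evaluate` routine (e.g. wrapping your ResNet-50 training loop); none of these names come from the answer itself.

```python
# Hypothetical experiment sketch: measure how accuracy on *unseen* angles grows
# with the number of distinct training angles. `train_and_evaluate` is a
# placeholder for your own ResNet-50 training/evaluation routine.
import random

def angles_vs_accuracy(images_by_angle, held_out_angles, train_and_evaluate, seed=0):
    """images_by_angle: dict mapping an angle label -> list of (image_path, class_label)."""
    rng = random.Random(seed)
    candidate_angles = [a for a in images_by_angle if a not in held_out_angles]
    rng.shuffle(candidate_angles)

    # Test set contains only viewpoints the model never sees during training.
    test_set = [item for a in held_out_angles for item in images_by_angle[a]]

    results = []
    for k in range(1, len(candidate_angles) + 1):
        train_angles = candidate_angles[:k]
        train_set = [item for a in train_angles for item in images_by_angle[a]]
        accuracy = train_and_evaluate(train_set, test_set)  # user-supplied
        results.append((k, accuracy))
    # e.g. [(1, 0.31), (2, 0.44), ...] -> plot number of angles vs accuracy
    return results
```

Plotting the returned pairs shows where accuracy on unseen viewpoints starts to saturate for your particular objects, which is as close as you can get to an answer without assuming a universal number of angles.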

Mariusmarten