I have thousands of images similar to this.

[Image: truck]

Using existing metadata, I can sort them into different folders according to the type of gravel product loaded on the truck.

What would be the optimal way to train an image classification model that can guess the type of stone product on a truck from a picture? I can use ML.NET Model Builder, which suits me because it is part of Visual Studio and .NET, but perhaps something pre-trained would be better?

1 Answer

Pre-training is usually beneficial if the dataset the model was pre-trained on is related, at least to some degree, to your target dataset (i.e. the dataset you wish to fine-tune on). It is also useful for coping with a small amount of data.

For example, pre-training on ImageNet can help if the target dataset also consists of natural images.

You could pre-train on a general dataset like ImageNet (or simply use an already pre-trained model). When fine-tuning, train one or more of the last hidden layers as well, not only the output layer: doing so ensures better adaptation to your target data.
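As a rough illustration of that fine-tuning advice, here is a minimal sketch in PyTorch (my choice of framework, not something from the question). The small `backbone` network is only a stand-in for a real pre-trained model, and the helper names `replace_head` and `freeze_all_but_last`, the class count of 4, and the random batch are all hypothetical. With a real model you would load pre-trained weights first, for example through `torchvision.models`.

```python
import torch
from torch import nn


def replace_head(model: nn.Sequential, num_classes: int) -> None:
    """Swap the final linear layer for a fresh one sized for the target classes."""
    old_head = model[-1]
    model[-1] = nn.Linear(old_head.in_features, num_classes)


def freeze_all_but_last(model: nn.Sequential, n_trainable: int) -> None:
    """Freeze every top-level block except the last `n_trainable` ones."""
    blocks = list(model.children())
    cutoff = len(blocks) - n_trainable
    for index, block in enumerate(blocks):
        for param in block.parameters():
            param.requires_grad = index >= cutoff


torch.manual_seed(0)

# Stand-in for a pre-trained network (assumption: a real one would be loaded
# with pre-trained weights instead of being randomly initialised like this).
backbone = nn.Sequential(
    nn.Sequential(  # early feature extractor: stays frozen
        nn.Conv2d(3, 8, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
    ),
    nn.Sequential(nn.Linear(8, 16), nn.ReLU()),  # last hidden block: fine-tuned
    nn.Linear(16, 1000),  # original output layer (e.g. 1000 ImageNet classes)
)

replace_head(backbone, num_classes=4)         # hypothetical: 4 product types
freeze_all_but_last(backbone, n_trainable=2)  # last hidden block + new head

trainable = [name for name, p in backbone.named_parameters() if p.requires_grad]

# Only the unfrozen parameters are handed to the optimizer.
optimizer = torch.optim.SGD(
    [p for p in backbone.parameters() if p.requires_grad], lr=0.01
)

# One training step on a random batch, standing in for real truck images.
images = torch.randn(2, 3, 32, 32)
labels = torch.tensor([0, 3])
loss = nn.functional.cross_entropy(backbone(images), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Setting `n_trainable=1` would train the new output layer only, which makes it easy to compare both variants on a held-out set.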

Luca Anzalone
  • I don't know if this helps, hence a comment rather than an answer. One thing I learned very early on in computer vision is that most robots and AI systems see the world very differently from humans: as shapes. You can think of every object as being made up of squares, circles or triangles, and the combination and order of those shapes determines the probability of what the object is. I have never built my own model like this, but I find it a useful way to think about the problem, and something to consider with pre-trained models for easier deduction. – jamiecropley Apr 22 '23 at 16:59
  • By a first layer I rather meant some logic that finds what is important in the picture, in my case the truckload. So I would need something to identify the truckload first, and then add specific training to distinguish between the contents of the truckload. – Vojtěch Dohnal Apr 24 '23 at 06:23
  • Well, in that case you first need an *object detector* model and then an *image classifier*: the detector finds the bounding box of the truck (if any), you crop the image to that box, and then use the classifier to determine which category its content belongs to. – Luca Anzalone Apr 24 '23 at 08:31
  • OK, thanks, but do I really need it? Where do I get it? How difficult is it to train? I am looking for more specific advice on how to accomplish the task, or on what would be an ideal starting point for trying this without spending too much time on it. – Vojtěch Dohnal Apr 24 '23 at 09:13
  • The easy approach is to just use an image classifier. Say the content can be categorized into $K$ classes: you add an extra *void class* to handle images that show neither a truck nor cargo. You take a pre-trained model (e.g. one trained on ImageNet, which also contains images of trucks) and train the output layer, now with $K+1$ classes. This should be reasonably easy, but it may fail more often (e.g. more false positives) than the detector+classifier approach. It is still a good start, also to see whether pre-training helps at all. – Luca Anzalone Apr 24 '23 at 10:02
  • I think both approaches make sense: using an image classifier directly on the whole image, or using an object detector combined with an image classifier on the cropped truckload. If you have the time to implement both and compare their results, it's very much worth it. – Stef Apr 25 '23 at 13:03
  • https://learn.microsoft.com/en-us/dotnet/machine-learning/tutorials/object-detection-model-builder - Detector model. The worst part seems to be drawing those rectangles manually. Then I would have to run the detector on the images, extract the rectangles from them, and then train the classification model. But would it be useful? I would then have 2 models to debug, and to evaluate one image I would have to run the first model and then the second. – Vojtěch Dohnal Apr 25 '23 at 13:15
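The detector+classifier approach discussed in the comments can be wrapped so that evaluating one image is still a single call. The sketch below uses plain Python only and makes several assumptions: images are toy nested lists of pixel values, and `toy_detect`, `toy_classify` and the labels `product_a`, `product_b` and `void` are hypothetical stand-ins for trained models and real product names.

```python
from typing import Callable, List, Optional, Tuple

Image = List[List[int]]          # toy stand-in: rows of grayscale pixel values
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def crop(image: Image, box: Box) -> Image:
    """Cut the bounding box out of the image."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def classify_truckload(
    image: Image,
    detect: Callable[[Image], Optional[Box]],
    classify: Callable[[Image], str],
    void_label: str = "void",
) -> str:
    """Run the detector, crop to its box, then classify the crop."""
    box = detect(image)
    if box is None:  # nothing found: return the void class
        return void_label
    return classify(crop(image, box))


def toy_detect(image: Image) -> Optional[Box]:
    """Hypothetical detector: bounding box of all non-zero pixels."""
    hits = [(x, y) for y, row in enumerate(image) for x, v in enumerate(row) if v > 0]
    if not hits:
        return None
    xs = [x for x, _ in hits]
    ys = [y for _, y in hits]
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


def toy_classify(patch: Image) -> str:
    """Hypothetical classifier: thresholds the mean brightness of the crop."""
    values = [v for row in patch for v in row]
    return "product_a" if sum(values) / len(values) > 128 else "product_b"


loaded = [
    [0, 0, 0, 0],
    [0, 200, 220, 0],
    [0, 210, 230, 0],
    [0, 0, 0, 0],
]
empty = [[0, 0], [0, 0]]

print(classify_truckload(loaded, toy_detect, toy_classify))  # product_a
print(classify_truckload(empty, toy_detect, toy_classify))   # void
```

Because the two stages are passed in as functions, each model can still be tested and debugged on its own, and the single-classifier variant can be compared by passing a detector that always returns the full image box.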