I'm looking for some suggestions on how to improve our vehicle image recognition. We have an online marketplace where customers submit photos of their vehicles. The photos need to meet certain requirements before the advert can be approved.
Customers are required to submit the following vehicle photos: front, back, left-side, right-side, engine (similar to the front photo but with the hood open) and instrument panel cluster. The vehicle must be well framed in the photo: it must not be too small, nor so big that its edges touch the frame of the photograph. Each photo must also be one of the types listed above, and the camera must face the vehicle directly with only small angle variations (a front photo can't include a large part of the side of the car).
Another developer had a go and built a CNN with Keras which does alleviate some manual grind (about 20,000 photos were used for training - no annotations). The accuracy sits at around 75% for the vehicle photos but only 55% for the engine and instrument cluster. Each photo is still manually checked, but it is a case of agreeing or disagreeing with what was recognised.
I was wondering whether it would be better to first detect the vehicle in the image using an existing pre-trained detector (for example through ImageAI), and then use the vehicle's bounding box to check that it is correctly placed in the frame of the photograph and within acceptable dimensions. There may be multiple vehicles in the picture, so I would work with the most prominent one.
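To make the framing idea concrete, here is a rough sketch of the check I have in mind once a detector has returned bounding boxes. Everything in it is an assumption on my part: the box format (x1, y1, x2, y2 in pixels), the function names, treating "most prominent" as "largest area", and the threshold values, which would need tuning against real photos. It does not call any detection library.

```python
from typing import List, Optional, Tuple

# Assumed box format: (x1, y1, x2, y2) in pixels, origin at the top-left corner.
Box = Tuple[float, float, float, float]


def most_prominent(boxes: List[Box]) -> Optional[Box]:
    """Return the box with the largest area, or None if nothing was detected."""
    if not boxes:
        return None
    return max(boxes, key=lambda b: (b[2] - b[0]) * (b[3] - b[1]))


def is_well_framed(
    box: Box,
    image_w: int,
    image_h: int,
    min_area_frac: float = 0.25,  # hypothetical threshold: vehicle too small below this
    margin_frac: float = 0.02,    # hypothetical threshold: required gap to each image edge
) -> bool:
    """Check that the box keeps clear of the image edges and is not too small."""
    x1, y1, x2, y2 = box
    margin_x = margin_frac * image_w
    margin_y = margin_frac * image_h
    touches_edge = (
        x1 < margin_x
        or y1 < margin_y
        or x2 > image_w - margin_x
        or y2 > image_h - margin_y
    )
    area_frac = ((x2 - x1) * (y2 - y1)) / (image_w * image_h)
    return (not touches_edge) and area_frac >= min_area_frac


# Example: two detections in a 1000x700 photo; the larger one is checked.
detections = [(10, 10, 50, 50), (100, 80, 900, 620)]
main_box = most_prominent(detections)
framed_ok = main_box is not None and is_well_framed(main_box, 1000, 700)
```

This only covers placement and size. It says nothing about which side of the vehicle is shown, which is the part I am less sure how to approach.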
At that point, would it be worth trying to develop something to work out the pose of the vehicle (idea: https://github.com/johnberroa/CORY), or should I just do some transfer learning with whatever pre-trained model was used and spend some time annotating the images?