We're working on a project which needs to read seven-segment displays from photos. We've tried a few AI text recognisers, but the 'font' is too tricky for them. So we thought that, given we're looking for a small number of quite distinct shapes, AutoML would be worth a go.
We've not done this before, so we had a go, and our first model is a bit disappointing. We've not found anywhere that gives good advice on training data, and we don't have a big collection of photos of seven-segment displays we can use (let alone a labelled one).
We made 13,000 images of single digits (0-9, plus C and F). We used a seven-segment font to get the general shape, then rendered each glyph onto a flat-colour background in random colour combinations, at random locations in the image, and with random amounts of skew (using a perspective transform).
This gives pretty random results when tested against real photos. It works OK if we generate more images the same way we made the training data and test against those.
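For anyone wanting to reproduce the kind of generator described above, here is a minimal sketch of one way to do it with Pillow and NumPy. It draws the seven segments as plain rectangles rather than using a font file (so it runs anywhere), and the canvas size, jitter range, and helper names (`draw_glyph`, `make_sample`, etc.) are my own assumptions, not the original setup:

```python
import random

import numpy as np
from PIL import Image, ImageDraw

# Standard seven-segment layout: a (top), b (top-right), c (bottom-right),
# d (bottom), e (bottom-left), f (top-left), g (middle).
SEGMENTS = {
    "0": "abcdef",  "1": "bc",     "2": "abdeg",  "3": "abcdg",
    "4": "bcfg",    "5": "acdfg",  "6": "acdefg", "7": "abc",
    "8": "abcdefg", "9": "abcdfg", "C": "adef",   "F": "aefg",
}

def draw_glyph(ch, fg, w=60, h=100, t=10):
    """Render one glyph as filled rectangles on a transparent canvas."""
    boxes = {
        "a": (t, 0, w - t, t),
        "b": (w - t, t, w, h // 2),
        "c": (w - t, h // 2, w, h - t),
        "d": (t, h - t, w - t, h),
        "e": (0, h // 2, t, h - t),
        "f": (0, t, t, h // 2),
        "g": (t, h // 2 - t // 2, w - t, h // 2 + t // 2),
    }
    img = Image.new("RGBA", (w, h), (0, 0, 0, 0))
    draw = ImageDraw.Draw(img)
    for seg in SEGMENTS[ch]:
        draw.rectangle(boxes[seg], fill=fg)
    return img

def perspective_coeffs(dst_corners, src_corners):
    """Solve for the 8 coefficients Pillow's PERSPECTIVE transform expects:
    each output point (x, y) maps back to a source point (u, v)."""
    rows = []
    for (x, y), (u, v) in zip(dst_corners, src_corners):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
    a = np.array(rows, dtype=float)
    b = np.array(src_corners, dtype=float).reshape(8)
    return np.linalg.solve(a, b)

def make_sample(ch, size=224, max_jitter=24):
    """One synthetic image: random fg/bg colours, placement, and skew."""
    bg = tuple(random.randrange(256) for _ in range(3))
    fg = tuple(random.randrange(256) for _ in range(3)) + (255,)
    img = Image.new("RGB", (size, size), bg)
    glyph = draw_glyph(ch, fg)
    x = random.randrange(size - glyph.width)
    y = random.randrange(size - glyph.height)
    img.paste(glyph, (x, y), glyph)  # glyph's alpha channel as mask
    corners = [(0, 0), (size, 0), (size, size), (0, size)]
    jittered = [(cx + random.uniform(-max_jitter, max_jitter),
                 cy + random.uniform(-max_jitter, max_jitter))
                for cx, cy in corners]
    coeffs = perspective_coeffs(corners, jittered)
    return img.transform((size, size), Image.PERSPECTIVE, coeffs,
                         Image.BICUBIC, fillcolor=bg)
```

For example, `make_sample("8")` returns one 224x224 RGB training image; looping over `SEGMENTS` and calling it a thousand times per glyph would give a dataset of similar shape to the one described.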
- Is skewing necessary/useful, or does AutoML account for skew/rotation etc. by itself?
- Is a flat background a basic no-no? Should we be putting random photos in the background? If so, where can one get a collection of thousands of random photos or backgrounds to use?
- Is colour important? We don't care about the colour of the digits; in fact we want any colour combination to be recognised. We've used random foreground/background colour combinations to 'tell' the model this. Is that necessary?
- I've seen other training data that is entirely in greyscale (e.g. mikegchambers on YouTube and his Lego brick recogniser). Will people using greyscale training data be converting each image to greyscale before sending it to the model?
- Where is a good place to get practical advice on making training data?
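On the background and greyscale questions above, here is a small sketch of the two mechanics involved: compositing a digit crop onto an arbitrary background photo, and converting images to greyscale consistently at training and inference time. The helper names are hypothetical, the "photo" here is random noise standing in for a real image file (public photo collections such as COCO or Open Images are common sources), and this assumes Pillow:

```python
import random

import numpy as np
from PIL import Image

def composite_on_photo(glyph: Image.Image, photo: Image.Image) -> Image.Image:
    """Paste an RGBA glyph at a random position on a background photo."""
    out = photo.convert("RGB")
    x = random.randrange(out.width - glyph.width)
    y = random.randrange(out.height - glyph.height)
    out.paste(glyph, (x, y), glyph)  # glyph's alpha channel as mask
    return out

def to_model_input(img: Image.Image, grayscale: bool) -> Image.Image:
    """Whatever colour treatment the training data got, inference images
    must get the same: convert both to greyscale, or neither."""
    return img.convert("L") if grayscale else img.convert("RGB")

# Noise stands in for a real background photo loaded from disk.
photo = Image.fromarray(
    np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8))
glyph = Image.new("RGBA", (60, 100), (255, 0, 0, 255))  # placeholder crop
sample = composite_on_photo(glyph, photo)
gray = to_model_input(sample, grayscale=True)
```

The point of `to_model_input` is that greyscale is a preprocessing decision, not something the model infers: if the training set was greyscale, every image sent for prediction has to go through the same `convert("L")` step.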