
I am experimenting with neural networks, and to that end I am trying to program a plain OCR task. I have learned that CNNs are the best choice, but for the time being, and due to my inexperience, I want to go step by step and start with feedforward nets.

So my training data is a set of roughly 400 16×16 images, extracted with a script that draws every alphabet character as a tiny image, for a small set of fonts registered on my computer.

The test data set is then extracted by the same procedure, but for all the fonts on my computer.

Well, the results are quite bad. I get an accuracy of approximately 45-50%, which is very poor... but that's not my question.

The point is that I can't get the MSE below 0.0049, no matter what hidden-layer configuration I apply to the net. I have tried several architectures and they all come down to this figure. Does that mean the net cannot learn any further given the data?

This MSE value, however, still translates into these poor results.

I am using the TensorFlow API directly, no Keras or estimators. For a list of 62 recognizable characters, these are examples of the architectures I have used: [256, 1860, 62], [256, 130, 62], [256, 256, 128, 62], [256, 3600, 62], ...

But I never get the MSE below 0.0049, and the accuracy is still not over 50%.
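For reference, here is a minimal NumPy sketch of the kind of setup I mean (simplified to one training example; the [256, 130, 62] layer sizes are just one of the configurations listed above, and the weight scale and activations are illustrative):

```python
# Minimal sketch of a [256, 130, 62] feedforward net on a flattened
# 16x16 image, scored with MSE against a one-hot target over 62 classes.
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    # Small random weights, zero biases
    return rng.normal(0, 0.1, (n_in, n_out)), np.zeros(n_out)

W1, b1 = init_layer(256, 130)
W2, b2 = init_layer(130, 62)

def forward(x):
    h = np.tanh(x @ W1 + b1)               # hidden layer
    y = 1 / (1 + np.exp(-(h @ W2 + b2)))   # sigmoid outputs
    return y

def mse(y, t):
    return np.mean((y - t) ** 2)

x = rng.random(256)            # flattened 16x16 image
t = np.zeros(62); t[7] = 1.0   # one-hot target
print(mse(forward(x), t))
```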

Any hints are greatly appreciated.

nbro
Chal.lo
  • Actually, it's not only about architecture. How about your learning rate value? Have you tried tuning it? And how about the number of training epochs? – malioboro Mar 04 '19 at 22:23
  • 400 training examples is quite a small number, I think, in order to solve this task. You will need thousands of pictures in order to achieve good results, I guess. – nbro Mar 05 '19 at 10:04
  • How many categories do you have? Is it 26? –  Mar 05 '19 at 10:33
  • No 62. The output of the net is a one-hot array, each position corresponding to a different character to recognize – Chal.lo Mar 06 '19 at 07:44
  • I would recommend using a larger dataset. The MNIST or EMNIST datasets are great for training OCRs. Also, how many epochs are you training your network for? – Sean Mabli Mar 24 '21 at 17:18

1 Answer


So the training data is a small number of examples (400) drawn from a small set of fonts, and the test data is a much larger dataset drawn from a much larger set of fonts and is therefore much more variable than the training data. Two issues here are the small training data size and the difference in distributions between the training and test data. I would try the following:

  • Instead of defining your own architecture, try some of the pretrained architectures available in Keras, such as ResNet50 or VGG16. You can initialize them either with random weights or with ImageNet weights. Remove the top layer and put your own layers on top. You can also selectively unfreeze layers and see if that makes any difference.
  • To deal with the issues I mention above, use data augmentation to introduce variability into the training set.
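As a sketch of the second point, here is one simple way to augment small glyph images with random pixel shifts and additive noise. The shift range and noise level are guesses you would tune; the goal is just to inject variability resembling the unseen fonts:

```python
# Simple augmentation for 16x16 glyph images: random shifts plus noise.
import numpy as np

rng = np.random.default_rng(0)

def augment(img, max_shift=2, noise_std=0.05):
    """Return a randomly shifted, noised copy of a 2-D image in [0, 1]."""
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    out = np.roll(np.roll(img, dy, axis=0), dx, axis=1)  # pixel shift
    out = out + rng.normal(0, noise_std, img.shape)      # additive noise
    return np.clip(out, 0.0, 1.0)

base = rng.random((16, 16))
batch = np.stack([augment(base) for _ in range(8)])  # 8 variants per glyph
print(batch.shape)  # (8, 16, 16)
```

Generating a handful of such variants per glyph would multiply the effective size of the 400-example training set while nudging its distribution closer to the test fonts.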

You also have not mentioned in the post how you decide when to stop training the network. If the network is overfitting, you could try stopping training earlier (early stopping).
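A minimal early-stopping rule looks like this (sketched on a list of per-epoch validation losses; in practice the losses would come from your own training loop, and `patience` is a parameter you would tune):

```python
# Stop training once validation loss hasn't improved for `patience` epochs.
def early_stop_epoch(val_losses, patience=3):
    """Given per-epoch validation losses, return the epoch to stop at."""
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch  # no improvement for `patience` epochs
    return len(val_losses) - 1

# Validation loss plateaus after epoch 2, so training stops at epoch 5
print(early_stop_epoch([0.9, 0.5, 0.3, 0.31, 0.32, 0.33]))  # 5
```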

Mike NZ