
I'm interested in building a convolutional neural network (CNN) or LSTM to locate text in an image. I don't want to OCR the text yet, just find the text regions. Yes, I know Tesseract and other systems can do this, but I want to learn how it works by building my own. All of the tutorials and articles I've seen so far have the CNN output a classification: "image contains a cat", "image contains a dog". That's nice, but it says nothing about where in the image the object was found.

Can anyone point me to some information that describes an output layer of a neural network that can give location information, such as the x-y coordinates of text boxes?
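To make the question concrete, here is a rough sketch in plain Python of the difference I have in mind. It is only my guess at the shape of the output, not something taken from a tutorial: the function names and the (cx, cy, w, h) encoding, with each value squashed to a fraction of the image size, are assumptions of mine.

```python
import math


def softmax(logits):
    """Classification-style output: raw scores -> class probabilities."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(v):
    """Squash a raw output into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def decode_box(raw, image_width, image_height):
    """Location-style output: 4 raw numbers -> pixel box corners.

    Assumed (hypothetical) encoding: raw = (cx, cy, w, h), each passed
    through a sigmoid and read as a fraction of the image size.
    Returns (x_min, y_min, x_max, y_max) in pixels.
    """
    cx, cy, w, h = (sigmoid(v) for v in raw)
    x_min = (cx - w / 2.0) * image_width
    y_min = (cy - h / 2.0) * image_height
    x_max = (cx + w / 2.0) * image_width
    y_max = (cy + h / 2.0) * image_height
    return (x_min, y_min, x_max, y_max)


def squared_error(predicted_box, true_box):
    """A regression-style loss: sum of squared coordinate differences."""
    return sum((p - t) ** 2 for p, t in zip(predicted_box, true_box))


if __name__ == "__main__":
    # What the tutorials show: one probability per class.
    print(softmax([2.0, 0.5]))
    # What I am after: coordinates of a region in a 100x200 image.
    print(decode_box([0.0, 0.0, 0.0, 0.0], 100, 200))
```

In other words, I imagine the last layer producing four numbers per text box instead of one score per class, and being trained against the true box with something like `squared_error`. What I can't tell is whether that is how it is actually done, or how it would extend to an image with several text regions.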
