Questions tagged [optical-character-recognition]

For questions about the application of AI/ML algorithms in the field of optical character recognition (OCR), aka optical character reader (OCR), which is the mechanical or electronic conversion of images of typed, handwritten, or printed text into machine-encoded text.

For more info, take a look at the related Wikipedia article.

34 questions
24
votes
3 answers

Why can't OCR be perceived as a good example of AI?

On the Wikipedia page about AI, we can read: Optical character recognition is no longer perceived as an exemplar of "artificial intelligence" having become a routine technology. On the other hand, the MNIST database of handwritten digits is…
kenorb • 10,423 • 3 • 43 • 91
10
votes
3 answers

Are there any textual CAPTCHA challenges that can fool AI, but not humans?

Are there any modern techniques for generating textual CAPTCHA challenges (where a person needs to type the right text) that can easily fool AI with some visual obfuscation methods, while humans can still solve them without any struggle? For…
kenorb • 10,423 • 3 • 43 • 91
7
votes
2 answers

Effective algorithms for OCR

I am using Google's OCR to extract text from images, like receipts and invoices. What are examples of techniques used to make sense of the text? For example, I would like to extract the date, name of the business, address, total amount, etc. Before…
6
votes
1 answer

How should the racing agent take into account the velocity of the vehicle, given the images with a speedometer?

I'm developing a game AI, which tries to master racing simulations. I already trained a CNN (AlexNet) on in-game footage of me playing the game and the pressed keys as the target. As the CNN is only making predictions on a frame-to-frame basis, and…
5
votes
1 answer

Why are object detection algorithms poor at optical character recognition?

OCR is still a very hard problem. We don't have universal powerful solutions. We use the CTC loss function (An Intuitive Explanation of Connectionist Temporal Classification | Towards Data Science; Sequence Modeling With CTC | Distill), which is very…
5
votes
2 answers

How can we recognise musical notes in low-resolution or blurry images?

I was looking for an approach to recognise musical notes from photos. I found this repository https://github.com/mpralat/notesRecognizer. However, it doesn't seem good enough. If you look into the bad folder, you can see that just tiny variations of…
Toskan • 151 • 1 • 4
4
votes
1 answer

How should I define the loss function for a multi-object detection problem?

I'm trying to create a text recognition project using a CNN. I need help regarding the text detection task. I have the training images and bounding box details for them. But I'm unable to figure out how to create the loss function. Can anyone help…
3
votes
2 answers

How could I use machine learning to detect text and non-text regions in scanned documents?

I have a collection of scanned documents (which come from newspapers, books, and magazines) with complex alignments for the text, i.e. the text could be at any angle w.r.t. the page. I can do a lot of processing for different features extraction.…
3
votes
1 answer

In OCR, how should I deal with the warped text on the sides of oval objects?

Consider an image that contains one can (or bottle, or any similar oval object), which has text all over it. In the image below, I have many bottles, but you can assume that each image only contains one such object. As we can see, in each can, the…
3
votes
0 answers

Is there a deep learning-based architecture for digit localisation?

I'm new to object detectors and segmentation. I want to localize digits on a plate as fast as possible. All images of the dataset are normalized to $300 \times 60$. There are different approaches to solve the problem. For example, binarization +…
3
votes
1 answer

Attempting to solve an optical character recognition task using a feed-forward network

I am doing some experimentation on neural networks, and for that I am trying to program a plain OCR task. I have learned CNNs are the best choice, but for the time being and due to my inexperience, I want to go step by step and start with feedforward…
2
votes
0 answers

Zonal or template OCR invoice reading

I'd like to explore the possibilities of applying artificial intelligence to OCR reading. Basic OCR invoice processing lets me convert only 30% of them. The main purpose is to define invoice areas by training an AI, then process those areas with…
2
votes
1 answer

Which AI techniques are there that combine multiple models to make sense of data at different stages?

I have been working to design a system that uses multiple machine learning models to make sense of data that is dynamically webscraped. Each AI would handle a specific task, for example: An AI model would identify text in an image, then attempt to…
2
votes
0 answers

OCR - Text recognition from Image

I plan to develop an OCR application using TensorFlow to get values from an image. Text in the image may be handwritten or printed. From the image, my OCR application should be able to get the values below: 1. ChequeDate 2. Payee Name 3. Legal…
2
votes
0 answers

How does a neural network output text box location data?

I'm interested in creating a convolutional neural network or LSTM to locate text in an image. I don't want to OCR the text yet, just find the text regions. Yes, I know Tesseract and other systems can do this, but I want to learn how it works by…