0

I am reading this paper, that is discussing the use of distance metrics for character recognition predicton.

I can see the advantages of using a distance metrics in predictions like character recognition: You can model a set of known character's pixel vectors, and then measure a new unseen vector against this model, get the distance, and if the distance is low, predict that this unseen vector belongs to a particular class.

I'm wondering if there's any disadvantages to using distance metrics as the cost function in character recognition? For example, I was thinking maybe the distance calculation is slow for large images (would have to calculate the distance between each item in two long image vectors)?

Slowat_Kela
  • 287
  • 2
  • 9

1 Answers1

1

As I see it, the question boils down to the comparison between distance (function/metric) based Optical Character Recognition (OCR) and (for example) OCR done by means of Convolutional Neural Networks (CNNs). Particularly, it focuses on the cons of the former option.

There are a few potential problems associated with using distance based OCR systems. First of all, this approach requires an appropriate distance metric to deliver good results. Different types of distance metrics/functions are sensitive to different features in the input images. For example, some functions might penalize absolute differences, while others penalize squared differences, where the latter punishes differences of magnitude $> 1$ stronger than the former, while the former penalizes differences of (absolute) magnitude $< 1$ stronger than the latter. Which type of distance metric works best for a given problem has commonly to be determined empirically.

Also, distance based OCR systems may possibly be more sensitive to, for example, different levels in illumination than CNNs. While both sorts of classifiers could profit from data augmentation (i.e. adding variations of the existing training data to the dataset), OCR based on CNNs has the benefit that CNN training procedures produce classifiers that commonly generalize well to slight variations in the incoming data, while some slight novel tilt (or again variation in illumination) may break the distance based classification procedure, but may have not too detrimental effects on CNN based classifiers due to their rather strong generalization capabilities.

Of course, one can also try to increase the robustness of distance based OCR systems, but this is commonly associated with developing exhaustive preprocessing pipelines to standardize the appearance of incoming character images. Thus, in terms of system design, it is often easier to set up a CNN-based neural network architecture and train it (possibly drawing upon regularization to boost generalization of the trained system even further) than trying to design complicated preprocessing strategies to shift, rotate, and normalize character images exhaustively to boost performance of a distance based OCR system.

To sum up, after all, there are a lot of potential issues related to using plain distance (metric/function) based OCR systems, most of which, however, can strongly be alleviated when drawing upon clever (pre-)processing pipelines to standardize input images before performing distance based classification.

From all those issues mentioned above, the strongest disadvantage of this technique, however, (which cannot be alleviated so easily) is that it doesn't generalize too easily to novel input data, where CNNs might perform better with appropriate training data (+ augmentation and regularization in general).

You also mentioned the aspect of computational cost associated with distance based classification. First of all, I am not an expert in computational complexity/cost related questions. However, most image related computations can efficiently be executed on graphics cards. So, subtracting images pixel-wise, for example,is not very costly given it can be executed on a graphics card. But of course, when using distance based OCR, the computational cost associated with computing distances and comparing distance values scales with the reference dataset size. Using CNNs, the time needed to perform a classification is always the same irrespective of the training dataset size.

Daniel B.
  • 805
  • 1
  • 4
  • 13