I'm new to object detection and segmentation. I want to localize the digits on a plate as fast as possible. All images in the dataset are normalized to $300 \times 60$. There are several approaches to this problem, for example binarization followed by connected-component labeling, or vertical and horizontal projections. These approaches fail under ambient light, noise, and shadows. There are also approaches such as STN-OCR (based on convolutional recurrent neural networks), but they need many plates with different combinations of digits. My data is limited in that respect: about 1000 distinct plate numbers, but 10000 plate images in total, captured under different illumination and noise conditions. I already have a good OCR (without segmentation), so I only need a network that localizes the digits.
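For reference, the vertical-projection baseline mentioned above can be sketched roughly as follows. This is a minimal illustration, assuming the plate has already been binarized into a NumPy array with nonzero foreground pixels; the function name `segment_by_vertical_projection` and the `min_width` parameter are hypothetical, not taken from any library.

```python
import numpy as np

def segment_by_vertical_projection(binary, min_width=2):
    """Return (start, end) column ranges that contain foreground pixels.

    binary: 2D array, nonzero = foreground (assumed already binarized).
    end is exclusive. Runs narrower than min_width are dropped as noise.
    """
    # Vertical projection: a column counts as "ink" if it has any foreground pixel.
    col_has_ink = (binary != 0).sum(axis=0) > 0
    # Pad with background on both sides so runs touching the border are closed.
    padded = np.concatenate(([False], col_has_ink, [False])).astype(np.int8)
    diff = np.diff(padded)
    starts = np.flatnonzero(diff == 1)   # background -> ink transitions
    ends = np.flatnonzero(diff == -1)    # ink -> background transitions
    return [(int(s), int(e)) for s, e in zip(starts, ends) if e - s >= min_width]

# Synthetic 60 x 300 plate (rows x columns) with two rectangular "digits".
plate = np.zeros((60, 300), dtype=np.uint8)
plate[10:50, 20:40] = 1
plate[10:50, 60:85] = 1
print(segment_by_vertical_projection(plate))  # [(20, 40), (60, 85)]
```

On clean input this is very fast, but a single shadow or noise streak that bridges two digits merges their column runs, which is the failure mode described above.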
Is there a deep learning-based architecture suited to this purpose? Could I use Faster R-CNN, YOLO, or SSD?
I trained Faster R-CNN in MATLAB, but it detects too many seemingly random bounding boxes on each plate. What could be the problem?
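One thing that may be worth checking for this symptom is how the raw detections are post-processed: detectors usually emit many low-confidence and overlapping boxes that are expected to be removed by a score threshold and non-maximum suppression (NMS). The following is a minimal NumPy sketch of that filtering, not the MATLAB detector's own code. It assumes boxes in `[x1, y1, x2, y2]` format; the function name `filter_detections` and both threshold values are hypothetical and would need tuning.

```python
import numpy as np

def filter_detections(boxes, scores, score_thresh=0.5, iou_thresh=0.3):
    """Drop low-score boxes, then apply greedy non-maximum suppression.

    boxes: (N, 4) array-like of [x1, y1, x2, y2]; scores: (N,) array-like.
    Threshold defaults are illustrative assumptions, not recommended values.
    """
    boxes = np.asarray(boxes, dtype=float).reshape(-1, 4)
    scores = np.asarray(scores, dtype=float)

    # Step 1: discard detections below the confidence threshold.
    mask = scores >= score_thresh
    boxes, scores = boxes[mask], scores[mask]

    # Step 2: greedy NMS, visiting boxes from highest to lowest score.
    order = np.argsort(-scores)
    kept = []
    while order.size > 0:
        i, rest = order[0], order[1:]
        kept.append(i)
        # Intersection of box i with every remaining box.
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_rest = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        union = np.maximum(area_i + area_rest - inter, 1e-9)  # avoid division by zero
        # Keep only boxes that do not overlap box i too much.
        order = rest[inter / union <= iou_thresh]

    kept = np.array(kept, dtype=int)
    return boxes[kept], scores[kept]

# Two overlapping boxes on one digit, one box on another digit, one weak box.
raw_boxes = [[20, 10, 40, 50], [22, 11, 41, 50], [60, 10, 85, 50], [100, 5, 120, 20]]
raw_scores = [0.90, 0.80, 0.95, 0.20]
kept_boxes, kept_scores = filter_detections(raw_boxes, raw_scores)
print(kept_boxes)
print(kept_scores)
```

Because digits on a plate sit side by side and barely overlap, a fairly low IoU threshold is plausible here, but that is an assumption to verify on the actual data.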