I have spent some time searching Google but wasn't able to find out which optimization algorithm works best for binary image classification when the images are similar to one another.
I'd like to read some theoretical proofs (if any exist) to convince myself that a particular optimizer gives better results than the others.
Similarly, which optimizer works better for binary classification when the images are very different from each other?