From what I know, the most widely used optimizer in practice is Adam, which in essence is just mini-batch gradient descent with momentum (to avoid getting stuck in saddle points) plus per-parameter damping of the step size (to avoid wiggling back and forth where the conditioning of the loss surface is bad).
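For concreteness, here's roughly how I understand a single Adam step (a rough NumPy sketch of the standard update rule; the function and argument names are just illustrative, not from any particular framework):

```python
import numpy as np

def adam_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # First moment: exponential moving average of the gradients (the "momentum" part).
    m = beta1 * m + (1 - beta1) * grads
    # Second moment: exponential moving average of the squared gradients,
    # used to scale the step per parameter (the "damping" part).
    v = beta2 * v + (1 - beta2) * grads**2
    # Bias correction, since m and v start at zero.
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    # Update: move in the momentum direction, with a per-parameter step size.
    params = params - lr * m_hat / (np.sqrt(v_hat) + eps)
    return params, m, v
```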
Not that any of this is easy in absolute terms, but after a few days I think I understood most of it. When I look into the field of mathematical (non-linear) optimization, however, I'm completely overwhelmed.
What are the possible reasons that optimization algorithms for neural networks aren't more intricate?
- There are just more important things to improve?
- Just not possible?
- Are Adam and the others already so good that researchers just don't care?