Lets take an ad recommendation problem for 1 slot. Feedback is click/no click. I can solve this by contextual bandits. But I can also introduce exploration in supervised learning, I learn my model from collected data every k hours.
What can contextual bandits give me in this example which supervised learning + exploration cannot do?