
Let's say I want to teach a neural network to classify images, and, for some reason, I insist on using reinforcement learning rather than supervised learning.

I have a dataset of images and their matching classes. For each image, I could define a reward function that is $1$ for classifying it correctly and $-1$ for classifying it incorrectly (or perhaps a more complicated reward function where some mistakes are less costly than others). Then, for each image $x^i$, I can loop over each class $c$ and apply a vanilla REINFORCE step: $\theta \leftarrow \theta + \alpha \, \nabla_{\theta} \log \pi_{\theta}(c \mid x^i) \, r$.
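To make the loop concrete, here is a minimal sketch of that update for a single image, assuming a small softmax classifier in PyTorch; the linear model, feature size, and $\pm 1$ rewards are just placeholders for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

num_classes, feat_dim, lr = 5, 16, 1e-2
model = nn.Linear(feat_dim, num_classes)          # logits of pi_theta(c | x)
optimizer = torch.optim.SGD(model.parameters(), lr=lr)

x = torch.randn(1, feat_dim)                      # one "image" x^i (flattened features)
y = 2                                             # its true class

log_probs = torch.log_softmax(model(x), dim=-1)   # log pi_theta(c | x^i) for every c

# Loop over every class c: reward +1 for the correct class, -1 otherwise.
# Gradient ascent on r * log pi(c | x^i) is gradient descent on -r * log pi(c | x^i),
# so the per-class terms are accumulated into a single surrogate loss.
loss = torch.zeros(())
for c in range(num_classes):
    r = 1.0 if c == y else -1.0
    loss = loss - r * log_probs[0, c]

optimizer.zero_grad()
loss.backward()
optimizer.step()
```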

Would that be any different from using standard supervised learning methods (for example, the cross-entropy loss)? Should I expect different results?
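For reference, here is how I'd write the two objectives side by side for a single image, treating the whole class loop above as one gradient step (with $y^i$ the true class of $x^i$ and $r(c)$ the $\pm 1$ reward):

$$
\mathcal{L}_{\text{CE}}(\theta) = -\log \pi_\theta\!\left(y^i \mid x^i\right)
\qquad \text{vs.} \qquad
\mathcal{L}_{\text{RL}}(\theta) = -\sum_{c} r(c)\, \log \pi_\theta\!\left(c \mid x^i\right).
$$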

This method actually seems better, since I could define a custom reward for each misclassification, but I've never seen anyone use something like this.

Gilad Deutsch
    Is your question a duplicate of https://ai.stackexchange.com/q/14167/2444? I don't think it's an exact duplicate, but these two questions are related. Can you clarify how these two questions are different? – nbro May 05 '20 at 12:16

0 Answers