Can we use reinforcement learning for sequence-to-sequence tasks? If yes, whether or not this is a good choice, how could this be done?
Asked
Active
Viewed 131 times
1 Answers
3
One renowned example for the specified case is SeqGAN
Modeling the data generator as a stochastic policy in reinforcement learning (RL), SeqGAN bypasses the generator differentiation problem by directly performing gradient policy update. The RL reward signal comes from the GAN discriminator judged on a complete sequence, and is passed back to the intermediate state-action steps using Monte Carlo search. Extensive experiments on synthetic data and real-world tasks demonstrate significant improvements over strong baselines.

OmG
- 1,731
- 10
- 19