1

When training any seq2seq model you have a target and a source. The source may be a sentence such as:

I_walked_the_dog

And the target being

_walked_the_dogg

Where as you can see the expected output for the initial I is a space _. My question is, at training time, whether to use the models previous outputs for predicting the next output, or to run the training simultaneously using the expected outputs. To illustrate this more clearly, see below:

enter image description here

The incentive for training using the expected outputs is that all time steps can be trained simultaneously, so it speeds up training by a factor of the sequence lengths. However, it means that training is not representative of what the network will realistically be doing at testing time, as at testing time the network will not have perfect previous outputs, but rather only it's own.

Recessive
  • 1,346
  • 8
  • 21
  • I think you're referring to [teacher forcing](https://ai.stackexchange.com/q/18006/2444). – nbro Apr 23 '22 at 08:47

0 Answers0