
What does it mean that the decoder can be parallelized during training?

Let's assume a transformer (with both encoder and decoder) is employed for time-series prediction, i.e. from the input sequence x_0, ..., x_N we want to predict y_0, ..., y_N. Is this the way parallelization occurs during training? (A sketch of the procedure follows the list below.)

  • form the batch of prefixes [], [y_0], ..., [y_0, ..., y_{N-1}]
  • feed this batch to the transformer, together with the input sequence
  • we will obtain the batch Y_0, Y_1, ..., Y_N
  • compare against y_0, ..., y_N and form the loss (*)

(*) here, some teacher-forcing (ratio) techniques may be employed, so that more passes may be required
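
To make the procedure concrete, here is a minimal PyTorch sketch of the steps above. The dimensions, the zero vector standing in for the empty prefix / start token, and the mean-squared-error loss are just illustrative assumptions, not a definitive implementation:

```python
import torch
import torch.nn as nn

d_model, N = 16, 5                                 # illustrative feature size and horizon
model = nn.Transformer(d_model=d_model, nhead=4, batch_first=True)

x = torch.randn(1, N + 1, d_model)                 # input sequence x_0, ..., x_N
y = torch.randn(1, N + 1, d_model)                 # target sequence y_0, ..., y_N

# Step 1: form the "batch" of decoder prefixes [], [y_0], ..., [y_0, ..., y_{N-1}],
# right-padded to a common length; a zero vector stands in for the empty prefix.
start = torch.zeros(1, 1, d_model)
prefixes, pad_masks = [], []
for k in range(N + 1):
    prefix = torch.cat([start, y[:, :k]], dim=1)            # length k + 1
    pad = torch.zeros(1, N - k, d_model)                     # pad up to length N + 1
    prefixes.append(torch.cat([prefix, pad], dim=1))
    pad_masks.append(torch.arange(N + 1) > k)                # True marks padding positions
tgt = torch.cat(prefixes, dim=0)                             # (N + 1, N + 1, d_model)
tgt_key_padding_mask = torch.stack(pad_masks)                # (N + 1, N + 1)

# Step 2: feed the whole prefix batch to the transformer in one forward pass,
# together with the input sequence (repeated once per prefix).
out = model(x.expand(N + 1, -1, -1), tgt,
            tgt_key_padding_mask=tgt_key_padding_mask)

# Step 3: read Y_k off the last real position of prefix k, giving Y_0, ..., Y_N.
preds = torch.stack([out[k, k] for k in range(N + 1)])       # (N + 1, d_model)

# Step 4: compare against y_0, ..., y_N and form the loss.
loss = nn.functional.mse_loss(preds, y[0])
```

As written, the prefixes are right-padded and processed as one batch, so all N + 1 predictions Y_0, ..., Y_N come out of a single forward pass.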

Lilla
    Does this answer your question? [Can the decoder in a transformer model be parallelized like the encoder?](https://ai.stackexchange.com/questions/12490/can-the-decoder-in-a-transformer-model-be-parallelized-like-the-encoder) – Minh-Long Luu Mar 12 '23 at 01:42
  • In fact no, I asked my question after reading that post. @Minh-LongLuu – Lilla Mar 12 '23 at 08:06

0 Answers