
I am practicing machine translation using a seq2seq model (more specifically, with GRU/LSTM units). The following is my first model:

Model 1 with decoder using one GRU

This model initially achieved an accuracy of about 0.03 and gradually improved from there, which seems normal.

But when I multiply the decoder's GRU output by 2 (as in the following picture), the model's accuracy becomes very good (>0.9) at the very first epoch of training (more specifically, the first batch). I think something must be wrong somewhere. Can anyone give me an explanation for this?
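For anyone wondering how an accuracy jump like this can arise in seq2seq training: if the padded timesteps of the target sequences are counted in the metric, the model gets them "right" for free. A toy NumPy illustration (all values here are made up, not from the asker's model):

```python
import numpy as np

# Toy example (assumed values): padded positions counted as correct
# inflate the accuracy metric.
y_true = np.array([[5, 7, 0, 0, 0]])   # 0 = padding token
y_pred = np.array([[5, 9, 0, 0, 0]])   # only 1 of the 2 real tokens is correct

unmasked_acc = (y_true == y_pred).mean()            # counts the 3 padded positions
mask = y_true != 0                                  # keep only real tokens
masked_acc = (y_true[mask] == y_pred[mask]).mean()

print(unmasked_acc)  # 0.8 -- looks good, but 3 of the 4 "hits" are padding
print(masked_acc)    # 0.5 -- the honest number
```

The longer the padding relative to the real sentences, the more dramatic the inflation, which would explain a >0.9 accuracy on the very first batch.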

  • After struggling for a few hours, I finally figured out what happened here: the tf.math.multiply layer interrupts Keras mask propagation. As a result, all zero-padded values are used by the following layers, which makes the accuracy unreasonably good. – Đạt Trần Apr 25 '23 at 02:39
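The mask loss the comment describes can be reproduced directly: an Embedding with mask_zero=True attaches a Keras mask that survives the GRU, but a raw tf.math.multiply call returns a plain tensor without it. Wrapping the multiply in a Layer that sets supports_masking=True forwards the mask. A minimal sketch, with toy shapes assumed (the asker's real model is not shown):

```python
import tensorflow as tf
from tensorflow import keras

# Hypothetical toy dimensions; the asker's actual model details are unknown.
emb = keras.layers.Embedding(input_dim=50, output_dim=4, mask_zero=True)
gru = keras.layers.GRU(8, return_sequences=True)

tokens = tf.constant([[3, 7, 0, 0]])   # trailing zeros are padding
h = gru(emb(tokens))                   # mask propagates through the GRU
print(getattr(h, "_keras_mask", None) is not None)

# A raw tf op yields a plain tensor: the Keras mask is dropped here.
doubled = tf.math.multiply(h, 2.0)
print(getattr(doubled, "_keras_mask", None) is not None)

# Fix: do the multiply inside a Layer that declares mask pass-through.
class Scale(keras.layers.Layer):
    def __init__(self, factor, **kwargs):
        super().__init__(**kwargs)
        self.supports_masking = True   # Keras forwards the incoming mask
        self.factor = factor

    def call(self, inputs):
        return tf.math.multiply(inputs, self.factor)

doubled_masked = Scale(2.0)(h)         # same values, mask preserved
print(getattr(doubled_masked, "_keras_mask", None) is not None)
```

With the mask preserved, downstream layers (and a mask-aware loss/metric) will again ignore the zero-padded timesteps, and the accuracy should drop back to a realistic value.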

0 Answers