
I am practicing machine translation using a seq2seq model (more specifically, with GRU/LSTM units). The following is my first model:

Model 1 with decoder using one GRU

This model initially achieved an accuracy of about 0.03 and gradually improved from there, which seems normal.

But when I multiply the decoder's GRU output by 2 (as in the following picture), the model's accuracy becomes very good (>0.9) at the very first epoch of training (more specifically, the first batch). I think something must be wrong somewhere. Can anyone give me an explanation for this?
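For anyone wondering how an accuracy jump like this can arise in seq2seq training: if the padded timesteps of the target sequences are counted in the metric, the model gets them "right" for free. A toy NumPy illustration (all values here are made up, not from the asker's model):

```python
import numpy as np

# Toy example (assumed values): padded positions counted as correct
# inflate the accuracy metric.
y_true = np.array([[5, 7, 0, 0, 0]])   # 0 = padding token
y_pred = np.array([[5, 9, 0, 0, 0]])   # only 1 of the 2 real tokens is correct

unmasked_acc = (y_true == y_pred).mean()            # counts the 3 padded positions
mask = y_true != 0                                  # keep only real tokens
masked_acc = (y_true[mask] == y_pred[mask]).mean()

print(unmasked_acc)  # 0.8 -- looks good, but 3 of the 4 "hits" are padding
print(masked_acc)    # 0.5 -- the honest number
```

The longer the padding relative to the real sentences, the more dramatic the inflation, which would explain a >0.9 accuracy on the very first batch.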

  • After struggling for a few hours, I finally figured out what happened here: the tf.math.multiply layer interrupts Keras mask propagation. As a result, all zero-padded values are used by the following layers, which makes the accuracy unreasonably good. – Đạt Trần Apr 25 '23 at 02:39
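The mask loss the comment describes can be reproduced directly: an Embedding with mask_zero=True attaches a Keras mask that survives the GRU, but a raw tf.math.multiply call returns a plain tensor without it. Wrapping the multiply in a Layer that sets supports_masking=True forwards the mask. A minimal sketch, with toy shapes assumed (the asker's real model is not shown):

```python
import tensorflow as tf
from tensorflow import keras

# Hypothetical toy dimensions; the asker's actual model details are unknown.
emb = keras.layers.Embedding(input_dim=50, output_dim=4, mask_zero=True)
gru = keras.layers.GRU(8, return_sequences=True)

tokens = tf.constant([[3, 7, 0, 0]])   # trailing zeros are padding
h = gru(emb(tokens))                   # mask propagates through the GRU
print(getattr(h, "_keras_mask", None) is not None)

# A raw tf op yields a plain tensor: the Keras mask is dropped here.
doubled = tf.math.multiply(h, 2.0)
print(getattr(doubled, "_keras_mask", None) is not None)

# Fix: do the multiply inside a Layer that declares mask pass-through.
class Scale(keras.layers.Layer):
    def __init__(self, factor, **kwargs):
        super().__init__(**kwargs)
        self.supports_masking = True   # Keras forwards the incoming mask
        self.factor = factor

    def call(self, inputs):
        return tf.math.multiply(inputs, self.factor)

doubled_masked = Scale(2.0)(h)         # same values, mask preserved
print(getattr(doubled_masked, "_keras_mask", None) is not None)
```

With the mask preserved, downstream layers (and a mask-aware loss/metric) will again ignore the zero-padded timesteps, and the accuracy should drop back to a realistic value.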

0 Answers