I am trying to train an LSTM using CTC loss, but the loss does not decrease during training. I have created a minimal example of my issue by generating training data where the network simply has to copy the current input element at each time step. Moreover, I made the label sequence the same length as the input sequence, with no two adjacent elements equal, so that both CTC loss and categorical cross-entropy loss can be used. I found that with categorical cross-entropy loss the model converges very quickly, whereas with CTC loss it gets nowhere.
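For reference, the setup can be sketched roughly as follows. This is a hedged reconstruction in PyTorch, not my exact notebook: the dimensions, model sizes, and training loop are placeholder choices, and blank index 0 is an assumption.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder dimensions for the copy task (assumptions, not from the notebook)
num_classes = 5          # classes 1..4 are labels; 0 is reserved as the CTC blank
T, B = 12, 8             # sequence length, batch size

# Labels with no two adjacent elements equal, so a T-step input can
# emit a T-step label under CTC without needing blanks in between
targets = torch.zeros(B, T, dtype=torch.long)
for b in range(B):
    prev = 0
    for t in range(T):
        c = torch.randint(1, num_classes, (1,)).item()
        while c == prev:
            c = torch.randint(1, num_classes, (1,)).item()
        targets[b, t] = c
        prev = c

# Inputs are one-hot copies of the targets: the network only has to copy
inputs = nn.functional.one_hot(targets, num_classes).float()  # (B, T, C)

lstm = nn.LSTM(num_classes, 32, batch_first=True)
proj = nn.Linear(32, num_classes)
ctc = nn.CTCLoss(blank=0)
opt = torch.optim.Adam(list(lstm.parameters()) + list(proj.parameters()), lr=1e-2)

input_lengths = torch.full((B,), T, dtype=torch.long)
target_lengths = torch.full((B,), T, dtype=torch.long)

losses = []
for step in range(200):
    out, _ = lstm(inputs)
    logits = proj(out)                                   # (B, T, C)
    log_probs = logits.log_softmax(-1).transpose(0, 1)   # CTCLoss expects (T, B, C)
    loss = ctc(log_probs, targets, input_lengths, target_lengths)
    opt.zero_grad()
    loss.backward()
    opt.step()
    losses.append(loss.item())
```

Note that because the label length equals the input length and no adjacent labels repeat, there is exactly one valid CTC alignment (no blanks anywhere), so in principle the CTC objective should behave much like per-step cross-entropy here.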
I have uploaded my minimal example to Colab. Does anyone know why CTC loss is not working in this case?