I am trying to train an LSTM using CTC loss, but the loss does not decrease during training. I have created a minimal example of the issue: the training data is built so that the network simply has to copy the current input element at each time step. The label has the same length as the input sequence, and no two adjacent elements of the label are equal, so both CTC loss and categorical cross-entropy loss can be used. With categorical cross-entropy loss the model converges very quickly, whereas with CTC loss it gets nowhere.

I have uploaded my minimal example to Colab. Does anyone know why CTC loss is not working in this case?
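For readers without access to the notebook, the kind of training data described above could be generated roughly as follows. This is only a sketch: the helper name `make_copy_example`, the sequence length, and the alphabet size are hypothetical and are not taken from the original Colab. It also assumes that class 0 is reserved for the CTC blank, so symbols start at 1. Adjacent repeats are avoided because CTC needs a blank frame between two identical consecutive labels, which would be impossible when the label is as long as the input.

```python
import random


def make_copy_example(length, num_symbols, rng):
    """Return a random sequence in which no two adjacent elements are equal.

    Symbols are drawn from 1..num_symbols; 0 is left free for the CTC blank
    (an assumption made for this sketch, not something the question states).
    """
    if num_symbols < 2:
        raise ValueError("need at least two symbols to avoid adjacent repeats")
    seq = [rng.randrange(1, num_symbols + 1)]
    while len(seq) < length:
        candidate = rng.randrange(1, num_symbols + 1)
        # Reject a candidate that would repeat the previous element.
        if candidate != seq[-1]:
            seq.append(candidate)
    return seq


rng = random.Random(0)
inputs = make_copy_example(length=10, num_symbols=5, rng=rng)
# Copy task: the label at each time step is the input at that time step.
labels = list(inputs)
```

Because the label is a frame-by-frame copy of the input, the same pair can be fed to a per-step cross-entropy loss or to CTC loss.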

  • Could you please provide the mathematical formula of the CTC loss that you implemented, to have more context? – nbro Jan 16 '21 at 21:34
  • I did not implement the loss function myself, but rather am using a built-in function from TensorFlow, [`tf.nn.ctc_loss`](https://www.tensorflow.org/api_docs/python/tf/nn/ctc_loss). The mathematical formulation of this is available in the paper [_Connectionist Temporal Classification - Labeling Unsegmented Sequence Data with Recurrent Neural Networks_](http://www.cs.toronto.edu/~graves/icml_2006.pdf). – Cameron Martin Jan 17 '21 at 14:56
  • Thanks! I would like to note that programming issues are off-topic here. So, next time, if you think your problem is a bug in your code and not a conceptual issue, then you should ask your question on Stack Overflow. Here, we focus on the theoretical, philosophical, and social aspects of Artificial Intelligence. Take a look at https://ai.stackexchange.com/help/on-topic. – nbro Jan 17 '21 at 17:18
  • Noted. Thanks for the direction. – Cameron Martin Jan 18 '21 at 02:01

1 Answer

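The problem was that the `logit_length` argument to `tf.nn.ctc_loss` had the wrong value: it was built from the last axis of `y_pred` (the number of classes) rather than the second-to-last axis (the number of time steps).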

It was this:

tf.repeat(tf.shape(y_pred)[-1], tf.shape(y_pred)[0])

But it should have been this:

tf.repeat(tf.shape(y_pred)[-2], tf.shape(y_pred)[0])
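To make the difference concrete, here is a minimal sketch of how the corrected expression fits into a call to `tf.nn.ctc_loss`. The tensor sizes are made up for illustration, and the sketch assumes batch-major logits of shape `[batch, time, classes]` with class 0 used as the blank; neither detail is confirmed by the original notebook.

```python
import tensorflow as tf

# Hypothetical sizes, chosen only for illustration.
batch_size, time_steps, num_classes = 4, 10, 6

# Batch-major logits: [batch, time, classes]. Class 0 is the CTC blank here.
y_pred = tf.random.normal([batch_size, time_steps, num_classes])

# Dense labels 1, 2, ..., 5, 1, 2, ...: as long as the input, no adjacent repeats.
one_label = 1 + tf.range(time_steps, dtype=tf.int32) % (num_classes - 1)
y_true = tf.tile(one_label[tf.newaxis, :], [batch_size, 1])
label_length = tf.fill([batch_size], time_steps)

# Wrong: the last axis is the number of classes, not the sequence length.
wrong_logit_length = tf.repeat(tf.shape(y_pred)[-1], tf.shape(y_pred)[0])

# Right: the second-to-last axis is the number of time steps.
logit_length = tf.repeat(tf.shape(y_pred)[-2], tf.shape(y_pred)[0])

loss = tf.nn.ctc_loss(
    labels=y_true,
    logits=y_pred,
    label_length=label_length,
    logit_length=logit_length,
    logits_time_major=False,  # logits are [batch, time, classes]
    blank_index=0,
)
```

With these sizes, `wrong_logit_length` holds 6 for every batch element while `logit_length` holds 10, so the wrong version tells CTC that each sequence has fewer frames than its label has elements.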