
I trained a neural network on an NLP problem and compared the loss and BLEU score on the validation data, with the same training parameters, in two scenarios: a) training on 25% of the data, b) training on 100% of the data. I observed a peculiar pattern: the validation loss and BLEU score are both much lower in b) than in a). What could this mean? My guess is that there are duplicates in the training data, which lead to overfitting, but I still have to investigate that.
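One way I could check that hypothesis is to count exact duplicates, for example with the minimal sketch below (assuming the training examples can be loaded as a list of (source, target) string tuples; train_pairs is a placeholder name):

from collections import Counter

def count_duplicates(pairs):
    """Return the (source, target) pairs that occur more than once, with their counts."""
    counts = Counter(pairs)
    return {pair: n for pair, n in counts.items() if n > 1}

# train_pairs would be my list of (source, target) string tuples.
# duplicates = count_duplicates(train_pairs)
# print(len(duplicates), "distinct pairs are duplicated")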

1 Answer


As you pointed out, duplicates can be a possible reason for such behavior. There are a few more possibilities:

  1. Class imbalance - the data is skewed towards a particular class (if you are solving a classification problem).

  2. The model isn't learning well on 100% of the data with the given parameters. Try changing the learning rate, or adding a decay schedule such as the following (a scheduler-based version of the same idea is sketched after this answer):

# Decay the learning rate partway through training; some_small_value and
# smaller_value are placeholders for decay factors (e.g. 0.5 and 0.1).
if t == int(args.num_iter * 0.5):
    lr = some_small_value * lr
if t == int(args.num_iter * 0.75):
    lr = smaller_value * lr
  3. It may be that the model you have selected lacks the capacity to handle the larger amount of input data. Try changing the model and see if some other model works better.

I hope this helps.
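If you are using PyTorch (an assumption, the question does not say which framework), the schedule in point 2 can be expressed with the built-in MultiStepLR scheduler. The milestones, gamma, and the toy model below are illustrative placeholders, not values from the question:

import torch

# Minimal sketch: a toy model and optimizer stand in for the real seq2seq setup.
model = torch.nn.Linear(10, 10)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

num_iter = 1000  # placeholder for args.num_iter
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer,
    milestones=[int(num_iter * 0.5), int(num_iter * 0.75)],
    gamma=0.1,  # multiply the learning rate by 0.1 at each milestone
)

for t in range(num_iter):
    optimizer.zero_grad()
    # Dummy loss in place of the real training loss.
    loss = model(torch.randn(4, 10)).pow(2).mean()
    loss.backward()
    optimizer.step()
    scheduler.step()

MultiStepLR multiplies the current learning rate by gamma at each milestone, which matches the piecewise decay in the snippet above.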

  • Hi, thank you for your answer. I don't have a classification problem, but it's useful to point that out. Your second point simply looks like a learning rate scheduler, which I could indeed implement. I don't know what you mean by 3., though. It seems like rather general advice, so can you specify what you mean by "unable to handle weights when input data is large"? – postnubilaphoebus Mar 24 '23 at 20:27
  • It's similar to the reason we need LSTM/GRU over a plain RNN. Also, LSTM has some variations, like peephole LSTM, that further increase the capacity of the model to process data and store weights. It may be that your model is as good as an RNN, i.e. it can handle the data when it is small, but as you increase the data, you need a more advanced model such as an LSTM. – Nityanand Mathur Mar 26 '23 at 18:06
  • I am using GRUs, but I will look into peephole LSTM to understand the difference you are talking about. – postnubilaphoebus Mar 28 '23 at 14:02