
The loss is MSE; in the plot, orange is the validation loss and blue the training loss. The task is neural-network regression (18 inputs, 2 outputs) with a single hidden layer of 300 units.

Tuning the learning rate, momentum, and L2 regularization parameters, this is the best validation loss I can obtain. Can this be considered overfitting? Is 1 a bad validation loss value for a regression task?

3 Answers


It depends on what 1 represents in your task. If you are trying to predict household prices and 1 represents \$1, I think the average validation loss is good. If 1 represents \$10,000, probably something is not right. But remember that there are two parts contributing to the overall loss: the MSE loss and the L2 penalty. (Also remember that most optimizers already implement the L2 penalty as weight decay, so you do not need to add it separately.)
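To make the weight-decay point concrete, here is a minimal PyTorch sketch; the layer sizes match the question, but all hyperparameter values are placeholders, not recommendations:

```python
import torch
import torch.nn as nn

# Same shape as in the question: 18 inputs, one hidden
# layer of 300 units, 2 outputs.
model = nn.Sequential(nn.Linear(18, 300), nn.ReLU(), nn.Linear(300, 2))

# weight_decay applies the L2 penalty inside the update step,
# so the MSE you log stays a pure reconstruction error.
# lr, momentum, and weight_decay here are placeholder values.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=1e-4)
criterion = nn.MSELoss()
```

With `weight_decay` set on the optimizer, the value returned by `criterion` is the pure MSE, so the training and validation curves you plot are directly comparable.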

Some suggestions:

  1. Check whether your data has any outliers or anomalies. Based on your task, you should know what to do with those data points. Also check whether your dataset has high variance.

  2. If you are worried about overfitting, think about your data again. Little data plus many parameters often leads to overfitting; if your dataset is too small, you need to reconsider.

  3. Try to adjust the number of hidden units and observe the results.

  4. Try using cross-validation.

  5. Alternatively, try different optimizers and see what happens (try Adam). A combined sketch of items 3-5 follows this list.
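A minimal sketch of suggestions 3-5 combined, assuming scikit-learn and arrays `X` of shape (n_samples, 18) and `y` of shape (n_samples, 2) as in the question; all hyperparameter values are placeholders:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

def cv_mse(X, y, hidden_units=300, n_splits=5):
    """Average validation MSE over k folds for a one-hidden-layer net."""
    fold_scores = []
    for train_idx, val_idx in KFold(n_splits=n_splits, shuffle=True,
                                    random_state=0).split(X):
        # solver="adam" uses the Adam optimizer; alpha is the L2 penalty.
        model = MLPRegressor(hidden_layer_sizes=(hidden_units,),
                             solver="adam", alpha=1e-4, max_iter=500)
        model.fit(X[train_idx], y[train_idx])
        fold_scores.append(mean_squared_error(y[val_idx],
                                              model.predict(X[val_idx])))
    return np.mean(fold_scores)
```

Varying `hidden_units` and comparing the returned cross-validated MSE gives a more stable picture than a single train/validation split.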

– keshik

The validation loss settles at exactly an error of 1. That probably means something is off, either with the kind of data the validation set contains or with something in the training. An exact validation loss of 1 almost definitely means something is wrong.

Before doing anything else, I'd recommend thoroughly going through your data, then checking whether there is anything to debug in the model itself. Since the training error decreases, there is probably something different about either the formatting of the validation data or the validation data itself.
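As one concrete way to go through the data, here is a small sketch (assuming the two splits are NumPy arrays) that compares per-feature statistics between them; a large mismatch in mean or standard deviation for some feature often explains a validation loss that behaves oddly:

```python
import numpy as np

def compare_splits(X_train, X_val, names=None):
    """Print per-feature mean/std for the training and validation splits."""
    for j in range(X_train.shape[1]):
        mu_t, sd_t = X_train[:, j].mean(), X_train[:, j].std()
        mu_v, sd_v = X_val[:, j].mean(), X_val[:, j].std()
        label = names[j] if names else f"feature {j}"
        print(f"{label}: train mean={mu_t:.3f} std={sd_t:.3f} | "
              f"val mean={mu_v:.3f} std={sd_v:.3f}")
```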

A brief description of the type of data and the exact problem at hand would help further.

– ashenoy
  • What is 'almost definitely' wrong with an MSE of 1, and why? – desertnaut Nov 29 '19 at 23:42
  • Why would the validation error settle at exactly 1 when it can take any real value? The only way the validation error would settle is if training found a minimum too hard to escape at exactly a validation error of 1, meaning there is surely something to debug in the data. – ashenoy Nov 30 '19 at 07:05
  • You seem to be overinterpreting a coarse plot which, if zoomed in, would most probably reveal fluctuations around 1.0. It would be safer to ask for clarification here than to assume that it is *exactly* 1.0. The plot by itself shows nothing wrong. – desertnaut Nov 30 '19 at 10:28

The telltale signature of overfitting is when your validation loss starts increasing while your training loss continues decreasing, i.e.:

(Image adapted from the Wikipedia entry on overfitting)

It is clear that this does not happen in your diagram, hence your model does not overfit.
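In code, that signature amounts to checking whether the validation loss bottoms out and then keeps rising while the training loss keeps falling. A minimal sketch over per-epoch loss histories (the `patience` threshold is an arbitrary choice):

```python
def overfitting_onset(train_losses, val_losses, patience=10):
    """Return the epoch where validation loss bottomed out, or None
    if there is no sign of the 'val up, train down' divergence."""
    best = min(range(len(val_losses)), key=val_losses.__getitem__)
    # Overfitting signature: validation loss has kept rising for a
    # while after its minimum, while training loss kept decreasing.
    val_rising = best < len(val_losses) - patience
    train_falling = train_losses[-1] < train_losses[best]
    return best if (val_rising and train_falling) else None
```

In the plot from the question, the validation loss never turns upward, so this check would return `None`.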

A difference between a training and a validation score by itself does not signify overfitting. This is just the generalization gap, i.e. the expected gap in performance between the training and validation sets; quoting from a recent blog post by Google AI:

An important concept for understanding generalization is the generalization gap, i.e., the difference between a model’s performance on training data and its performance on unseen data drawn from the same distribution.

An MSE of 1.0 (or any other specific value, for that matter) cannot by itself be considered "good" or "bad"; everything depends on the context, i.e. on the particular problem and the actual magnitude of your dependent variable. If you are trying to predict something on the order of thousands (or even hundreds), an MSE of 1.0 does not sound bad; it is not the same if your dependent variable takes values in, say, [0, 1] or similar.
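A tiny numerical illustration of that scale dependence (the numbers are made up):

```python
import numpy as np

y = np.array([0.5, 0.8, 0.3])        # targets in [0, 1]
y_scaled = y * 10_000                # the same targets "in dollars"

pred = y * 1.1                       # predictions with a 10% relative error
pred_scaled = y_scaled * 1.1

print(np.mean((y - pred) ** 2))                # ~0.0033
print(np.mean((y_scaled - pred_scaled) ** 2))  # ~3.3e5
```

The relative error is identical in both cases; only the units of the dependent variable changed.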

– desertnaut