I'm working with some weird data that is apparently quite hard to fit, and I've noticed a strange phenomenon: the validation MSE can jump from roughly 0.0176 to 1534863.6250 in a single epoch! It usually returns to a very low value after a few epochs. No such fluctuation appears in the training loss.
This instability persists across shuffling, repartitioning, and retraining, even though I have 16,000 samples and a heavily regularized network (dropout + residual connections + batch normalization + gradient clipping).
I realize more data might help, but this behavior is still really surprising. What could be causing it?
P.S. The model is a feedforward network with 10 layers of sizes [32, 64, 128, 256, 512, 256, 128, 64, 32, 1], trained with the Adam optimizer. This question may be related (I also see periodic validation loss), but I don't think that asker experienced the same massive instability.
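For concreteness, here is a minimal PyTorch sketch of the setup as I described it. The input width, dropout rate, learning rate, clipping threshold, and the projection shortcuts for the residual connections are all assumptions, since the layer widths don't match between consecutive layers:

```python
# Hypothetical reconstruction of the setup described above; placement of
# batch norm / dropout and all hyperparameters are assumptions.
import torch
import torch.nn as nn

SIZES = [32, 64, 128, 256, 512, 256, 128, 64, 32, 1]

class Block(nn.Module):
    """One hidden layer: Linear -> BatchNorm -> ReLU -> Dropout, with a
    residual connection (projected when input and output widths differ)."""
    def __init__(self, n_in, n_out, p_drop=0.2):
        super().__init__()
        self.fc = nn.Linear(n_in, n_out)
        self.bn = nn.BatchNorm1d(n_out)
        self.drop = nn.Dropout(p_drop)
        self.shortcut = (nn.Identity() if n_in == n_out
                         else nn.Linear(n_in, n_out, bias=False))

    def forward(self, x):
        return self.shortcut(x) + self.drop(torch.relu(self.bn(self.fc(x))))

class Net(nn.Module):
    def __init__(self, n_features):
        super().__init__()
        widths = [n_features] + SIZES[:-1]      # hidden widths 32..512..32
        self.blocks = nn.Sequential(
            *[Block(i, o) for i, o in zip(widths[:-1], widths[1:])]
        )
        self.head = nn.Linear(SIZES[-2], SIZES[-1])  # final linear -> 1

    def forward(self, x):
        return self.head(self.blocks(x))

model = Net(n_features=20)  # feature count is a placeholder
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# one training step with gradient clipping, as described
x, y = torch.randn(64, 20), torch.randn(64, 1)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
opt.step()
```

Note that validation MSE is computed in `model.eval()` mode, so batch norm uses its running statistics rather than per-batch statistics there, while training loss is reported in train mode.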