The following question is from the online book Neural Networks and Deep Learning by Michael Nielsen:
How do our machine learning algorithms perform in the limit of very large data sets? For any given algorithm it's natural to attempt to define a notion of asymptotic performance in the limit of truly big data. A quick-and-dirty approach to this problem is to simply try fitting curves to graphs like those shown above, and then to extrapolate the fitted curves out to infinity. An objection to this approach is that different approaches to curve fitting will give different notions of asymptotic performance. Can you find a principled justification for fitting to some particular class of curves? If so, compare the asymptotic performance of several different machine learning algorithms.
The ability to fit complex curves to data points comes from the non-linearity used: had we used only a linear combination of weights and biases, we would not have been able to fit such curves. The output therefore depends heavily on our choice of non-linearity. Suppose a model overfits and we end up with a degree-5 polynomial, while in another case it underfits and we end up with a linear model. How, then, would we get a good estimate of the asymptotic performance that the author asks about?
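To make the objection in the quoted problem concrete, here is a minimal sketch of the quick-and-dirty approach, reading the "graphs" as accuracy plotted against training set size. Everything in it is an assumption for illustration: the data points are synthetic (generated from a known curve, not taken from the book), the helper name `fit_asymptote` is hypothetical, and the two curve families `a - b/sqrt(n)` and `a - b/n` are simply two plausible choices, not ones the author prescribes.

```python
# Hypothetical learning-curve data: accuracy vs. training set size.
# The values are synthetic, generated from 0.98 - 0.5 / sqrt(n);
# they are not measurements from the book.
sizes = [100, 200, 500, 1000, 2000, 5000]
accuracies = [0.98 - 0.5 / n ** 0.5 for n in sizes]


def fit_asymptote(sizes, accuracies, exponent):
    """Least-squares fit of acc = a - b * n**(-exponent); returns (a, b).

    With the exponent held fixed, the model is linear in x = n**(-exponent),
    so simple linear regression gives a and b in closed form.
    """
    xs = [n ** (-exponent) for n in sizes]
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(accuracies) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accuracies))
    slope = sxy / sxx
    a = mean_y - slope * mean_x  # intercept: predicted accuracy as n -> infinity
    b = -slope
    return a, b


# Same data, two different assumed curve families.
a_sqrt, b_sqrt = fit_asymptote(sizes, accuracies, 0.5)
a_inv, b_inv = fit_asymptote(sizes, accuracies, 1.0)

print(f"a - b/sqrt(n): asymptote = {a_sqrt:.4f}")
print(f"a - b/n:       asymptote = {a_inv:.4f}")
```

Both families are fitted to the same six points, yet the extrapolated asymptotes disagree: the `a - b/sqrt(n)` fit recovers the generating value, while the `a - b/n` fit lands roughly a percentage point lower. That gap is the problem the quoted passage raises: without a principled reason to prefer one family, the "asymptotic performance" depends on the curve you chose to fit.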