
The following question is from the web book Neural Networks and Deep Learning by Michael Nielsen:

How do our machine learning algorithms perform in the limit of very large data sets? For any given algorithm it's natural to attempt to define a notion of asymptotic performance in the limit of truly big data. A quick-and-dirty approach to this problem is to simply try fitting curves to graphs like those shown above, and then to extrapolate the fitted curves out to infinity. An objection to this approach is that different approaches to curve fitting will give different notions of asymptotic performance. Can you find a principled justification for fitting to some particular class of curves? If so, compare the asymptotic performance of several different machine learning algorithms.
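To make the exercise concrete, here is a minimal sketch (my own, with made-up accuracy-vs-training-set-size numbers, not from the book) of fitting two different curve families to the same learning curve. Each family implies a different "asymptotic performance", which is exactly the objection the author raises:

```python
import numpy as np
from scipy.optimize import curve_fit

# Made-up learning-curve points: training-set size n vs. test accuracy.
n = np.array([100, 300, 1_000, 3_000, 10_000, 30_000, 50_000], dtype=float)
acc = np.array([0.72, 0.81, 0.87, 0.905, 0.925, 0.937, 0.941])

def power_law(n, a, b, c):
    # Saturating power law: tends to the finite asymptote a as n -> infinity.
    return a - b * n ** (-c)

def log_curve(n, a, b):
    # Logarithmic growth: no finite asymptote, it just keeps creeping upward.
    return a + b * np.log(n)

p, _ = curve_fit(power_law, n, acc, p0=[1.0, 1.0, 0.5], maxfev=10_000)
l, _ = curve_fit(log_curve, n, acc)

print("power-law asymptotic accuracy:", p[0])                # a finite limit
print("log-curve 'accuracy' at n = 1e9:", log_curve(1e9, *l))  # keeps growing with n
```

The saturating power law is a popular choice because test error on many benchmarks has been observed to fall off roughly as a power of the training-set size, but that is an empirical regularity rather than the principled justification the exercise asks for.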

The ability to mimic complex curves and fit the data points comes from the non-linearity used: had we used only a linear combination of weights and biases, we could not have mimicked them. The fitted output therefore depends heavily on our choice of non-linearity. Suppose that in one case the model overfits and we end up with an order-5 polynomial, while in another it underfits and we end up with a linear model. How, then, would we get a good estimate of the asymptotic performance the author asks about?
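Here is a sketch of those two extremes, using the same made-up points as above. Neither the linear fit nor the degree-5 polynomial is bounded, so extrapolating either one "out to infinity" says more about the chosen curve family than about the learning algorithm:

```python
import numpy as np

# Same hypothetical learning-curve points as above.
n = np.array([100, 300, 1_000, 3_000, 10_000, 30_000, 50_000], dtype=float)
acc = np.array([0.72, 0.81, 0.87, 0.905, 0.925, 0.937, 0.941])

x = np.log10(n)                    # work in log(n) so the fits stay numerically sane
lin = np.polyfit(x, acc, deg=1)    # the "underfit" linear model
poly5 = np.polyfit(x, acc, deg=5)  # the "overfit" degree-5 polynomial

for name, coeffs in [("linear", lin), ("degree-5", poly5)]:
    # Extrapolate far beyond the data (n = 10**9, i.e. x = 9). Both predictions
    # typically land far outside [0, 1], which is the worry above: the fit,
    # not the learning algorithm, dictates the "asymptote".
    print(name, "prediction at n = 10**9:", np.polyval(coeffs, 9.0))
```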

  • To be honest, I still don't understand your question. What do you mean by "this" in "How can I think about this..."? Moreover, it's not clear to me if by "time complexity" you're referring to the time complexity of the training algorithm (e.g. gradient descent) or something else. Can you clarify that? – nbro Jun 24 '21 at 15:46
  • @nbro So I come from a CS background and recently started DL. I've been accustomed to using time and space complexity to analyze the efficiency of algorithms. I'm not really clear on how the author wants us to think of asymptotic performance, so a bit of that vagueness reflects in the question – FoundABetterName Jun 24 '21 at 18:06
  • To me, it seems that you're mixing the idea of a model (e.g. a neural network) with the idea of the time complexity of a training algorithm (e.g. gradient descent), so it would be better that you first understand that there's a difference between a neural network and the algorithm that trains it using the data. Once you understand that, I think you will have a clearer idea of what's going on and that you can still compute the time complexity of an algorithm. So, before proceeding, I would suggest that you read about "how gradient descent is used to train a neural network" – nbro Jun 24 '21 at 20:01
  • Just to give you an example that the notion of time complexity of an algorithm applies to the context of machine learning as well, take a look at [this answer](https://ai.stackexchange.com/a/13626/2444) that I wrote, where I show how you can compute the time complexity of the forward pass of a neural network, i.e. the algorithm that produces the output of the neural network given the input, which is known as "forward pass". – nbro Jun 24 '21 at 20:05
  • Once you have gone through that, I would suggest that you edit this post to clarify what your _ultimate_ specific question is. I think your doubt is interesting, but your question just needs to be clarified. Maybe your ultimate question is: "How do we measure the time complexity of a machine learning algorithm?". If that's the case, please, just edit your post to leave that question. If not, again, please, tell us what your specific question is. – nbro Jun 24 '21 at 20:06
  • Thanks a lot @nbro , I'll read the reference and update the question in a while :) – FoundABetterName Jun 24 '21 at 21:02
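Following nbro's comment above about the time complexity of the forward pass, here is a rough sketch (my own, not taken from the linked answer) of counting the multiply-add operations a fully connected forward pass performs. The count depends only on the layer sizes, giving the usual per-example cost of roughly the sum of n_{l-1} * n_l over the layers, independent of how the network was trained:

```python
import numpy as np

def forward_pass(weights, biases, x):
    """One forward pass; also count the multiply-add operations it performs."""
    ops = 0
    a = x
    for W, b in zip(weights, biases):
        ops += W.shape[0] * W.shape[1]   # one multiply-add per weight entry
        a = np.maximum(0.0, W @ a + b)   # ReLU non-linearity (arbitrary choice here)
    return a, ops

# Hypothetical layer sizes, e.g. 784 -> 30 -> 10 as in Nielsen's MNIST example.
sizes = [784, 30, 10]
rng = np.random.default_rng(0)
weights = [rng.standard_normal((m, k)) for k, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal(m) for m in sizes[1:]]

_, ops = forward_pass(weights, biases, rng.standard_normal(sizes[0]))
print("multiply-adds per example:", ops)  # 784*30 + 30*10 = 23820
```

Training cost is then roughly this per-example cost multiplied by the number of examples and epochs, plus a comparable factor for backpropagation, which is a separate question from the asymptotic-performance-in-the-data-limit question the book exercise poses.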

0 Answers