
In Andrew Ng's Deep Learning Specialization, in the Sequence Models video (minute 4:13), he says that in negative sampling we choose a sample of words from the corpus to train on, rather than using the whole corpus. He also says that, for smaller datasets, we need a larger number of samples, for example 5-20, and, for larger datasets, a smaller number, for example 2-5. By samples, I mean the words taken along with the target word to train the model.

Why do small datasets require more samples, while big datasets require fewer samples?
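
If I understand the video correctly, the setup looks roughly like the minimal sketch below (my own illustration, not code from the course; the vocabulary, the counts, and the count^0.75 sampling distribution are just assumptions for the example): each observed (target, context) pair is kept as a positive example, and k randomly drawn words are added as negative examples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary and word counts (made-up numbers for the illustration)
vocab = ["orange", "juice", "king", "book", "the", "of"]
unigram_counts = np.array([100, 80, 10, 30, 500, 400], dtype=float)

# word2vec-style smoothed unigram distribution (counts raised to 0.75)
sampling_probs = unigram_counts ** 0.75
sampling_probs /= sampling_probs.sum()

def make_training_examples(target_idx, context_idx, k_negatives):
    """Return (target, word, label) triples: 1 positive pair plus k negatives."""
    examples = [(target_idx, context_idx, 1)]                     # observed pair -> label 1
    negatives = rng.choice(len(vocab), size=k_negatives, p=sampling_probs)
    examples += [(target_idx, int(w), 0) for w in negatives]      # sampled pairs -> label 0
    return examples

# e.g. target "orange", context "juice", k = 5
print(make_training_examples(0, 1, k_negatives=5))
```

So my question is about the choice of k_negatives (5-20 vs. 2-5), not about how many examples to take from the dataset.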

  • I think your title is a bit misleading. The way I read it was that you need to have more samples in small datasets, while you are referring to the negative samples in negative sampling (not dataset samples). Also your tags seem a bit irrelevant, especially the one about CNNs. I suggest modifying your question a bit so that there isn't any confusion and you can get a proper answer. – Djib2011 Oct 30 '19 at 09:37
  • @Djib2011 The current title is the result of my edit, because the previous one was not very descriptive. If you have an idea for a better title, please, edit the question or propose an edit. – nbro Oct 30 '19 at 18:23

1 Answer

He likely found this to be a best practice for avoiding overfitting. With a small dataset, if you only use small, easy-to-learn sequences (fewer words -> fewer degrees of freedom), you open your model to the risk of overfitting that dataset. On a large dataset, which has a lot more total information, you can train on small sequences without the risk of overfitting, because although the smaller sequences are easier to learn, the variance of the sequences will be much higher.
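
To make the counting side of this concrete, here is a rough back-of-envelope sketch (my own illustration, not something from the course or the reasoning above): with k negative samples per observed pair, each positive (target, context) pair turns into k + 1 binary classification examples, so a larger k squeezes more training signal out of a small corpus, while a large corpus already provides plenty of examples with a small k.

```python
def num_binary_examples(num_positive_pairs: int, k: int) -> int:
    # Each observed (target, context) pair yields 1 positive + k negative examples.
    return num_positive_pairs * (k + 1)

# Illustrative corpus sizes only
print(num_binary_examples(10_000, k=15))       # small corpus, large k -> 160,000 examples
print(num_binary_examples(100_000_000, k=3))   # large corpus, small k -> 400,000,000 examples
```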

nickw