Let's say I've got a training set of 1 million records, from which I pull mini-batches of 100 to train a basic regression model using gradient descent with MSE as the loss function. Assume test and cross-validation samples have already been withheld, so all 1 million entries are available for training.
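For concreteness, here's a minimal sketch of the setup I mean, using plain linear regression in NumPy; the data, the `sgd_pass` helper, and all the numbers are placeholders I made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data: 1M rows, 10 features (purely illustrative).
n_samples, n_features, batch_size, lr = 1_000_000, 10, 100, 0.01
X_train = rng.normal(size=(n_samples, n_features))
true_w = rng.normal(size=n_features)
y_train = X_train @ true_w + rng.normal(scale=0.1, size=n_samples)

w = np.zeros(n_features)  # model weights

def sgd_pass(indices):
    """One pass over the given record indices, in mini-batches of 100,
    taking one gradient-descent step on the MSE loss per batch."""
    global w
    for start in range(0, len(indices), batch_size):
        batch = indices[start:start + batch_size]
        Xb, yb = X_train[batch], y_train[batch]
        grad = (2.0 / len(batch)) * Xb.T @ (Xb @ w - yb)  # d(MSE)/dw
        w -= lr * grad
```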
Consider the following cases (sketched in code after the list):
- Run 2 epochs (I'm guessing this one is potentially bad, since it's effectively two separate training sets)
    - In the first epoch, train over records 1-500K
    - In the second epoch, train over records 500K-1M
- Run 4 epochs
    - In the first and third epochs, train over records 1-500K
    - In the second and fourth epochs, train over records 500K-1M
- Run X epochs, where each epoch draws a random 250K samples from the full training set
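In terms of that sketch, the three cases would look roughly like this (again just illustrative, reusing the hypothetical `sgd_pass` and `rng` from above; indices are 0-based in the code):

```python
# Case 1: 2 epochs, each epoch covers a disjoint half of the data.
first_half = np.arange(0, 500_000)
second_half = np.arange(500_000, 1_000_000)
sgd_pass(first_half)   # epoch 1
sgd_pass(second_half)  # epoch 2

# Case 2: 4 epochs, alternating between the two halves.
for _ in range(2):
    sgd_pass(first_half)
    sgd_pass(second_half)

# Case 3: X epochs, each drawing a fresh random 250K subset.
X_epochs = 8
for _ in range(X_epochs):
    subset = rng.choice(n_samples, size=250_000, replace=False)
    sgd_pass(subset)
```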
Should every epoch see the exact same samples? Is there any benefit or drawback to doing so? My intuition is that any deviation in the samples changes the 'topography' of the loss surface you're descending, but I'm not sure whether it matters as long as the samples come from the same population.
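For contrast, the baseline I'm implicitly comparing against would be "every epoch sees the exact same 1M samples, just reshuffled", something like:

```python
# Baseline: every epoch covers all 1M records, reshuffled so that
# batch composition differs between epochs but the sample set doesn't.
for _ in range(2):
    order = rng.permutation(n_samples)
    sgd_pass(order)
```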
This relates to a SO question: https://stackoverflow.com/questions/39001104/in-keras-if-samples-per-epoch-is-less-than-the-end-of-the-generator-when-it