So here's an example to help explain my question. Suppose I have 80,000 total images available for a DNN training task. With a batch size of 32, that is 2,500 batches per epoch.
Now let's say I partition the dataset into two bins of 40,000 images each. During each training epoch, every batch of 32 is formed entirely from either bin 1 or bin 2, i.e., there is some reduction in the global randomness of batch formation. Assume the class distribution is roughly the same in both bins (if it were a 4-way classification task, both bins would contain sufficient data points for all 4 targets).
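To make the mechanism concrete, here is a rough sketch of what I mean by "binned" batching (the contiguous bin split and the function name are just for illustration):

```python
import random

def binned_batches(dataset_indices, num_bins=2, batch_size=32, seed=0):
    """Shuffle within each bin, then draw every batch from a single bin.

    `dataset_indices` is a hypothetical list of sample indices; the bins
    here are a plain contiguous partition purely for illustration.
    """
    rng = random.Random(seed)
    n = len(dataset_indices)
    bin_size = n // num_bins
    bins = [dataset_indices[i * bin_size:(i + 1) * bin_size]
            for i in range(num_bins)]
    for b in bins:
        rng.shuffle(b)  # randomness is local to each bin, not global

    batches = []
    for b in bins:
        for i in range(0, len(b), batch_size):
            batches.append(b[i:i + batch_size])
    rng.shuffle(batches)  # interleave the bins' batches across the epoch
    return batches

# 80,000 images with batch size 32 -> 2,500 batches per epoch
batches = binned_batches(list(range(80_000)), num_bins=2, batch_size=32)
```

So the only difference from standard training is that no batch ever mixes samples from both bins.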
My question is: how much does this training mechanism affect the effectiveness of the DNN model (accuracy, convergence, bias)?
Basic permutation and combination theory tells me I am obviously losing out on the batches that could only be formed across the bins. Also, to what extent can I apply this binning? What if I form three bins of 20k, 20k, and 40k and apply the same concept?
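For concreteness, here is the counting argument I have in mind (counting distinct possible batch compositions, i.e., 32-subsets, not batches per epoch):

```python
from math import comb

N, half, k = 80_000, 40_000, 32

global_count = comb(N, k)           # 32-subsets formable from the full dataset
two_bin_count = 2 * comb(half, k)   # 32-subsets confined to one of two bins

# Each factor (half - i) / (N - i) is roughly 1/2, so the surviving
# fraction is about 2 * (1/2)**32 = 2**-31 -- almost all cross-bin
# batch compositions become impossible.
surviving_fraction = two_bin_count / global_count

# The three-bin variant from the question: 20k + 20k + 40k
three_bin_count = 2 * comb(20_000, k) + comb(half, k)
```

So the raw count of reachable batch compositions collapses dramatically, but I am not sure how much that combinatorial loss actually matters for SGD in practice.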
Kindly help me understand. Mathematical guarantees / approximations would be appreciated, but an intuitive explanation is also welcome.
thanks!