I am working on a model for an NLP task. The model encodes the text and has a regression output layer.
In this task, from each (positive) instance I create several negative instances using a specific technique, and I merge them with their corresponding positives within each data split (training/validation/test). After that, I shuffle the split.
I was thinking of the following: wouldn't it be better to keep the negative instances in the same batch as their corresponding positive one, instead of shuffling the whole split at the instance level?
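For concreteness, this is what I mean by grouped batching: shuffle at the group level rather than the instance level, so a positive and its generated negatives always land in the same batch. A minimal sketch (the function and variable names here are illustrative, not from any specific library):

```python
import random

def grouped_batches(groups, groups_per_batch, seed=0):
    """Build batches that keep each positive with its own negatives.

    `groups` is a list of lists: each inner list holds one positive
    instance followed by the negatives generated from it. Group order
    is shuffled, but all members of a group go into the same batch.
    """
    rng = random.Random(seed)
    order = list(range(len(groups)))
    rng.shuffle(order)  # shuffle groups, not individual instances
    batches = []
    for i in range(0, len(order), groups_per_batch):
        batch = []
        for g in order[i:i + groups_per_batch]:
            batch.extend(groups[g])
        batches.append(batch)
    return batches

# toy example: 4 positives, each with 2 negatives
groups = [[f"pos{i}", f"neg{i}a", f"neg{i}b"] for i in range(4)]
for batch in grouped_batches(groups, groups_per_batch=2):
    print(batch)
```

In a PyTorch setup this would typically be done with a custom batch sampler rather than a standalone function, but the grouping logic is the same.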
Is there a general answer to this question, or does it depend on the task?