
I want to pretrain a model on the P100 GPU provided by Kaggle. Pretraining on 3 A100s takes about 1.5 days. I have 2 questions:

  1. Can I set the same seed everywhere so that the results match, train the model for 12 hours, and save all the variables and weights at the end? Then, when the 12-hour session is over, can I continue pretraining in another session?
  2. I'm used to using KFold; should I use it for pretraining such a huge model?
Robin van Hoorn

1 Answer


Yes, you should be able to save the weights and resume the training session at a later time and on a different GPU/server. This sort of functionality should be supported by most ML libraries (e.g. TF, PyTorch, JAX).
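
For example, in PyTorch a minimal checkpoint/resume sketch could look like the following. The model, optimizer, and file name here are placeholders for illustration; the main point is to save the optimizer state and RNG state alongside the weights so the run picks up where it left off:

```python
import torch
import torch.nn as nn

# Stand-in model/optimizer for illustration; substitute your own pretraining setup.
model = nn.Linear(512, 512)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
step = 0

# ... run the training loop until the 12-hour session is nearly over ...

# Save everything needed to resume: weights, optimizer state, step counter, RNG state.
torch.save(
    {
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
        "step": step,
        "rng_state": torch.get_rng_state(),
    },
    "checkpoint.pt",
)

# In the next session (possibly on a different GPU), reload and continue training.
ckpt = torch.load("checkpoint.pt", map_location="cpu")
model.load_state_dict(ckpt["model"])
optimizer.load_state_dict(ckpt["optimizer"])
step = ckpt["step"]
torch.set_rng_state(ckpt["rng_state"])
```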

You can use k-fold cross validation if you really want to, but that will multiply the training time, because you need to train and validate on each of the different splits. Nowadays people tend to use a single train/val split instead, especially given that most pretraining datasets are fairly large.
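
A minimal sketch of such a single split in PyTorch (the toy dataset, 95/5 ratio, and seed are arbitrary choices for illustration):

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Toy dataset standing in for your pretraining corpus.
dataset = TensorDataset(torch.randn(10_000, 512))

# One fixed train/val split instead of k folds.
val_size = len(dataset) // 20
train_set, val_set = random_split(
    dataset,
    [len(dataset) - val_size, val_size],
    generator=torch.Generator().manual_seed(42),  # fixed seed so the split is reproducible across sessions
)
```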

PeaBrane