Currently, with the code on this branch: https://github.com/benchopt/benchmark_resnet_classif/pull/53, it takes me 35 minutes per epoch to train a ResNet-18 on ImageNet in TensorFlow with a V100 GPU and a batch size of 128, with standard data augmentations.
I haven't found other reported training times for a ResNet-18 with a standard training policy, so I am just mentioning this to kick off the conversation, without claiming it is the best one can get.
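For reference, here is a minimal sketch of what "standard data augmentations" usually means for ImageNet training. The branch above uses TensorFlow, but since the rest of this thread is about PyTorch I'm writing the torchvision equivalent; the exact pipeline on the branch may differ slightly.

```python
from torchvision import transforms

# The usual ImageNet training-time augmentation recipe
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),  # random crop, resized to 224x224
    transforms.RandomHorizontalFlip(),  # flip with probability 0.5
    transforms.ToTensor(),
    transforms.Normalize(               # standard ImageNet channel statistics
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
])
```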
EDIT
With timm, it is possible on a single V100 GPU to reach 0.1 s per iteration at a batch size of 128 (see this discussion I had with Ross Wightman). Over a full ImageNet epoch, this works out to roughly 16 minutes. I think this is a very good baseline, and you can make it even faster with AMP, a larger batch size, and of course distributed training.
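To make the arithmetic explicit (a quick sanity check, not a measurement):

```python
# Back-of-the-envelope: 0.1 s/iteration at batch size 128 on ImageNet-1k
n_images = 1_281_167      # ImageNet-1k training set size
batch_size = 128
time_per_iter = 0.1       # seconds, figure from the discussion above

n_iters = n_images / batch_size               # ~10,009 iterations per epoch
epoch_minutes = n_iters * time_per_iter / 60  # ~16.7 minutes
print(f"{epoch_minutes:.1f} minutes per epoch")
```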
My PyTorch implementation here achieves 22 minutes per epoch; I might be missing some optimizations here and there.
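AMP, mentioned above, is one such optimization that often gives a sizeable speedup on V100s. Here is a minimal PyTorch sketch of a mixed-precision training step; the model and optimizer choices are placeholders, not what my implementation actually uses.

```python
import torch
import torchvision
from torch.cuda.amp import autocast, GradScaler

# Placeholder model/optimizer setup (requires a CUDA device)
model = torchvision.models.resnet18(num_classes=1000).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
criterion = torch.nn.CrossEntropyLoss()
scaler = GradScaler()  # scales the loss to avoid fp16 gradient underflow

def train_step(images, targets):
    optimizer.zero_grad(set_to_none=True)
    with autocast():                   # forward pass in mixed precision
        loss = criterion(model(images), targets)
    scaler.scale(loss).backward()      # backward on the scaled loss
    scaler.step(optimizer)             # unscales grads, then optimizer step
    scaler.update()                    # adjust the scale factor for next step
    return loss
```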