I’m trying to understand the differences in inference time and training time between two models:
a VGG16 with weights initialised from a Glorot uniform distribution, and the same network with the only difference that the weights are initialised to the ImageNet values.
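For reference, the two models are built roughly like this (a simplified sketch; my actual classifier head, input size and data pipeline are omitted, and NUM_CLASSES is a placeholder):

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

NUM_CLASSES = 10  # placeholder, not my real number of classes

def build_model(use_imagenet_weights):
    # weights=None leaves the conv kernels at Keras' default Glorot uniform
    # initialisation; weights="imagenet" loads the pretrained ImageNet values.
    base = VGG16(weights="imagenet" if use_imagenet_weights else None,
                 include_top=False, input_shape=(224, 224, 3))
    x = layers.Flatten()(base.output)
    x = layers.Dense(256, activation="relu")(x)
    out = layers.Dense(NUM_CLASSES, activation="softmax")(x)
    return models.Model(base.input, out)

model_glorot = build_model(False)
model_imagenet = build_model(True)
```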
Obviously the latter performs better, and total training time on my dataset is lower, presumably because it converges in fewer epochs.
However, the training time per epoch is also lower, and I'm not sure why. I also measured the inference time, which was lower as well and roughly accounts for the difference in per-epoch training time. But why would the inference time be lower, given that it's the exact same architecture with the same number of parameters?
Note that all tests were performed on the same device, and the VGG16 used is the one from the Keras applications library.
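Here is roughly how I measured the inference time (a minimal sketch with dummy data; my real measurement uses my own images and batch size):

```python
import time
import numpy as np
from tensorflow.keras.applications import VGG16

# Two stock VGG16 models just for the sketch; in my experiment these are the
# models described above (Glorot-initialised vs. ImageNet-initialised).
model_glorot = VGG16(weights=None)
model_imagenet = VGG16(weights="imagenet")

batch = np.random.rand(32, 224, 224, 3).astype("float32")  # dummy input batch

def time_inference(model, n_runs=50):
    model.predict(batch, verbose=0)          # warm-up so graph tracing isn't timed
    start = time.perf_counter()
    for _ in range(n_runs):
        model.predict(batch, verbose=0)
    return (time.perf_counter() - start) / n_runs

print("Glorot init  :", time_inference(model_glorot))
print("ImageNet init:", time_inference(model_imagenet))
```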