I am trying to find estimates of the typical number of FLOPs required to train a modern model. The FLOPs used for inference are widely reported and cluster around $10^6$ - $10^7$ for many large models. However, I cannot find equivalent statistics for training; the best I can do is guess an order of magnitude as (number of training epochs) times (number of training examples) times (inference FLOPs per example).
When I do this, I estimate that AlexNet and its contemporaries would have required about $10^{18}$ FLOPs to train.
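For concreteness, here is a minimal sketch of that back-of-the-envelope calculation in Python. The concrete inputs (roughly $10^9$ forward-pass FLOPs per image, ~1.2M ImageNet training images, ~90 epochs, and a ~3x multiplier to fold in the backward pass) are my own rough assumptions for an AlexNet-scale setup, not measured values:

```python
# Back-of-the-envelope estimate:
#   training FLOPs ~= epochs * training examples * per-example cost,
# where the per-example cost is the forward-pass (inference) FLOPs times a
# small constant accounting for the backward pass.

def estimate_training_flops(forward_flops_per_example: float,
                            num_examples: int,
                            num_epochs: int,
                            backward_multiplier: float = 3.0) -> float:
    """Order-of-magnitude estimate of total training FLOPs.

    backward_multiplier folds the backward pass into the per-example cost;
    ~2-3x the forward pass is a common rule of thumb.
    """
    return forward_flops_per_example * backward_multiplier * num_examples * num_epochs


# Assumed AlexNet-scale inputs (order-of-magnitude guesses):
flops = estimate_training_flops(forward_flops_per_example=1e9,
                                num_examples=1_200_000,
                                num_epochs=90)
print(f"~{flops:.1e} FLOPs")  # ~3.2e+17, i.e. a few times 10^17
```

Since the result scales linearly with every input, the per-example forward-pass figure is the assumption that dominates the final order of magnitude.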
Is this the right order of magnitude? How many FLOPs do modern networks typically require for all stages of training (I'm also interested in estimates that include hyperparameter tuning)?