Transformers are data- and GPU-hungry during training. Is this also true at inference time? How do transformers compare to feed-forward CNNs, e.g., for bounding-box generation at inference time? I haven't found a good comparison of compute time and computational resources; a sketch of the kind of measurement I mean is below.
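To make the comparison concrete, here is a minimal latency-measurement sketch, assuming PyTorch, with torchvision's Faster R-CNN as the CNN detector and DETR (loaded from `facebookresearch/detr` via `torch.hub`) as the transformer detector. The model choices, image size, and batch size are illustrative assumptions, not something from a published comparison.

```python
import time
import torch
import torchvision

def bench(model, inputs, warmup=3, runs=10):
    """Average forward-pass latency in seconds over `runs` iterations."""
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):           # warm-up passes (cuDNN autotune, caches)
            model(inputs)
        if torch.cuda.is_available():
            torch.cuda.synchronize()      # make sure queued GPU work is done
        t0 = time.perf_counter()
        for _ in range(runs):
            model(inputs)
        if torch.cuda.is_available():
            torch.cuda.synchronize()
    return (time.perf_counter() - t0) / runs

device = "cuda" if torch.cuda.is_available() else "cpu"
# torchvision detection models and DETR both accept a list of 3xHxW images
images = [torch.rand(3, 800, 800, device=device)]

# CNN-based detector (illustrative choice)
cnn = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights=None).to(device)
print(f"Faster R-CNN: {bench(cnn, images):.3f} s/image")

# Transformer-based detector (illustrative choice; downloads the DETR repo)
detr = torch.hub.load("facebookresearch/detr", "detr_resnet50", pretrained=False).to(device)
print(f"DETR:         {bench(detr, images):.3f} s/image")
```

Latency alone may not tell the whole story (memory footprint and FLOPs matter too), but this is roughly the head-to-head measurement I'm looking for.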