
I wonder if there's anyone who has actually succeeded in fine-tuning GPT-2's 774M model without using cloud TPUs. My GeForce RTX 2070 SUPER couldn't handle it in previous attempts.

I'm running TensorFlow 1.14.0 with CUDA 9.1 on Ubuntu 18.04. For fine-tuning I'm using gpt-2-simple.

When fine-tuning the 774M model, I keep running into OOM errors such as: W tensorflow/core/common_runtime/bfc_allocator.cc:314] Allocator (GPU_0_bfc) ran out of memory trying to allocate 6.25MiB (rounded to 6553600). Current allocation summary follows.

So far I've tried:

  • Using a different optimizer (RMSPropOptimizer instead of AdamOptimizer)
  • Setting batch-size to 1
  • use_memory_saving_gradients
  • only_train_transformer_layers

Fine-tuning works smoothly on the 355M model.
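
For reference, here is roughly how I'm calling gpt-2-simple with those memory-saving options (the dataset path and step count below are placeholders, not my real values):

    import gpt_2_simple as gpt2

    # The 774M weights have to be downloaded once beforehand:
    # gpt2.download_gpt2(model_name="774M")
    sess = gpt2.start_tf_sess()

    gpt2.finetune(
        sess,
        dataset="corpus.txt",                # placeholder path to the training text
        model_name="774M",
        steps=1000,                          # placeholder step count
        batch_size=1,                        # smallest possible batch
        use_memory_saving_gradients=True,    # gradient checkpointing: recompute activations instead of storing them
        only_train_transformer_layers=True,  # leave the embedding matrices out of the optimizer
    )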

So what I'm really asking is:

  • is it possible to fine-tune GPT-2's 774M model without industrial-sized hardware?
  • if so, please tell me about your successful attempts
  • apart from hardware recommendations, how could fine-tuning be optimized to make the 774M model fit in memory?

1 Answer

Possibly a bit late to the answer, but I doubt you'd be able to run GPT-2 774M in FP32 on a 2070 Super, which has 8 GB of VRAM. I know it's not an exact comparison, but fine-tuning BERT Large (345M) in FP32 easily takes more than 10 GB of VRAM. You might be able to fit GPT-2 774M if you run it in FP16.
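
If you want to try FP16 with gpt-2-simple's TF 1.x graph, one option (untested on my end, and it needs a Volta/Turing GPU plus a CUDA 10 build of TensorFlow 1.14, so the CUDA 9.1 setup mentioned above would have to be updated) is TensorFlow's automatic mixed precision graph rewrite, patched in wherever gpt-2-simple constructs its optimizer:

    import tensorflow as tf

    # Stand-in for the spot in gpt-2-simple's finetune() where the optimizer is built.
    opt = tf.train.AdamOptimizer(learning_rate=1e-4)

    # TF 1.14's automatic mixed precision rewrite: casts eligible ops to FP16
    # and applies dynamic loss scaling to keep gradients from underflowing.
    opt = tf.train.experimental.enable_mixed_precision_graph_rewrite(opt)

    # ...then use `opt` for compute_gradients()/apply_gradients() as before.

Even in FP16 there's no guarantee the 774M model plus Adam's optimizer state fits in 8 GB, but it cuts activation memory substantially, so it may be worth trying before moving to bigger hardware.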

Alternatively, you can use Google Colab TPUs, which provide 11GB+ of memory. Here's a good source listing a few posts about fine-tuning GPT-2 1.5B on Google Colab TPUs: https://news.ycombinator.com/item?id=21456025

And here's the notebook itself demonstrating the process: https://colab.research.google.com/drive/1BXry0kcm869-RVHHiY6NZmY9uBzbkf1Q#scrollTo=lP1InuxJTD6a
