
I wonder if there's anyone who has actually succeeded in fine-tuning GPT-2's 774M model without using cloud TPUs. My GeForce RTX 2070 SUPER couldn't handle it in previous attempts.

I'm running TensorFlow 1.14.0 with CUDA 9.1 on Ubuntu 18.04. For fine-tuning I'm using gpt-2-simple.

When fine-tuning the 774M model, I keep running into OOM errors such as: W tensorflow/core/common_runtime/bfc_allocator.cc:314] Allocator (GPU_0_bfc) ran out of memory trying to allocate 6.25MiB (rounded to 6553600). Current allocation summary follows.

So far I've tried:

  • Using a different optimizer (RMSPropOptimizer instead of AdamOptimizer)
  • Setting batch-size to 1
  • use_memory_saving_gradients
  • only_train_transformer_layers

Fine-tuning works smoothly on the 355M model.
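
For reference, here is roughly how I'm calling gpt-2-simple with those memory-saving options (the dataset path and step count below are placeholders, not my real values):

    import gpt_2_simple as gpt2

    # The 774M weights have to be downloaded once beforehand:
    # gpt2.download_gpt2(model_name="774M")
    sess = gpt2.start_tf_sess()

    gpt2.finetune(
        sess,
        dataset="corpus.txt",                # placeholder path to the training text
        model_name="774M",
        steps=1000,                          # placeholder step count
        batch_size=1,                        # smallest possible batch
        use_memory_saving_gradients=True,    # gradient checkpointing: recompute activations instead of storing them
        only_train_transformer_layers=True,  # leave the embedding matrices out of the optimizer
    )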

So what I'm really asking is:

  • is it possible to fine-tune GPT-2's 774M model without industrial-sized hardware?
  • if so, please tell me about your successful attempts
  • apart from hardware recommendations, how could fine-tuning be optimized to make the 774M model fit in memory?

1 Answer

Possibly a bit late to the answer, but I doubt you'd be able to run GPT-2 774M in FP32 on a 2070 Super, which has 8 GB of VRAM. I know it's not an exact comparison, but fine-tuning BERT Large (345M) in FP32 easily takes more than 10 GB of VRAM. You might be able to fit GPT-2 774M if you run it in FP16.
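
If you want to try FP16 with gpt-2-simple's TF 1.x graph, one option (untested on my end, and it needs a Volta/Turing GPU plus a CUDA 10 build of TensorFlow 1.14, so the CUDA 9.1 setup mentioned above would have to be updated) is TensorFlow's automatic mixed precision graph rewrite, patched in wherever gpt-2-simple constructs its optimizer:

    import tensorflow as tf

    # Stand-in for the spot in gpt-2-simple's finetune() where the optimizer is built.
    opt = tf.train.AdamOptimizer(learning_rate=1e-4)

    # TF 1.14's automatic mixed precision rewrite: casts eligible ops to FP16
    # and applies dynamic loss scaling to keep gradients from underflowing.
    opt = tf.train.experimental.enable_mixed_precision_graph_rewrite(opt)

    # ...then use `opt` for compute_gradients()/apply_gradients() as before.

Even in FP16 there's no guarantee the 774M model plus Adam's optimizer state fits in 8 GB, but it cuts activation memory substantially, so it may be worth trying before moving to bigger hardware.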

Alternatively, you can use Google Colab TPUs, which provide 11GB+ of memory. Here's a good source listing a few posts about fine-tuning GPT-2 1.5B on Google Colab TPUs: https://news.ycombinator.com/item?id=21456025

And here's the notebook itself demonstrating the process: https://colab.research.google.com/drive/1BXry0kcm869-RVHHiY6NZmY9uBzbkf1Q#scrollTo=lP1InuxJTD6a
