I wonder if anyone has actually succeeded in fine-tuning GPT-2's 774M model without using cloud TPUs. My GeForce RTX 2070 SUPER couldn't handle it in previous attempts.
I'm running TensorFlow 1.14.0 with CUDA 9.1 on Ubuntu 18.04, and I'm fine-tuning with gpt-2-simple.
When fine-tuning the 774M model, I keep running into OOM errors such as:
```
W tensorflow/core/common_runtime/bfc_allocator.cc:314] Allocator (GPU_0_bfc) ran out of memory trying to allocate 6.25MiB (rounded to 6553600). Current allocation summary follows.
```
So far I've tried the following (see the sketch below for how I pass these in):

- Using a different optimizer (`RMSPropOptimizer` instead of `AdamOptimizer`)
- Setting `batch_size` to 1
- `use_memory_saving_gradients`
- `only_train_transformer_layers`
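
For reference, here is roughly how I pass these options through gpt-2-simple's `finetune()`. The optimizer swap isn't shown (I did that separately), and the dataset name and step counts are placeholders for my actual run:

```python
import gpt_2_simple as gpt2

# One-time download of the 774M checkpoint
gpt2.download_gpt2(model_name="774M")

sess = gpt2.start_tf_sess()

# "corpus.txt" and the step/interval values are placeholders
gpt2.finetune(sess,
              dataset="corpus.txt",
              model_name="774M",
              steps=1000,
              batch_size=1,                        # smallest possible batch
              use_memory_saving_gradients=True,    # trade compute for memory via gradient checkpointing
              only_train_transformer_layers=True,  # leave the embedding weights untrained
              save_every=500,
              print_every=10)
```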
Fine-tuning works smoothly on the 355M model.
So what I'm really asking is:
- is it possible to fine-tune GPT-2's 774M model without industrial-sized hardware?
- if so, what did your successful setup look like?
- apart from hardware recommendations, how could fine-tuning be optimized to make 774M fit in memory?