
It is easy to see how much disk space an LLM (downloaded from Hugging Face, for instance) consumes: just go into the relevant directory and check the file sizes.

How can I estimate the amount of GPU RAM required to run the model?

For example, if the Falcon 7B model takes around 14GB of storage, how much GPU RAM should suffice for it?

ahron

1 Answer


It varies with factors such as quantization, but my rough rule of thumb is that memory use is 2-4x the size on disk. As an example, the model at https://huggingface.co/TheBloke/wizardLM-7B-HF/tree/main is about 14GB on disk and used ~30GB (on CPU) just to load; inference will increase this further. (The roughly 2x jump at load time is consistent with a float16 checkpoint being materialized as float32 weights, which doubles the bytes per parameter.)
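To turn that rule of thumb into numbers, here is a minimal back-of-envelope sketch (mine, not from any model card) that estimates inference memory from parameter count and dtype width. The `overhead` multiplier for activations and KV cache is an assumption and varies with batch size and context length.

```python
# Rough inference-memory estimate: weight bytes plus an assumed overhead
# multiplier for activations / KV cache (workload-dependent).

BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1, "int4": 0.5}

def estimate_inference_gb(n_params: float, dtype: str = "float16",
                          overhead: float = 1.2) -> float:
    """Weight memory in GB plus a rough runtime-overhead factor."""
    weights_gb = n_params * BYTES_PER_PARAM[dtype] / 1024**3
    return weights_gb * overhead

# Falcon 7B has roughly 7e9 parameters.
for dtype in ("float32", "float16", "int4"):
    print(f"{dtype}: {estimate_inference_gb(7e9, dtype):.1f} GB")
# float32: ~31.3 GB   float16: ~15.6 GB   int4: ~3.9 GB
```

To check an estimate empirically, PyTorch can report the memory actually allocated once the weights are on the GPU (assuming `torch` and `transformers` are installed and a CUDA device is available):

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/wizardLM-7B-HF",
    torch_dtype=torch.float16,  # omit this and many versions default to float32, roughly doubling memory
).to("cuda")

print(f"weights on GPU: {torch.cuda.memory_allocated() / 1024**3:.1f} GB")
```

The float32 figure from the sketch lines up with the ~30GB I saw at load time, and the float16 figure is roughly the 14GB on-disk size plus some headroom.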

beejay