Advice - Getting started with LLMs

its_me_xiphos@beehaw.org · 9 months ago


Zworf@beehaw.org · 9 months ago

Hmmm, weird. I have an RTX 4090, a Ryzen 5800X3D and 64GB of RAM, and it runs really well. Admittedly it's the 8B model, because the intermediate sizes aren't out yet and 70B simply won't fly on a single GPU.

But it really screams. Much faster than I can read. PS: Ollama is just llama.cpp under the hood.

Edit: Ah, wait, I know what's going wrong here. The 22B-parameter model is probably too big for your VRAM, and then it does get extremely slow, yes.
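A rough back-of-the-envelope check for "too big for your VRAM": weight memory is about parameters × bits per weight / 8. The sketch below assumes an effective 4.5 bits per weight for a typical 4-bit quant and a flat 1.5 GiB allowance for context and overhead. Both numbers are assumptions, not measured values.

```python
# Rough estimate of the memory needed just to hold a model's weights.
# Assumption: bits_per_weight is the effective average after quantization
# (16 for FP16, roughly 4-5 for common 4-bit quants).
GIB = 1024 ** 3

def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Bytes = parameters * bits / 8, reported in GiB."""
    return n_params * bits_per_weight / 8 / GIB

def fits_in_vram(n_params: float, bits_per_weight: float,
                 vram_gib: float, overhead_gib: float = 1.5) -> bool:
    # overhead_gib is an assumed allowance for KV cache, activations and the
    # driver/desktop; the real figure depends on backend and context length.
    return weight_memory_gib(n_params, bits_per_weight) + overhead_gib <= vram_gib

if __name__ == "__main__":
    for name, params in [("8B", 8e9), ("22B", 22e9), ("70B", 70e9)]:
        for bpw in (16, 4.5):
            need = weight_memory_gib(params, bpw)
            print(f"{name} @ {bpw} bpw: {need:5.1f} GiB weights, "
                  f"fits 24 GiB: {fits_in_vram(params, bpw, 24)}, "
                  f"fits 10 GiB: {fits_in_vram(params, bpw, 10)}")
```

Under these assumptions a 4-bit 8B model fits comfortably in 10 GiB, a 4-bit 22B model (about 11.5 GiB of weights) does not, and a 4-bit 70B model (about 37 GiB) exceeds even a 24 GiB card.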

BaroqueInMind@lemmy.one · 9 months ago

What is the appropriate model size for 10GB of VRAM?

Zworf@beehaw.org · 9 months ago

It depends on your prompt/context size too: the more context you have, the more memory you need. Check your GPU's memory usage with GPU-Z across different models and scenarios.
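To see why context size matters: the KV cache stores one key and one value tensor per layer for every token in the context, so it grows linearly with context length. The default shape values below are assumed to be Llama-3-8B-style (32 layers, 8 KV heads, head dimension 128) with an unquantized FP16 cache. Check your own model's config for the real numbers.

```python
# KV cache size in bytes for a given context length.
# Defaults are assumed Llama-3-8B-style values with an FP16 (2-byte) cache.
def kv_cache_bytes(n_ctx: int, n_layers: int = 32, n_kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_elem: int = 2) -> int:
    # Leading factor 2: one K tensor and one V tensor per layer.
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem

if __name__ == "__main__":
    for ctx in (2048, 8192, 32768):
        print(f"{ctx:>6} tokens: {kv_cache_bytes(ctx) / 1024 ** 3:.2f} GiB")
```

With these assumed values the cache costs 128 KiB per token, so it takes 0.25 GiB at 2,048 tokens, 1 GiB at 8,192 and 4 GiB at 32,768. That memory comes on top of the weights.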

xcjs@programming.dev · 9 months ago

It should be split between VRAM and regular RAM, at least if it’s a GGUF model. Maybe it’s not, and that’s what’s wrong?
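With GGUF models, llama.cpp can offload some layers to the GPU (its `--n-gpu-layers` / `-ngl` option) and run the rest from system RAM on the CPU. The split is slower than running fully on the GPU, but much faster than failing to load. The helper below is hypothetical. It assumes the weights are spread evenly across layers (ignoring embeddings and the output head) and reserves an assumed 1.5 GiB for context and overhead.

```python
import math

# Hypothetical helper: how many layers of a GGUF model could sit in VRAM,
# with the remaining layers kept in system RAM.
def layers_on_gpu(model_gib: float, n_layers: int,
                  vram_gib: float, reserve_gib: float = 1.5) -> int:
    # Assumes equal-sized layers; reserve_gib is an assumed allowance for
    # KV cache, buffers and the desktop.
    per_layer = model_gib / n_layers
    usable = max(vram_gib - reserve_gib, 0.0)
    return min(n_layers, math.floor(usable / per_layer))

if __name__ == "__main__":
    # Illustrative numbers only: a 12 GiB file with 48 layers on a 10 GiB card.
    n = layers_on_gpu(12, 48, 10)
    print(f"{n} layers on GPU, {48 - n} layers in system RAM")
```

In this made-up example, 34 of 48 layers would go to the GPU and 14 would stay in RAM. Those CPU-side layers are where the slowdown comes from.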