cross-posted from: https://lemmy.ml/post/45766694

Hey :) For a while now I've been using gpt-oss-20b on my home lab for lightweight coding tasks and some automation. I'm not really up to date with the current self-hosted LLMs, and since the model I'm using was released at the beginning of August 2025 (from an LLM-development perspective, that feels like an eternity to me), I wanted to tap the collective wisdom of Lemmy to see whether there's something better out there to replace my model with.

Edit:

Specs:

GPU: RTX 3060 (12GB vRAM)

RAM: 64 GB

gpt-oss-20b does not fit into the vRAM completely, but with partial offload it is reasonably fast (enough for me).
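For anyone wondering how to sanity-check this kind of fit, here is a rough back-of-envelope sketch. The numbers (bits per weight, overhead) are illustrative assumptions, not measured values for gpt-oss-20b:

```python
# Rough back-of-envelope: do a model's quantized weights fit in VRAM?
# All constants here are illustrative assumptions, not measured values.

def model_size_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate in-memory size of the weights alone, in GB."""
    return params_b * bits_per_weight / 8  # billions of params * bytes/weight

def fits_in_vram(params_b: float, bits_per_weight: float,
                 vram_gb: float, overhead_gb: float = 2.0) -> bool:
    """Weights plus a rough KV-cache/runtime overhead vs. available VRAM."""
    return model_size_gb(params_b, bits_per_weight) + overhead_gb <= vram_gb

# A 20B model at ~4.25 bits/weight (a typical 4-bit quant) on a 12 GB card:
print(fits_in_vram(20, 4.25, 12))  # weights alone are ~10.6 GB -> False
# A 7B model at the same quant fits comfortably:
print(fits_in_vram(7, 4.25, 12))   # True
```

When the check comes out False, runtimes like llama.cpp can still run the model by keeping some layers in system RAM (partial offload), at the cost of speed.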

  • SuspciousCarrot78@lemmy.world · 8 days ago

    For automation, you probably need something that is good at tool calling (measured by BFCL, the Berkeley Function Calling Leaderboard). For automation you want an overall score around 50+ (preferably 60+).

    https://gorilla.cs.berkeley.edu/leaderboard.html?

    And, if you have 12GB, probably a model no larger than 32B.

    Which somewhat narrows your choices down: a 14-32B model (assuming you're willing to stick with partial offload, as you do now?) with a BFCL score >50. That sounds like one of the Qwen 3 models (30B? 32B?). Otherwise, you go the other way (14B or less) and run fast.
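The shortlisting logic above boils down to two filters. A quick sketch (the model names and scores below are placeholders, not real leaderboard numbers; check the BFCL site for current ones):

```python
# Hypothetical shortlisting helper: keep models that are small enough for
# partial offload on a 12 GB card and score well enough on BFCL for
# automation. The entries are illustrative placeholders, not real scores.

candidates = [
    # (name, params in billions, BFCL overall score) -- made-up examples
    ("model-a-32b", 32, 62.0),
    ("model-b-14b", 14, 55.0),
    ("model-c-70b", 70, 68.0),  # strong, but too big for this setup
    ("model-d-8b",  8,  44.0),  # fast, but weak at tool calling
]

shortlist = [name for name, params, bfcl in candidates
             if params <= 32 and bfcl >= 50]
print(shortlist)  # ['model-a-32b', 'model-b-14b']
```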

    As for coding: are you happy having a SOTA model act as the “general” while the local model does the grunt work (rather than local doing it all)? If yes, something like GLM 5.1 orchestrating your local Qwen 3 via ECA (which I only learned about a little while ago) works great.

    https://eca.dev/