cross-posted from: https://lemmy.ml/post/45766694

Hey :) For a while now I've been using gpt-oss-20b on my home lab for lightweight coding tasks and some automation. I'm not really up to date with the current self-hosted LLMs, and since the model I'm using was released at the beginning of August 2025 (from an LLM-development perspective, that feels like an eternity to me), I wanted to tap the collective wisdom of Lemmy to see whether there's something better out there to replace my model with.

Edit:

Specs:

GPU: RTX 3060 (12GB vRAM)

RAM: 64 GB

gpt-oss-20b does not fit into the vRAM completely, but with partial offload it is reasonably fast (enough for me).
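For anyone wondering how to sanity-check this kind of fit, here is a rough back-of-envelope sketch. The numbers (bits per weight, overhead) are illustrative assumptions, not measured values for gpt-oss-20b:

```python
# Rough back-of-envelope: do a model's quantized weights fit in VRAM?
# All constants here are illustrative assumptions, not measured values.

def model_size_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate in-memory size of the weights alone, in GB."""
    return params_b * bits_per_weight / 8  # billions of params * bytes/weight

def fits_in_vram(params_b: float, bits_per_weight: float,
                 vram_gb: float, overhead_gb: float = 2.0) -> bool:
    """Weights plus a rough KV-cache/runtime overhead vs. available VRAM."""
    return model_size_gb(params_b, bits_per_weight) + overhead_gb <= vram_gb

# A 20B model at ~4.25 bits/weight (a typical 4-bit quant) on a 12 GB card:
print(fits_in_vram(20, 4.25, 12))  # weights alone are ~10.6 GB -> False
# A 7B model at the same quant fits comfortably:
print(fits_in_vram(7, 4.25, 12))   # True
```

When the check comes out False, runtimes like llama.cpp can still run the model by keeping some layers in system RAM (partial offload), at the cost of speed.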

  • SuspciousCarrot78@lemmy.world · 8 days ago

    For automation, you probably need something that is good at tool calling (measured by BFCL, the Berkeley Function Calling Leaderboard). For automation you want an overall score around 50+ (preferably 60+).

    https://gorilla.cs.berkeley.edu/leaderboard.html?

    And, if you have 12GB, probably a model no larger than 32B.

    Which somewhat narrows your choices down: a 14-32B model (assuming you're willing to stick with partial offload, as you do now?) with a BFCL score >50. That sounds like one of the Qwen 3 models (30B? 32B?). Otherwise, you go the other way (14B or less) and run fast.
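The shortlisting logic above boils down to two filters. A quick sketch (the model names and scores below are placeholders, not real leaderboard numbers; check the BFCL site for current ones):

```python
# Hypothetical shortlisting helper: keep models that are small enough for
# partial offload on a 12 GB card and score well enough on BFCL for
# automation. The entries are illustrative placeholders, not real scores.

candidates = [
    # (name, params in billions, BFCL overall score) -- made-up examples
    ("model-a-32b", 32, 62.0),
    ("model-b-14b", 14, 55.0),
    ("model-c-70b", 70, 68.0),  # strong, but too big for this setup
    ("model-d-8b",  8,  44.0),  # fast, but weak at tool calling
]

shortlist = [name for name, params, bfcl in candidates
             if params <= 32 and bfcl >= 50]
print(shortlist)  # ['model-a-32b', 'model-b-14b']
```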

    As for coding: are you happy having a SOTA model act as the “general” while the local model does the grunt work (rather than local doing it all)? If yes, something like GLM 5.1 orchestrating your local Qwen 3 via ECA (which I only learned about a little while ago) works great.

    https://eca.dev/