• hendrik@palaver.p3x.de · 3 months ago

    Thanks for the numbers. By the way, I don't think an NPU can run large language models in the first place. NPUs are meant for small, specific tasks like blurring the background in video conferences or assisting with speech recognition. They only have some tens or hundreds of megabytes of memory, so an LLM/chatbot won't fit. The main thing that makes LLM inference faster is memory (RAM) bandwidth and speed.
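
    To make the bandwidth point concrete, here's a rough back-of-envelope sketch: during token generation, roughly the full set of model weights is read from RAM for each token, so bandwidth puts a ceiling on throughput. The model size, quantization, and bandwidth figures below are my own illustrative assumptions, not measurements.

    ```python
    # Back-of-envelope estimate: decoding is memory-bound, so
    # tokens/sec is roughly (memory bandwidth) / (bytes read per token).

    def max_tokens_per_sec(params_billions: float, bytes_per_param: float,
                           bandwidth_gb_s: float) -> float:
        """Upper bound on decode speed if each token reads all weights once."""
        model_bytes = params_billions * 1e9 * bytes_per_param
        return bandwidth_gb_s * 1e9 / model_bytes

    # Assumed example: a 7B-parameter model at 4-bit quantization
    # (~0.5 bytes/param) on dual-channel DDR5 (~80 GB/s):
    print(max_tokens_per_sec(7, 0.5, 80))  # ~22.9 tokens/s ceiling
    ```

    That ceiling is why adding compute (an NPU) without adding bandwidth doesn't speed up LLM inference much.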

    • infinitevalence · 3 months ago

      I added a few more numbers. I may pull my old MI25 out of mothballs and bench that too.