Taalas HC1: 17,000 tokens/sec on Llama 3.1 8B vs Nvidia H200’s 233 tokens/sec. 73x faster at one-tenth the power. Each chip runs ONE model, hardwired into the transistors.

  • ImperialStout@beehaw.org
    3 hours ago

    This sounds great to me. Anything that increases the supply of AI processing could ease demand on the GPU supply. I want to be able to upgrade my gaming computer again someday!

    • Appoxo@lemmy.dbzer0.com
      2 hours ago

      Every chip that is produced takes away capacity that could have been used for consumer products.

      So yeah…not great.