Taalas HC1: 17,000 tokens/sec on Llama 3.1 8B vs Nvidia H200’s 233 tokens/sec. 73x faster at one-tenth the power. Each chip runs ONE model, hardwired into the transistors.
This sounds great to me. Anything that increases the supply of AI processing could reduce demand for GPUs. I want to be able to upgrade my gaming computer again someday!
Every chip that is produced takes away manufacturing capacity that could have been used for consumer products.
So yeah…not great.