Taalas HC1: 17,000 tokens/sec on Llama 3.1 8B vs Nvidia H200’s 233 tokens/sec. 73x faster at one-tenth the power. Each chip runs ONE model, hardwired into the transistors.

  • iceberg314@midwest.social
    20 hours ago

    I bet you could! The interface can literally be whatever you want with FPGAs. You’d just have to keep things organized and program them one at a time, I think.