Llama 2
2 mentions across 1 person
Visit ↗All mentions
“Llama 2 performance apparently scales linearly at least as far as 32 chips which at peak can generate almost 2,000 tokens per second.”
Emerging AI Inference Accelerators: A Landscape of Specialization ↗