📄 paper · by Joel Bakr

Many Benchmark Scores Would Appear Much Less Impressive If They Simply Used More Tokens

1 mention from 1 person

All mentions

tweet · 2026-04-05

Recommended

“Benchmark performance is actually limited by token usage. https://joelbkr.substack.com/p/many-benchmarks-scores-would-appear?r=i5f7&utm_medium=ios&triedRedirect=true”
