📄 paper · by Kevin Roose

More Here, Including SWE-bench Score of 93.9% and a New Model Behavior Known as "Answer-Thrashing"

1 mention by 1 person

All mentions

tweet · 2026-04-07

Recommended

“More here, including SWE-bench score of 93.9% (!) and a new model behavior known as "answer-thrashing" https://www-cdn.anthropic.com/53566bf5440a10affd749724787c8913a2ae0841.pdf”

Anthropic's Claude Mythos Preview: Advanced AI Capabilities and Emerging Risks