📄 paper · by Andrew Ng

UQ: Assessing Language Models on Unsolved Questions

1 mention across 1 person


paper · 2025-08-25

Recommended

“We introduce UQ, a testbed of 500 challenging, diverse questions sourced from Stack Exchange, spanning topics from CS theory and math to sci-fi and history, probing capabilities including reasoning, factuality, and browsing. UQ is difficult and realistic by construction: unsolved questions are often hard and naturally arise when humans seek answers, thus solving them yields direct real-world value.”

UQ: A Novel Benchmark for Language Model Evaluation on Unsolved Questions