UQ: Assessing Language Models on Unsolved Questions

Andrew Ng
paper · 2025-08-25
Recommended

We introduce UQ, a testbed of 500 challenging, diverse questions sourced from Stack Exchange, spanning topics from CS theory and math to sci-fi and history, and probing capabilities including reasoning, factuality, and browsing. UQ is difficult and realistic by construction: unsolved questions are often hard and arise naturally when humans seek answers, so solving them yields direct real-world value.

UQ: A Novel Benchmark for Language Model Evaluation on Unsolved Questions