UQ: Assessing Language Models on Unsolved Questions
“We introduce UQ, a testbed of 500 challenging, diverse questions sourced from Stack Exchange, spanning topics from CS theory and math to sci-fi and history, probing capabilities including reasoning, f…”