Yang et al. 2024
1 mention across 0 people
Unknown speaker
Recommended · paper · 2026-04-21
“Cross-benchmark validation on 18 models using MMLU with verbalized confidence and on external data from Yang et al. (2024) confirms the screen transfers across benchmarks and probe formats.”
Borrowing Clinical Psychometrics to Validate LLM Confidence Signals Before Use