Reinforcement Learning from Verifiable Rewards (RLVR) on Chain-of-Thought Reasoning
1 mention across 1 person
All mentions
guestrin
Recommended paper · 2026-05-13
“In this paper, we develop two metrics for critically examining this assumption: Causal Importance of Reasoning (CIR)... and Sufficiency of Reasoning (SR)...”
Outcome-Based RL Fails to Guarantee Causal or Sufficient Reasoning in LLMs