absorb.md

SimpleRL

1 mention across 0 people

Unknown speaker
paper · 2026-04-27
Warned against

We present failure cases of symbolic evaluation in two popular frameworks, Lighteval and SimpleRL, and compare them to our approach, demonstrating clear improvements over commonly used methods.

LLM-as-a-Judge for Math Reasoning Evaluation