Group Relative Policy Optimization (GRPO)
1 mention across 0 people
All mentions
Unknown speaker
Mixedpaper · 2026-04-10
“We systematically study this phenomenon across seven challenging real-world spatial reasoning benchmarks and find that it affects contemporary MRMs such as ViGoRL-Spatial, TreeVGR as well as our own models trained with standard Group Relative Policy Optimization (GRPO).”
Faithful GRPO: Enhancing Visual Spatial Reasoning in Multimodal Language Models