📄 paper

Rejection-Gated Policy Optimization (RGPO)

paper · 2026-04-17

Recommended

“We propose a new perspective on policy optimization: rather than reweighting all samples by their importance ratios, an optimizer should select which samples are trustworthy enough to drive a policy update. Building on this view, we introduce Rejection-Gated Policy Optimization (RGPO)”

Rejection-Gated Policy Optimization Improves RL Reliability and Performance