Objdisco
1 mention across 1 person
All mentions
guestrin
Recommended paper · 2026-05-13
“To address these limitations, we introduce Obj-Disco, a framework that automatically decomposes an alignment reward signal into a sparse, weighted combination of human-interpretable natural language objectives.”
Obj-Disco: Uncovering Hidden LLM Alignment Objectives via Iterative Decompositio