Mindsmall
1 mention across 0 people
All mentions
Unknown speaker
Recommended paper · 2026-04-17
“We evaluate on two contextual bandit environments - UCI Mushroom (2-arm, asymmetric rewards) and MIND-small (5-arm news recommendation) - and find that when equipped with a task-specific prompt, LLM pseudo-observations reduce cumulative regret by 19% on MIND relative to pure LinUCB.”
LLM Pseudo-Observations Enhance Contextual Bandits with Calibration Gating