🛠 tool

Mindsmall

1 mention across 0 people

All mentions

Unknown speaker

paper · 2026-04-17

Recommended

“We evaluate on two contextual bandit environments - UCI Mushroom (2-arm, asymmetric rewards) and MIND-small (5-arm news recommendation) - and find that when equipped with a task-specific prompt, LLM pseudo-observations reduce cumulative regret by 19% on MIND relative to pure LinUCB.”

LLM Pseudo-Observations Enhance Contextual Bandits with Calibration Gating
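The quoted result is measured against a pure LinUCB baseline. The following is a minimal sketch of that kind of baseline only: disjoint LinUCB (one ridge-regression model per arm) with cumulative regret tracked as the gap between the best arm's expected reward and the chosen arm's expected reward. The environment is hypothetical and synthetic (5 arms, 4-dimensional contexts, linear expected rewards, Gaussian noise with standard deviation 0.1, `alpha=0.5`). None of these settings come from the paper, it is not MIND-small, and the sketch does not implement the LLM pseudo-observations or calibration gating.

```python
import math
import random


class LinUCB:
    """Disjoint LinUCB: an independent ridge-regression model per arm."""

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha  # exploration weight on the confidence width
        self.dim = dim
        # Per arm: A starts as the identity (so A_inv does too), b starts at zero.
        self.A_inv = [[[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
                      for _ in range(n_arms)]
        self.b = [[0.0] * dim for _ in range(n_arms)]

    def _matvec(self, M, x):
        return [sum(M[i][j] * x[j] for j in range(self.dim)) for i in range(self.dim)]

    def select(self, x):
        """Pick the arm with the highest upper confidence bound for context x."""
        scores = []
        for a in range(len(self.b)):
            theta = self._matvec(self.A_inv[a], self.b[a])  # ridge estimate A^-1 b
            mean = sum(t * xi for t, xi in zip(theta, x))
            u = self._matvec(self.A_inv[a], x)
            width = math.sqrt(max(sum(xi * ui for xi, ui in zip(x, u)), 0.0))
            scores.append(mean + self.alpha * width)
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, x, reward):
        """Add observation (x, reward) to the chosen arm's model."""
        A_inv = self.A_inv[arm]
        u = self._matvec(A_inv, x)
        denom = 1.0 + sum(xi * ui for xi, ui in zip(x, u))
        # Sherman-Morrison rank-one update for A <- A + x x^T (A_inv is symmetric).
        for i in range(self.dim):
            for j in range(self.dim):
                A_inv[i][j] -= u[i] * u[j] / denom
        for i in range(self.dim):
            self.b[arm][i] += reward * x[i]


def run(policy="linucb", n_rounds=2000, n_arms=5, dim=4, seed=0):
    """Cumulative regret on a hypothetical synthetic linear bandit."""
    rng = random.Random(seed)
    # Hypothetical ground-truth parameters, one vector per arm.
    true_theta = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_arms)]
    agent = LinUCB(n_arms, dim, alpha=0.5)
    regret = 0.0
    for _ in range(n_rounds):
        x = [rng.uniform(-1, 1) for _ in range(dim)]
        expected = [sum(t * xi for t, xi in zip(th, x)) for th in true_theta]
        if policy == "linucb":
            arm = agent.select(x)
        else:  # uniform-random reference policy
            arm = rng.randrange(n_arms)
        reward = expected[arm] + rng.gauss(0.0, 0.1)
        agent.update(arm, x, reward)
        regret += max(expected) - expected[arm]
    return regret
```

Calling `run("linucb")` and `run("random")` with the same seed should show LinUCB accumulating less regret than uniform-random arm choice. A relative reduction such as the quoted 19% compares two cumulative totals of this kind: (R_baseline - R_method) / R_baseline.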