Mindsmall

paper · 2026-04-17

We evaluate on two contextual bandit environments, UCI Mushroom (2-arm, asymmetric rewards) and MIND-small (5-arm news recommendation), and find that, when the LLM is given a task-specific prompt, LLM pseudo-observations reduce cumulative regret on MIND-small by 19% relative to pure LinUCB.

LLM Pseudo-Observations Enhance Contextual Bandits with Calibration Gating
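The abstract contrasts pure LinUCB with LinUCB augmented by calibration-gated LLM pseudo-observations, but does not describe the mechanism here. The following is a minimal sketch of one plausible reading, not the paper's method. The LinUCB part is standard disjoint LinUCB (one ridge-regression model per arm, scored by mean plus `alpha` times a confidence width). Everything about the pseudo-observation path is an assumption: `oracle` is a hypothetical stand-in for an LLM that predicts a reward for an (context, arm) pair, unchosen arms receive that prediction as a down-weighted update (`pseudo_weight`, hypothetical), and `CalibrationGate` (hypothetical) admits pseudo-observations only while the oracle's recent mean absolute error on real rewards stays below a threshold. The synthetic environment in `make_env` is illustrative and unrelated to Mushroom or MIND-small.

```python
import math
import random


class LinUCBArm:
    """Disjoint LinUCB state for one arm: A = I + sum(w x x^T), b = sum(w r x)."""

    def __init__(self, dim):
        self.dim = dim
        # Store A^{-1} directly, starting from the identity (ridge prior, lambda = 1).
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def _mat_vec(self, v):
        return [sum(self.a_inv[i][j] * v[j] for j in range(self.dim)) for i in range(self.dim)]

    def ucb(self, x, alpha):
        theta = self._mat_vec(self.b)  # ridge estimate theta = A^{-1} b
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(max(sum(xi * v for xi, v in zip(x, self._mat_vec(x))), 0.0))
        return mean + alpha * width

    def update(self, x, reward, weight=1.0):
        # Sherman-Morrison rank-1 update of A^{-1} for A <- A + weight * x x^T.
        ax = self._mat_vec(x)
        denom = 1.0 + weight * sum(xi * v for xi, v in zip(x, ax))
        for i in range(self.dim):
            for j in range(self.dim):
                self.a_inv[i][j] -= weight * ax[i] * ax[j] / denom
            self.b[i] += weight * reward * x[i]


class CalibrationGate:
    """HYPOTHETICAL gate: open only while the oracle's recent mean absolute
    error on real (context, chosen arm, reward) triples is below `threshold`."""

    def __init__(self, threshold=0.25, window=50, min_samples=10):
        self.threshold = threshold
        self.window = window
        self.min_samples = min_samples
        self.errors = []

    def record(self, predicted, actual):
        self.errors.append(abs(predicted - actual))
        self.errors = self.errors[-self.window:]  # keep a sliding window

    def is_open(self):
        if len(self.errors) < self.min_samples:
            return False
        return sum(self.errors) / len(self.errors) < self.threshold


def run(data, n_arms, dim, oracle, alpha=1.0, pseudo_weight=0.5, use_pseudo=True):
    """Return cumulative realized-reward regret over `data`."""
    arms = [LinUCBArm(dim) for _ in range(n_arms)]
    gate = CalibrationGate()
    regret = 0.0
    for x, rewards in data:
        chosen = max(range(n_arms), key=lambda a: arms[a].ucb(x, alpha))
        r = rewards[chosen]
        regret += max(rewards) - r
        arms[chosen].update(x, r)  # real observation, full weight
        if use_pseudo:
            gate.record(oracle(x, chosen), r)  # score the oracle on real feedback
            if gate.is_open():
                # HYPOTHETICAL: down-weighted pseudo-observations for unchosen arms.
                for a in range(n_arms):
                    if a != chosen:
                        arms[a].update(x, oracle(x, a), weight=pseudo_weight)
    return regret


def make_env(n_rounds, n_arms, dim, seed=0):
    """Illustrative linear environment with Gaussian reward noise."""
    rng = random.Random(seed)
    thetas = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_arms)]
    data = []
    for _ in range(n_rounds):
        x = [rng.uniform(0, 1) for _ in range(dim)]
        means = [sum(t * xi for t, xi in zip(th, x)) for th in thetas]
        data.append((x, [m + rng.gauss(0, 0.1) for m in means]))
    return thetas, data


if __name__ == "__main__":
    thetas, data = make_env(500, 5, 3, seed=1)
    # Stand-in "LLM": true mean reward plus a small bias (assumption).
    oracle = lambda x, a: sum(t * xi for t, xi in zip(thetas[a], x)) + 0.05
    r0 = run(data, 5, 3, oracle, use_pseudo=False)
    r1 = run(data, 5, 3, oracle, use_pseudo=True)
    print(f"LinUCB regret {r0:.2f}, gated pseudo-obs regret {r1:.2f}")
    print(f"relative reduction {(r0 - r1) / r0:.1%}")
```

A figure such as "19% lower cumulative regret" corresponds to the relative reduction `(r0 - r1) / r0` between a baseline run and an augmented run on the same data. Keeping the gate closed until `min_samples` real rewards have been seen means pseudo-observations cannot enter the model before the oracle has been checked at all; how the paper actually gates, weights, and prompts is not stated here.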