AdamW
Unknown speaker
Recommended paper · 2026-05-26
“A 625-run follow-up (Phase 5) probes the null along five axes: optimiser (AdamW), schedule shape (cosine), training length (up to 9x more iterations)”
LR Schedule Is Bit-Width-Agnostic for Sub-100M QAT — Except INT4 Above 50M Param