Paper
Unknown speaker
Recommended paper · 2026-04-21
“The paper should therefore be read as a stylized mechanism study rather than a general explanation of neural-network loss spikes.”
Batch Normalization Postpones Loss Instability by Gradually Amplifying Effective … ↗

“Paper below tested a variety of base LLMs (no TTA) on generalization-focused math problems and found that they can't reason and can't do math.”
LLMs Lack Fluid Intelligence, While LRMs Show Promise in Reasoning ↗