📄 paper · by Yann LeCun

Spike The Sparse And The Sink Anatomy Of Massive Activations And Attention Sinks

1 mention from 1 person

All mentions

paper · 2026-03-05

Recommended

“We study two recurring phenomena in Transformer language models: massive activations, in which a small number of tokens exhibit extreme outliers in a few channels, and attention sinks, in which certain tokens attract disproportionate attention mass regardless of semantic relevance. Prior work observes that these phenomena frequently co-occur and often involve the same tokens, but their functional roles and causal relationship remain unclear.”

Transformer Behavior: Decoupling Massive Activations and Attention Sinks