Attention Sinks and Compression Valleys in LLMs Are Two Sides of the Same Coin
“In this work, we present a surprising connection between attention sinks and compression valleys, tracing both to the formation of massive activations in the residual stream. We prove theoretically th…”