Seq-VCR: Preventing Collapse in Intermediate Transformer Representations for Enhanced Reasoning
“In this work, we identify representation collapse in the model's intermediate layers as a key factor limiting their reasoning capabilities. To address this, we propose Sequential Variance-Covariance Regularization (Seq-VCR), which enhances the entropy of intermediate representations and prevents collapse.”
Seq-VCR: Regularization for Enhanced Transformer Reasoning