No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models

Publication
The Tenth International Conference on Learning Representations (ICLR), 2022