No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models

Publication
The 10th International Conference on Learning Representations (ICLR), 2022