Publications

LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models
Module-wise Adaptive Distillation for Multimodality Foundation Models
Less is More: Task-aware Layer-wise Distillation for Language Model Compression
LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation
HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained Transformers
PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance
Self-Training with Differentiable Teacher
MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation
No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models
Token-wise Curriculum Learning for Neural Machine Translation
ARCH: Efficient Adversarial Regularized Training with Caching
Adversarial Training as Stackelberg Game: An Unrolled Optimization Approach
Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization
BOND: BERT-Assisted Open-Domain Named Entity Recognition with Distant Supervision