Chen Liang
Home
Publications
Experience
Teaching & Services
Page not found
Perhaps you were looking for one of these?
Latest
SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models
Module-wise Adaptive Distillation for Multimodality Foundation Models
Less is More: Task-aware Layer-wise Distillation for Language Model Compression
HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained Transformers
PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance
No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models
CAMERO: Consistency Regularized Ensemble of Perturbed Language Models with Weight Sharing
Adversarial Regularization as Stackelberg Game: An Unrolled Optimization Approach