About Me

Hi, welcome to my homepage. I am Chen Liang (梁辰), a Member of Technical Staff at Microsoft AI. Before Microsoft, I completed my Ph.D. at Georgia Tech, advised by Prof. Tuo Zhao.

My current work focuses on post-training SWE agent models for GitHub Copilot in VS Code and CLI, across Microsoft and OpenAI model families. My work spans long-horizon RL, on-policy distillation, and format fine-tuning, with an interest in research-production co-design: turning production model failures into stable recipes for data efficiency, training stability, length control, and production-aligned behavior. Previously, I worked on mid-training for VS Code code completion models and pre-training for Phi models.

More broadly, my research interests lie in efficient and generalizable LLM training, guided by a “less is more” principle: efficiency can improve model quality by exposing essential learning signals. I study this through data-efficient training, parameter-efficient adaptation, stable optimization, and shorter yet stronger inference-time trajectories.

Recent News

  • Jun. 2026. We introduced MAI-Code-1-Flash, a SWE agentic model with 5B active parameters for GitHub Copilot in VS Code, achieving 51.2% on SWE-Bench Pro with up to 60% fewer tokens on hard SWE-Bench Verified tasks. As part of the core training effort, I worked on the agentic RL recipe, focusing on main run data mixture, length control, and production-aligned behavior.
  • Apr. 2026. NorMuon was accepted to ICML 2026 as a Spotlight, improving 1.1B-scale LLM pretraining efficiency by 11.3% over Muon and advancing the modded-nanoGPT speedrun leaderboard to 3250 steps [record, post].
  • May 2025. I led the mid-training for GPT-4o Copilot, a GPT-4o-based code completion model for GitHub Copilot in VS Code, improving code suggestion quality.
  • Mar. 2025. I co-led the release of Phi-mini-MoE and Phi-tiny-MoE, compact MoE models for academic and edge-device settings with 2.4B / 1.1B activated parameters and 1M+ monthly downloads. Our SlimMoE paper presents the underlying compression framework.
  • Aug. 2024. We introduced Phi-3.5-MoE, a 16×3.8B MoE model pre-trained from scratch. I contributed to its architecture design and post-training, with the GRIN-MoE paper presenting the model design in detail.
  • Feb. 2024. I completed my Ph.D. thesis, "On Parameter Efficiency of Neural Language Models" [thesis, talk], and joined Microsoft as a Senior Researcher.
  • Jan. 2024. LoftQ, a LoRA-aware quantization framework, was accepted to ICLR 2024 as an Oral [talk, blog].

Selected Publications (Google Scholar, *Equal Contribution)

Shuffle the Context: RoPE-Perturbed Self-Distillation for Long-Context Adaptation
The 43rd International Conference on Machine Learning (ICML), 2026
NorMuon: Making Muon More Efficient and Scalable
The 43rd International Conference on Machine Learning (ICML Spotlight), 2026
SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation
The Second Conference on Language Modeling (COLM), 2025
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
The 13th International Conference on Learning Representations (ICLR), 2025
LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models
The 12th International Conference on Learning Representations (ICLR Oral), 2024
Module-wise Adaptive Distillation for Multimodality Foundation Models
The 37th Conference on Neural Information Processing Systems (NeurIPS), 2023
Less is More: Task-aware Layer-wise Distillation for Language Model Compression
The 40th International Conference on Machine Learning (ICML), 2023
HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained Transformers
The 11th International Conference on Learning Representations (ICLR), 2023
PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance
The 39th International Conference on Machine Learning (ICML), 2022
No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models
The 10th International Conference on Learning Representations (ICLR), 2022
Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization
The 59th Annual Meeting of the Association for Computational Linguistics (ACL), 2021

Experience

Senior Researcher, Microsoft AI, Feb. 2024 – Present
Research Intern, Microsoft Azure AI, Google Research, and Amazon, summers 2021–2023
SDE Intern, NVIDIA, May 2018 – Aug. 2018

Education

Ph.D. in Machine Learning, Georgia Institute of Technology, School of Industrial and Systems Engineering, Dec. 2023

B.S. in Electrical Engineering, University of Southern California, Department of Electrical and Computer Engineering, May 2018

Service & Teaching

Area Chair: NeurIPS 2025–2026
Reviewer: NeurIPS 2021–2024, ICML 2021–2025, ICLR 2021–2023, COLM 2024, EMNLP 2021–2022, ACL/NAACL 2021–2022
Teaching Assistant, ISyE 3030 Basic Statistical Methods, Georgia Tech, Fall 2020
Teaching Assistant, ISyE 3770 Statistics & Applications, Georgia Tech, Summer 2020
Teaching Assistant, CSE 6140 Algorithms, Georgia Tech, Fall 2019
Course Producer, EE 364 Introduction to Probability & Statistics, USC, Fall 2017

Talks

Dec. 2023. LoftQ: LoRA-Fine-Tuning-Aware Quantization, NeurIPS Third Workshop on Efficient Natural Language and Speech Processing
Sep. 2023. On Parameter Efficiency of Neural Language Models, Allen Institute for AI