Research
My research focuses on improving the efficiency of large language model (LLM) pre-training. I work on methods that scale from small proxy models to large foundation models, and on the mathematical principles underlying training dynamics. A central thread is hyperparameter optimization, especially transferring hyperparameter configurations from small models to large-scale models using scaling-aware parameterizations such as maximal update parametrization (μP). I also study optimizers for large-scale training. Previously, I worked on transfer learning with theoretical guarantees, adversarial machine learning for 3D point cloud models, and streaming algorithms for time-varying volume data.
Publications and Manuscripts
(* indicates equal contribution.)
Towards Self-Adaptive Learning: A Comprehensive Study on Continual Learning under Harsh Conditions
Roozbeh Razavi-Far, Ehsan Hallaji, Alireza Fathalizadeh, Mengxi Wu, Mohammad Rostami
Neurocomputing 2026
[Paper]
GQA-μP: The Maximal Parameterization Update for Grouped Query Attention
Kyle R. Chickering*, Huijuan Wang*, Mengxi Wu*, Alexander Moreno, Muhao Chen, Xuezhe Ma, Daria Soboleva, Joel Hestness, Zhengzhong Liu, Eric Xing
Preprint
[Paper]
Gecko: An Efficient Neural Architecture Inherently Processing Sequences with Arbitrary Lengths
Xuezhe Ma*, Shicheng Wen*, Linghao Jin*, Bilge Acun*, Ruihang Lai*, Bohan Hou, Will Lin, Hao Zhang, Songlin Yang, Ryan Lee, Mengxi Wu, Jonathan May, Luke Zettlemoyer, Carole-Jean Wu
Preprint
[Paper]
[Code]
Curvature Diversity-Driven Nuclear-Norm Wasserstein Domain Alignment for Point Cloud
Mengxi Wu, Hao Huang, Yi Fang, Mohammad Rostami
Transactions on Machine Learning Research 2025
[Paper]
[Code]
Graph Harmony: Denoising and Nuclear-Norm Wasserstein Adaptation for Enhanced Domain Transfer in Graph-Structured Data
Mengxi Wu, Mohammad Rostami
Transactions on Machine Learning Research 2024
[Paper]
[Code]
Streaming Approach to In Situ Selection of Key Time Steps for Time-Varying Volume Data
Mengxi Wu, Yi-Jen Chiang, Christopher Musco
Eurographics/IEEE Conference on Visualization 2022
[Paper]
[Code]
3D Point Cloud Completion with Geometric-Aware Adversarial Augmentation
Mengxi Wu, Hao Huang, Yi Fang
International Conference on Pattern Recognition 2022
[Paper]
[Code]