Undergraduate Student at Peking University (Tong Class).
Advancing Efficient LLM Architectures.
Occasional observer of the world's quiet moments.
I am Pingzhi Tang, an undergraduate student in the General Artificial Intelligence Experimental Program (Tong Class) at Peking University.
I hold a major GPA of 3.92/4.00, and my research focuses on improving the performance and efficiency of large-scale models, with an emphasis on model architecture design and efficient inference.
Beyond the code, I am a photography enthusiast and film lover. I find that the intuition used to optimize a neural network often resonates with the process of composing a frame: both are searches for structure within complexity.
Tong Class.
Major GPA: 3.92/4.00.
* See my CV for a full list of awards.
Advisor: Prof. Muhan Zhang.
Working on efficient inference, model architecture, parameter-efficient fine-tuning (PEFT), and LLM reasoning.
Addressed the KV cache overhead of multi-head latent attention (MLA) under tensor parallelism, achieving 2x inference acceleration.
Fanxu Meng*, Pingzhi Tang*, Zengwei Yao, Xing Sun, Muhan Zhang
Proposed a method that transforms standard multi-head attention into multi-head latent attention, achieving up to 11x inference acceleration with negligible performance loss.
Yiding Wang, Fanxu Meng, Xuefeng Zhang, Fan Jiang, Pingzhi Tang, Muhan Zhang
Proposed a distributed PEFT method for LLMs that increases update flexibility by assigning different weight components to each GPU.
Fanxu Meng, Pingzhi Tang, Fan Jiang, Muhan Zhang
Developed a novel pruning and fine-tuning method for large models based on cross-layer singular value decomposition.
For a complete list of publications, please see my Google Scholar profile or CV.
Coming soon.