I work as a Senior Researcher at Microsoft, where I am part of the DeepSpeed team. My research focuses on enhancing the efficiency of large-scale machine learning systems, covering both training and inference. I completed my Ph.D. at the University of Nevada, Reno in 2022, under the guidance of Dr. Feng Yan and Dr. Lei Yang. Before that, I earned my Bachelor's degree from the University of Electronic Science and Technology of China in 2017. My work is dedicated to exploring ways to improve the efficiency of large models.
SimiGrad: Fine-Grained Adaptive Batching for Large Scale Training using Gradient Similarity Measurement
Large scale training requires massive parallelism to finish training within a reasonable amount of time. Large batch training is the key enabler of such massive parallelism, but it often comes at the cost of generalization performance. We propose a fully automated and lightweight adaptive batching methodology that enables fine-grained batch size adaptation (e.g., at the mini-batch level) and achieves state-of-the-art performance with record-breaking batch sizes. The core component of our method is a lightweight yet efficient representation of the critical gradient noise information. Extensive evaluations on popular benchmarks (e.g., CIFAR10, ImageNet, and BERT-Large) demonstrate that the proposed methodology outperforms state-of-the-art adaptive batching approaches and hand-tuned static strategies in both performance and batch size. In particular, we achieve a new state-of-the-art batch size of 78k in BERT-Large pretraining with a SQuAD score of 90.69, compared to the 90.58 reported by the previous state-of-the-art with a 59k batch size.
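The measurement at the heart of this idea can be sketched in a few lines of PyTorch. The sketch below is only a simplified illustration, not the DeepSpeed implementation: the half-batch split, the target value of 0.5, and the `adjust_batch_size` rule are my own assumptions for exposition.

```python
import torch
import torch.nn.functional as F

def half_batch_gradients(model, loss_fn, inputs, labels):
    """Flattened gradients of the two halves of one mini-batch."""
    grads = []
    half = inputs.size(0) // 2
    for sl in (slice(0, half), slice(half, None)):
        model.zero_grad()
        loss = loss_fn(model(inputs[sl]), labels[sl])
        loss.backward()
        grads.append(torch.cat([p.grad.flatten()
                                for p in model.parameters()
                                if p.grad is not None]))
    return grads

def measure_gradient_similarity(model, loss_fn, inputs, labels):
    # Cosine similarity between the two half-batch gradients acts as a
    # cheap proxy for gradient noise: noisy gradients disagree more.
    g1, g2 = half_batch_gradients(model, loss_fn, inputs, labels)
    return F.cosine_similarity(g1, g2, dim=0).item()

def adjust_batch_size(batch_size, similarity, target=0.5,
                      grow=1.1, shrink=0.9):
    # A simple negative-feedback rule (an assumption, much simplified from
    # the paper): low similarity means the gradient estimate is noisy, so
    # grow the batch; high similarity means a smaller batch suffices.
    factor = grow if similarity < target else shrink
    return max(1, int(batch_size * factor))
```

In data-parallel training the two half-gradients can be obtained essentially for free by grouping the per-worker gradients that are already computed before the final all-reduce, which is what keeps such a measurement lightweight.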
Region Based Reinforcement Learning Scheduling Framework for MLaaS
The parallelism settings in Machine Learning as a Service (MLaaS) have a critical impact on system performance. Tuning the parallelism configuration is challenging because of the complex dependencies and the large search space. We propose a region based reinforcement learning (RRL) approach that can converge to a near-optimal configuration orders of magnitude faster than traditional reinforcement learning. The proposed RRL is prototyped and evaluated using several real-world machine learning workloads. Both theoretical analysis and experimental evaluation show that RRL outperforms state-of-the-art tuning algorithms for MLaaS.
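To make the region idea concrete, here is a minimal, hedged sketch in Python. The paper's RRL drives the search with a reinforcement learning agent; this sketch replaces the agent with a simple sampling-based estimate to show only the core zoom-in mechanism, and `measure_throughput` is a hypothetical stand-in for benchmarking one parallelism configuration.

```python
import random

def region_based_search(low, high, measure_throughput,
                        num_regions=4, samples_per_region=8, min_width=2):
    """Zoom into the most promising region of an integer config range."""
    while high - low > min_width:
        width = (high - low) / num_regions
        best_region, best_reward = (low, high), float("-inf")
        for r in range(num_regions):
            r_low = int(low + r * width)
            r_high = max(r_low + 1, int(low + (r + 1) * width))
            # Estimate a region's value from a few sampled configs instead
            # of evaluating every config inside it.
            rewards = [measure_throughput(random.randint(r_low, r_high - 1))
                       for _ in range(samples_per_region)]
            avg = sum(rewards) / len(rewards)
            if avg > best_reward:
                best_region, best_reward = (r_low, r_high), avg
        low, high = best_region  # descend into the best region only
    # The remaining region is small enough to evaluate exhaustively.
    return max(range(low, high + 1), key=measure_throughput)
```

For example, `region_based_search(1, 64, measure_throughput)` would tune a single integer knob such as a thread count; the real system tunes several interacting parallelism knobs at once, which is why the reduced search space matters.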
RRL Plus: Adaptive Region Based Reinforcement Learning for Machine Learning
RRL is sensitive to the region size. An excessive region size leads to a large performance gap between the RRL solution and the optimal one, whereas an inadequate region size leads to a longer learning process. We further extend the region based reinforcement learning algorithm with Bayesian optimization and heuristic algorithms, enabling it to automatically adjust the region size and achieve both fast convergence and near-optimal solutions.
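The trade-off can be illustrated with a small, hedged extension of the `region_based_search` sketch above. The heuristic below is my own simplification (the paper combines Bayesian optimization with heuristics): it starts with coarse regions and keeps refining only while finer regions still improve the result.

```python
def search_with_adaptive_regions(low, high, measure_throughput,
                                 tolerance=0.01):
    """Refine the region granularity only while refinement keeps paying off."""
    best_config, best_reward = None, float("-inf")
    num_regions = 2  # start coarse: few large regions, fast but imprecise
    while num_regions <= high - low:
        config = region_based_search(low, high, measure_throughput,
                                     num_regions=num_regions)
        reward = measure_throughput(config)
        if best_config is not None and reward <= best_reward * (1 + tolerance):
            break  # finer regions stopped helping; keep the coarser result
        best_config, best_reward = config, reward
        num_regions *= 2  # smaller regions: slower search, smaller gap
    return best_config
```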
Semi-supervised Learning for Large Scale Noisy Data
A large amount of data is available for scientific use. Unfortunately, accurate training labels are usually produced manually and are expensive to obtain, leaving insufficient labeled data to train machine learning models. We address this problem by using a generative model to ensemble multiple state-of-the-art models, achieving better detection from very noisy training data.
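The project description is high-level, so the sketch below shows only one classic generative way to combine several noisy predictors: jointly estimate each model's reliability and a consensus label, EM-style, in the spirit of Dawid-Skene. The function and all of its details are my own illustration, not the project's actual method.

```python
import numpy as np

def aggregate_noisy_predictions(votes, num_classes, iters=20):
    """votes: (num_models, num_items) array of predicted class ids."""
    num_models, num_items = votes.shape
    # Start from a plain majority vote over the models.
    consensus = np.array([np.bincount(votes[:, i], minlength=num_classes).argmax()
                          for i in range(num_items)])
    for _ in range(iters):
        # M-step: a model's reliability is its agreement with the consensus.
        reliability = (votes == consensus).mean(axis=1)
        # E-step: re-estimate the consensus with log-odds-weighted votes,
        # so reliable models count more and unreliable ones count less.
        weights = np.log(np.clip(reliability, 1e-6, 1 - 1e-6)
                         / np.clip(1 - reliability, 1e-6, 1.0))
        scores = np.zeros((num_items, num_classes))
        for m in range(num_models):
            scores[np.arange(num_items), votes[m]] += weights[m]
        consensus = scores.argmax(axis=1)
    return consensus
```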
Heyang Qin, Samyam Rajbhandari, Olatunji Ruwase, Feng Yan, Lei Yang, and Yuxiong He, SimiGrad: Fine-Grained Adaptive Batching for Large Scale Training using Gradient Similarity Measurement, in Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS 2021), Virtual, December 2021 (Acceptance rate: 2371/9122=26%). [Slides]
Heyang Qin, Syed Zawad, Yanqi Zhou, Sanjay Padhi, Lei Yang, and Feng Yan, Reinforcement Learning Empowered MLaaS Scheduling for Serving Intelligent Internet of Things, IEEE Internet of Things Journal, 2020 (Impact factor: 9.515).
Heyang Qin, Syed Zawad, Yanqi Zhou, Lei Yang, Dongfang Zhao, and Feng Yan, Swift Machine Learning Model Serving Scheduling: A Region Based Reinforcement Learning Approach, in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC 2019), Denver, CO, USA, November 2019 (Acceptance rate: 78/344=22%). [Slides]
I am currently working as a researcher on the Microsoft DeepSpeed team. My work focuses on improving the efficiency and scalability of large scale machine learning systems through system optimizations as well as CUDA kernel optimization.
I worked as a research intern on the Microsoft DeepSpeed team. My work focused on using adaptive methods to optimize large scale machine learning in terms of performance and scalability. My work there, SimiGrad, achieved a record-breaking batch size of 78k in BERT pretraining while maintaining state-of-the-art model performance.
During this period, I worked on research in reinforcement learning and deep learning, as well as their applications to cloud computing.
As a teaching assistant, I teach lab sections, hold office hours, and grade assignments.
GPA: 4.00
I work as a teaching assistant for CPE 201 with Dr. Hung La and Dr. Siming Liu.
I work as a teaching assistant for ENGR 100 with Dr. Ann-Marie Vollstedt, Prof. Kelly Keselica, and Dr. Adam Kirn.
Apart from being a researcher, I participate in multiple open-source projects, contributing code, translations, and more. I also play table tennis and volleyball occasionally in my free time.
Indoors, I enjoy video games and detective fiction. I also have a strong interest in, and extensive knowledge of, archaic Chinese and classical Chinese literature.