Yanghua Peng

Cited by

	All	Since 2019
Citations	1504	1487
h-index	13	13
i10-index	13	13

Citations per year
Year	Citations
2018	15
2019	86
2020	208
2021	262
2022	307
2023	372
2024	251

Public access (based on funding mandates)

14 articles available
0 articles not available

Co-authors

Chuan Wu	Professor of Computer Science, The University of Hong Kong	Verified email at cs.hku.hk
Yixin Bao	The University of Hong Kong	Verified email at cs.hku.hk
Yangrui Chen	The University of Hong Kong	Verified email at cs.hku.hk
Chuanxiong Guo	Sysnetome.com	Verified email at ieee.org
Zongpeng Li	Tsinghua University	Verified email at tsinghua.edu.cn
Chang Lan	Google DeepMind	Verified email at google.com
Bairen Yi	ByteDance Inc.	Verified email at connect.ust.hk
Haibin Lin	Bytedance	Verified email at bytedance.com
Wei Lin	Alibaba	Verified email at alibaba-inc.com
Chen Meng	Chinese Academy of Sciences | CAS	Verified email at sccas.cn
Hongzhi Chen	ByteDance	Verified email at bytedance.com
Dan Li	Tsinghua University	Verified email at tsinghua.edu.cn
Hongzheng Chen	Cornell University	Verified email at cornell.edu
Hanpeng Hu	The University of Hong Kong	Verified email at cs.hku.hk
Chengchen Hu	NIO	Verified email at ieee.org
Xin Jin	Peking University	Verified email at pku.edu.cn
Xuanzhe Liu	Boya Distinguished Professor of Computer Science, Peking University, ACM Distinguished Scientist	Verified email at pku.edu.cn
Yihao Zhao	Peking University	Verified email at pku.edu.cn
Jingpu Duan	Peng Cheng Laboratory, Shenzhen, China	Verified email at pcl.ac.cn
Alex X. Liu	Michigan State University	Verified email at cse.msu.edu

Yanghua Peng

ByteDance Inc.

Verified email at cs.hku.hk

Deep Learning Systems GPU Scheduling


Title	Authors	Venue	Cited by	Year
Optimus: an efficient dynamic resource scheduler for deep learning clusters	Y Peng, Y Bao, Y Chen, C Wu, C Guo	Proceedings of the Thirteenth EuroSys Conference, 1-14, 2018	472	2018
A generic communication scheduler for distributed DNN training acceleration	Y Peng, Y Zhu, Y Chen, Y Bao, B Yi, C Lan, C Wu, C Guo	Proceedings of the 27th ACM Symposium on Operating Systems Principles, 16-29, 2019	337	2019
Deep learning-based job placement in distributed machine learning clusters	Y Bao, Y Peng, C Wu	IEEE INFOCOM 2019-IEEE Conference on Computer Communications, 505-513, 2019	145	2019
Online job scheduling in distributed machine learning clusters	Y Bao, Y Peng, C Wu, Z Li	IEEE INFOCOM 2018-IEEE Conference on Computer Communications, 495-503, 2018	126	2018
DL2: A deep learning-driven scheduler for deep learning clusters	Y Peng, Y Bao, Y Chen, C Wu, C Meng, W Lin	IEEE Transactions on Parallel and Distributed Systems 32 (8), 1947-1960, 2021	81	2021
Preemptive all-reduce scheduling for expediting distributed DNN training	Y Bao, Y Peng, Y Chen, C Wu	IEEE INFOCOM 2020-IEEE Conference on Computer Communications, 626-635, 2020	63	2020
BGL: GPU-Efficient GNN training by optimizing graph data I/O and preprocessing	T Liu, Y Chen, D Li, C Wu, Y Zhu, J He, Y Peng, H Chen, H Chen, C Guo	20th USENIX Symposium on Networked Systems Design and Implementation (NSDI …, 2023	56	2023
deTector: a Topology-aware Monitoring System for Data Center Networks	Y Peng, J Yang, C Wu, C Guo, C Hu, Z Li	2017 USENIX Annual Technical Conference (USENIX ATC 17), 55-68, 2017	40	2017
Multi-resource interleaving for deep learning training	Y Zhao, Y Liu, Y Peng, Y Zhu, X Liu, X Jin	Proceedings of the ACM SIGCOMM 2022 Conference, 428-440, 2022	39	2022
Elastic parameter server load distribution in deep learning clusters	Y Chen, Y Peng, Y Bao, C Wu, Y Zhu, C Guo	Proceedings of the 11th ACM Symposium on Cloud Computing, 507-521, 2020	38	2020
Dynamic scaling of virtualized, distributed service chains: A case study of IMS	J Duan, C Wu, F Le, AX Liu, Y Peng	IEEE Journal on Selected Areas in Communications 35 (11), 2501-2511, 2017	37	2017
MegaScale: Scaling large language model training to more than 10,000 GPUs	Z Jiang, H Lin, Y Zhong, Q Huang, Y Chen, Z Zhang, Y Peng, X Li, C Xie, ...	21st USENIX Symposium on Networked Systems Design and Implementation (NSDI …, 2024	26	2024
Deep learning-based job placement in distributed machine learning clusters with heterogeneous workloads	Y Bao, Y Peng, C Wu	IEEE/ACM Transactions on Networking 31 (2), 634-647, 2022	13	2022
SP-GNN: Learning structure and position information from graphs	Y Chen, J You, J He, Y Lin, Y Peng, C Wu, Y Zhu	Neural Networks 161, 505-514, 2023	9	2023
dPRO: A generic performance diagnosis and optimization toolkit for expediting distributed DNN training	H Hu, C Jiang, Y Zhong, Y Peng, C Wu, Y Zhu, H Lin, C Guo	Proceedings of Machine Learning and Systems 4, 623-637, 2022	9	2022
SAPipe: Staleness-aware pipeline for data parallel DNN training	Y Chen, C Xie, M Ma, J Gu, Y Peng, H Lin, C Wu, Y Zhu	Advances in Neural Information Processing Systems 35, 17981-17993, 2022	7	2022
LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization	J Zhao, B Wan, Y Peng, H Lin, C Wu	arXiv preprint arXiv:2403.01136, 2024	4	2024
CDMPP: A Device-Model Agnostic Framework for Latency Prediction of Tensor Programs	H Hu, J Su, J Zhao, Y Peng, Y Zhu, H Lin, C Wu	Proceedings of the Nineteenth European Conference on Computer Systems, 1054-1074, 2024	1	2024
dPRO: A Generic Profiling and Optimization System for Expediting Distributed DNN Training	H Hu, C Jiang, Y Zhong, Y Peng, C Wu, Y Zhu, H Lin, C Guo	arXiv preprint arXiv:2205.02473, 2022	1	2022
QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices	J Zhao, B Wan, Y Peng, H Lin, Y Zhu, C Wu	2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2024		2024
