Yinmin Zhong
Title
Cited by
Year
AlpaServe: Statistical multiplexing with model parallelism for deep learning serving
Z Li, L Zheng, Y Zhong, V Liu, Y Sheng, X Jin, Y Huang, Z Chen, H Zhang, ...
17th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2023
Cited by: 128 | Year: 2023
DistServe: Disaggregating prefill and decoding for goodput-optimized large language model serving
Y Zhong, S Liu, J Chen, J Hu, Y Zhu, X Liu, X Jin, H Zhang
18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24), 2024
Cited by: 84 | Year: 2024
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs
Z Jiang, H Lin, Y Zhong, Q Huang, Y Chen, Z Zhang, Y Peng, X Li, C Xie, ...
21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24), 2024
Cited by: 81 | Year: 2024
Fast distributed inference serving for large language models
B Wu, Y Zhong, Z Zhang, S Liu, F Liu, Y Sun, G Huang, X Liu, X Jin
arXiv preprint arXiv:2305.05920, 2023
Cited by: 78 | Year: 2023
ElasticFlow: An elastic serverless training platform for distributed deep learning
D Gu, Y Zhao, Y Zhong, Y Xiong, Z Han, P Cheng, F Yang, G Huang, X Jin, ...
Proceedings of the 28th ACM International Conference on Architectural …, 2023
Cited by: 28 | Year: 2023
LoongServe: Efficiently serving long-context large language models with elastic sequence parallelism
B Wu, S Liu, Y Zhong, P Sun, X Liu, X Jin
Proceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles …, 2024
Cited by: 17 | Year: 2024
Flux: Fast software-based communication overlap on GPUs through kernel fusion
LW Chang, W Bao, Q Hou, C Jiang, N Zheng, Y Zhong, X Zhang, Z Song, ...
arXiv preprint arXiv:2406.06858, 2024
Cited by: 5 | Year: 2024
DistMind: Efficient Resource Disaggregation for Deep Learning Workloads
X Jin, Z Bai, Z Zhang, Y Zhu, Y Zhong, X Liu
IEEE/ACM Transactions on Networking, 2024
Cited by: 5 | Year: 2024
DistTrain: Addressing model and data heterogeneity with disaggregated training for multimodal large language models
Z Zhang, Y Zhong, R Ming, H Hu, J Sun, Z Ge, Y Zhu, X Jin
arXiv preprint arXiv:2408.04275, 2024
Cited by: 3 | Year: 2024
RLHFuse: Efficient RLHF training for large language models with inter- and intra-stage fusion
Y Zhong, Z Zhang, B Wu, S Liu, Y Chen, C Wan, H Hu, L Xia, R Ming, ...
arXiv preprint arXiv:2409.13221, 2024
Cited by: 1 | Year: 2024
Aquifer: Transparent Microsecond-scale Scheduling for vRAN Workloads
Y Jia, Y Zhong, M Wang, J Gao, P Zhang, X Liu, X Jin
IEEE Transactions on Services Computing, 2024
Year: 2024