Fast distributed inference serving for large language models B Wu, Y Zhong, Z Zhang, S Liu, F Liu, Y Sun, G Huang, X Liu, X Jin arXiv preprint arXiv:2305.05920, 2023 | 67 | 2023 |
Transparent GPU sharing in container clouds for deep learning workloads B Wu, Z Zhang, Z Bai, X Liu, X Jin 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI …, 2023 | 36 | 2023 |
RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation C Jin, Z Zhang, X Jiang, F Liu, X Liu, X Liu, X Jin arXiv preprint arXiv:2404.12457, 2024 | 20 | 2024 |
Ditto: Efficient serverless analytics with elastic parallelism C Jin, Z Zhang, X Xiang, S Zou, G Huang, X Liu, X Jin Proceedings of the ACM SIGCOMM 2023 Conference, 406-419, 2023 | 15 | 2023 |
dLoRA: Dynamically Orchestrating Requests and Adapters for LoRA LLM Serving B Wu, R Zhu, Z Zhang, P Sun, X Liu, X Jin 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2024 | 8 | 2024 |
Fast Vector Query Processing for Large Datasets Beyond GPU Memory with Reordered Pipelining Z Zhang, F Liu, G Huang, X Liu, X Jin 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI …, 2024 | 4 | 2024 |
Rise of Distributed Deep Learning Training in the Big Model Era: From a Software Engineering Perspective X Liu, D Gu, Z Chen, J Wen, Z Zhang, Y Ma, H Wang, X Jin ACM Transactions on Software Engineering and Methodology 32 (6), 1-26, 2023 | 4 | 2023 |
Optimizing half precision Winograd convolution on ARM many-core processors D Xie, Z Jia, Z Zhang, X Jin Proceedings of the 13th ACM SIGOPS Asia-Pacific Workshop on Systems, 53-60, 2022 | 4 | 2022 |
DistTrain: Addressing model and data heterogeneity with disaggregated training for multimodal large language models Z Zhang, Y Zhong, R Ming, H Hu, J Sun, Z Ge, Y Zhu, X Jin arXiv preprint arXiv:2408.04275, 2024 | 2 | 2024 |
Fast, Approximate Vector Queries on Very Large Unstructured Datasets Z Zhang, C Jin, L Tang, X Liu, X Jin 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI …, 2023 | 2 | 2023 |
RLHFuse: Efficient RLHF training for large language models with inter- and intra-stage fusion Y Zhong, Z Zhang, B Wu, S Liu, Y Chen, C Wan, H Hu, L Xia, R Ming, ... arXiv preprint arXiv:2409.13221, 2024 | 1 | 2024 |
Jolteon: Unleashing the Promise of Serverless for Serverless Workflows Z Zhang, C Jin, X Jin 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI …, 2024 | 1 | 2024 |