Agrawal, A., Kedia, N., Panwar, A., Mohan, J., Kwatra, N., Gulavani, B. S., et al. "Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve." arXiv preprint arXiv:2403.02310, 2024.
Agrawal, A., Kedia, N., Mohan, J., Panwar, A., Kwatra, N., Gulavani, B., et al. "Vidur: A Large-Scale Simulation Framework for LLM Inference." arXiv preprint arXiv:2405.05465, 2024.