Two papers accepted at ICML'24
Two papers were accepted at ICML'24.
HexGen: Generative Inference of Large-Scale Foundation Model over Heterogeneous Decentralized Environment by Youhe Jiang, Ran Yan, Xiaozhe Yao, Yang Zhou, Beidi Chen, and Binhang Yuan.
KV-cache Streaming for Fast, Fault-tolerant Generative LLM Serving by Foteini Strati, Sara McAllister, Amar Phanishayee, Jakub Tarnawski, and Ana Klimovic.