Two papers accepted at ICML'24

Two papers were accepted at ICML'24.

HexGen: Generative Inference of Large-Scale Foundation Model over Heterogeneous Decentralized Environment by Youhe Jiang, Ran Yan, Xiaozhe Yao, Yang Zhou, Beidi Chen, and Binhang Yuan.

KV-cache Streaming for Fast, Fault-tolerant Generative LLM Serving by Foteini Strati, Sara McAllister, Amar Phanishayee, Jakub Tarnawski, and Ana Klimovic.

 
