Paper "Deferred Continuous Batching in Resource-Efficient Large Language Model Serving" accepted at the EuroMLSys'24 workshop

The paper "Deferred Continuous Batching in Resource-Efficient Large Language Model Serving" by Yongjun He (ETH Zürich), Yao Lu (NUS), and Gustavo Alonso (ETH Zürich) was accepted at the 4th Workshop on Machine Learning and Systems (EuroMLSys'24), co-located with EuroSys'24, to be held in Athens, Greece, on 22 April 2024.
