Paper "Deferred Continuous Batching in Resource-Efficient Large Language Model Serving" accepted at the EuroMLSys'24 workshop
The paper "Deferred Continuous Batching in Resource-Efficient Large Language Model Serving" by Yongjun He (ETH Zürich), Yao Lu (NUS), and Gustavo Alonso (ETH Zürich) was accepted at the 4th Workshop on Machine Learning and Systems (EuroMLSys'24), co-located with EuroSys'24, to be held in Athens, Greece, on 22 April 2024.