Paper on data management for LLM training accepted at HotInfra'24
The paper "Decluttering the data mess in LLM training" by Maximilian Böther, Dan Graur, Xiaozhe Yao, and Ana Klimovic has been accepted for oral presentation at the Workshop on Hot Topics in System Infrastructure (HotInfra) at SOSP'24 in Austin, Texas. The paper lays out three challenges for managing and mixing data collections in the context of LLM training. The work has also been accepted for presentation as a poster at the SOSP main conference poster session.