Ease.ml/ci & ease.ml/meter
Towards Data Management for Statistical Generalization
What is ease.ml/ci & ease.ml/meter?
When training a machine learning model becomes fast, and model selection and hyper-parameter tuning become automatic, will non-CS experts finally have the tools they need to build ML applications all by themselves? We at DS3Lab focus on the users who still struggle, not because an ML system is slow or lacks automation, but because it is so powerful that it is easily misused as an overfitting machine. Without proper guidelines and feedback (like those software engineering provides for traditional software development), the quality of these users' ML applications might actually decrease as the tools become more powerful. We introduce two systems, ease.ml/ci and ease.ml/meter, an early attempt at ML systems that enforce the right user behavior during the development of ML applications. The core technical challenge is how to answer adaptive statistical queries in a way that is rigorous yet practical in terms of label complexity, that is, the number of labeled test examples required. Interestingly, both systems can be seen as a new type of data management system which, instead of managing (relational) queries over the data, manages the statistical generalization power of the data.
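To give a feel for why label complexity matters once queries become adaptive, here is a minimal sketch based on the textbook Hoeffding inequality combined with a union bound. It is an illustration under stated assumptions, not the estimator used by ease.ml/ci or ease.ml/meter: the function names are hypothetical, accuracy is assumed to be an average of i.i.d. values in [0, 1], and the developer is assumed to receive only one pass/fail bit per evaluation.

```python
import math


def labels_needed(epsilon: float, delta: float, log_num_events: float = 0.0) -> int:
    """Test labels needed so that every accuracy estimate is within `epsilon`
    of the true accuracy with probability at least 1 - delta.

    Uses the two-sided Hoeffding bound plus a union bound over
    exp(log_num_events) candidate models (illustrative assumption).
    """
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return math.ceil((math.log(2.0 / delta) + log_num_events) / (2.0 * epsilon ** 2))


def non_adaptive_labels(epsilon: float, delta: float, num_steps: int) -> int:
    """All `num_steps` models are fixed before the test set is ever used."""
    return labels_needed(epsilon, delta, math.log(num_steps))


def adaptive_labels(epsilon: float, delta: float, num_steps: int) -> int:
    """Each model may depend on the earlier pass/fail bits, so there are
    at most 2**num_steps models to cover in the union bound."""
    return labels_needed(epsilon, delta, num_steps * math.log(2.0))


if __name__ == "__main__":
    eps, delta, steps = 0.05, 0.01, 32
    print("single query:       ", labels_needed(eps, delta))
    print("32 fixed queries:   ", non_adaptive_labels(eps, delta, steps))
    print("32 adaptive queries:", adaptive_labels(eps, delta, steps))
```

Under these assumptions, a 5-point tolerance at 99% confidence needs roughly 1,060 labels for a single evaluation, but roughly 5,500 when the same test set is reused across 32 adaptive evaluations. The cost of naive adaptivity grows linearly with the number of evaluations, which is the kind of overhead a practical treatment has to reduce.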
Projects and Publications
Publications
- Bojan Karlaš, Matteo Interlandi, Cedric Renggli, Wentao Wu, Ce Zhang, Deepak Mukunthu Iyappan Babu, Jordan Edwards, Chris Lauren, Andy Xu and Markus Weimer. Building Continuous Integration Services for Machine Learning. KDD 2020 (Applied Data Science, Oral Presentation 44/756).
- Cedric Renggli, Bojan Karlaš, Bolin Ding, Feng Liu, Kevin Schawinski, Wentao Wu, Ce Zhang. Continuous Integration of Machine Learning Models: A Rigorous Yet Practical Treatment. SysML 2019. [Video: YouTube]
Demo
- Cedric Renggli*, Frances Ann Hubis*, Bojan Karlaš, Kevin Schawinski, Wentao Wu, Ce Zhang. Ease.ml/ci and Ease.ml/meter in Action: Towards Data Management for Statistical Generalization. VLDB Demo 2019.
Ease.ml/meter
Ease.ml/meter is a system that continuously returns some notion of the degree of overfitting to the developer.
Publication
- Frances Ann Hubis, Wentao Wu, Ce Zhang. Ease.ml/meter: Quantitative Overfitting Management for Human-in-the-loop ML Application Development. Manuscript, arXiv:1906.00299, 2019.
Demo
- Cedric Renggli*, Frances Ann Hubis*, Bojan Karlaš, Kevin Schawinski, Wentao Wu, Ce Zhang. Ease.ml/ci and Ease.ml/meter in Action: Towards Data Management for Statistical Generalization. VLDB Demo 2019.
People
External Collaborators
- Wentao Wu (Microsoft Research)
- Bolin Ding (Alibaba)
DS3Lab Members
- Cedric Renggli
- Bojan Karlaš
- Frances Ann Hubis (previously)
- Ce Zhang