Systems for Efficient AI
Training and serving AI models requires large compute clusters, which are costly and consume significant energy. As models continue to grow in size to improve accuracy, scaling the underlying system infrastructure becomes difficult due to hardware cost and datacenter power limitations. Improving the resource efficiency and data efficiency of AI model training and serving is key to making AI more sustainable, scalable, and ubiquitous.
Research topics:
- How can we improve AI serving latency and throughput per watt, particularly as AI inference increasingly involves models interacting with databases (e.g., for retrieval augmented generation) and a variety of other tools (e.g., search engines and code interpreters)?
- How can we efficiently customize model inference for different users and types of requests in a multi-tenant setting while exposing an intuitive, declarative API to users?
- How can we make AI training more resource-efficient and elastic?
- How should we build distributed storage services to manage AI training data and enable efficient data selection and ingestion during training?
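To make the retrieval-augmented generation question concrete, the sketch below shows the basic serving pattern: retrieve relevant documents at inference time, then condition generation on them. Everything here is a hypothetical stand-in, not a specific system's API: the in-memory `DOCS` store substitutes for a vector database, and `generate` substitutes for a model-server call.

```python
# Minimal sketch of a retrieval-augmented generation (RAG) serving step.
# The document store and generator are toy stand-ins for illustration only.
from collections import Counter

DOCS = [
    "GPU clusters draw significant power during training.",
    "Retrieval augmented generation fetches relevant documents at inference time.",
    "Elastic training adapts the number of workers to resource availability.",
]

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank documents by word overlap with the query (stand-in for a vector DB lookup)."""
    q = Counter(query.lower().split())
    scored = sorted(docs, key=lambda d: -sum(q[w] for w in set(d.lower().split())))
    return scored[:k]

def generate(prompt: str) -> str:
    """Placeholder for an LLM call; a real system would invoke a model server here."""
    return f"Answer based on: {prompt}"

def rag_answer(query: str) -> str:
    # Fetch context first, then let the model condition its answer on it.
    context = " ".join(retrieve(query, DOCS))
    return generate(f"{context}\nQuestion: {query}")
```

The efficiency research question is precisely about this pipeline: each request now spans a database lookup plus a model call, so end-to-end latency and energy depend on both components, not the model alone.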
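The data-selection question can likewise be illustrated with one simple policy among many: during training, score candidate examples with the current model and keep only the most informative ones for the next batch. The loss-based scoring below is a hypothetical example of such a policy, not a prescribed method.

```python
# Hedged sketch of online data selection during training: from a pool of
# candidates, keep the examples with the highest current loss, on the
# assumption that high-loss examples are the most informative to train on.

def select_high_loss(examples: list, loss_fn, budget: int) -> list:
    """Score every candidate with the current model's loss and keep the top `budget`."""
    scored = sorted(examples, key=loss_fn, reverse=True)
    return scored[:budget]

# Toy usage: a quadratic "loss" centered at 3 ranks the extremes highest.
batch = select_high_loss([1, 2, 3, 4, 5], loss_fn=lambda x: (x - 3) ** 2, budget=2)
```

A storage service that supports this pattern must serve scoring reads over a much larger pool than the data actually ingested, which is what makes efficient selection and ingestion a systems problem rather than a purely algorithmic one.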