Pathways: Asynchronous Distributed Dataflow for ML
Abstract:
Deep learning has seen remarkable achievements over the last decade, across domains from image understanding to natural language processing. This rapid progress in machine learning (ML) has been characterized by the co-evolution of ML models, accelerator hardware, and the software systems that tie the two together. This co-evolution poses a danger that systems become over-specialized to current workloads and fail to anticipate future needs.
In this talk, I will first discuss how researchers have started to run into the limits of expressivity and computational efficiency with the current generation of systems and hardware. I will then present Pathways, a new large-scale orchestration layer for accelerators that is explicitly designed to enable exploration of new systems and ML research ideas, while retaining state-of-the-art performance for current models. Pathways uses a sharded dataflow graph of asynchronous operators that consume and produce futures, and efficiently gang-schedules heterogeneous parallel computations on thousands of accelerators while coordinating data transfers over their dedicated interconnects. Through a novel asynchronous distributed dataflow design that lets the control plane execute in parallel with the computation, combined with careful engineering, Pathways enables a single-controller model that makes it easier to express complex new parallelism patterns while also allowing virtualization of accelerator resources.
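To make the single-controller, futures-based dispatch idea concrete, here is a minimal sketch in Python. It is not the Pathways implementation: the `dispatch` helper and the thread pool standing in for accelerator shards are illustrative assumptions, meant only to show how a controller can enqueue operators that consume and produce futures without blocking, so dispatch runs ahead of execution.

```python
# Minimal sketch (NOT the Pathways implementation) of a controller that
# builds a dataflow graph of asynchronous operators over futures.
from concurrent.futures import ThreadPoolExecutor, Future

# Stand-in for accelerator shards: here, just a thread pool (assumption).
pool = ThreadPoolExecutor(max_workers=4)

def dispatch(op, *input_futures) -> Future:
    """Enqueue `op` immediately; it runs once its input futures resolve.

    The controller never blocks here -- it returns a future for the
    result, so the control plane runs ahead of the computation.
    """
    def run():
        args = [f.result() for f in input_futures]  # wait for producers
        return op(*args)
    return pool.submit(run)

# Build a small dataflow graph without waiting for any step to finish.
a = dispatch(lambda: 2)                   # source op
b = dispatch(lambda: 3)                   # source op
c = dispatch(lambda x, y: x * y, a, b)    # consumes futures a and b
d = dispatch(lambda z: z + 1, c)          # consumes future c

print(d.result())  # -> 7
```

In the real system the operators would be sharded computations on accelerator islands and the transfers would go over dedicated interconnects; the point of the sketch is only that the graph is fully enqueued before any result is needed.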
Short Bio:
Sudip Roy is a technology leader at Cohere AI, where he is responsible for building and managing a state-of-the-art platform for serving large language models. Over the years, he has worked on projects spanning every stage of the end-to-end ML lifecycle: systems for managing and processing large volumes of data, infrastructure for training and serving the next generation of ML models (such as PaLM and PaLI), and serving large language models efficiently and reliably. Sudip is an accomplished researcher with many publications in top conferences (including MLSys, SIGMOD, VLDB, and SIGKDD), has served on the program committees of many of them, and has received outstanding paper awards from SIGMOD and MLSys.