FPGA 2023: Enabling Networking for Distributed Applications on FPGA Clusters
FPGAs are increasingly being deployed in data centers and the cloud in a variety of settings and configurations (e.g., Microsoft Catapult). As a result of this rapid cloud adoption, FPGAs are no longer viewed as PCIe-attached accelerators, but as first-class compute resources directly connected to the network. This opens up many opportunities for in-network processing and distributed computing on FPGAs. However, few open-source resources exist to enable research in this exciting space, and consequently too little attention is being paid to using clusters of FPGAs to tackle larger problems and to support large-scale deployments. In this tutorial, we will present, and illustrate with examples, several resources available to the academic research community for pursuing research in distributed applications on FPGA clusters.
Location
The tutorial is held on 12th Feb 2023 as part of the FPGA'23 conference. All slides will be made available shortly before the tutorial, and all recordings will be released after the tutorial.
Schedule
Infrastructure
08:30 – 09:00 Zhenhao He, HACC Cluster Introduction
[Download slides (PDF, 1.5 MB)] [Download recording (MP4, 77.1 MB)]
09:00 – 09:20 Lucian Petrica, VNX: UDP Support for Vitis
[Download slides (PDF, 853 KB)] [Download recording (MP4, 125.7 MB)] [GitHub]
09:20 – 09:40 Zhenhao He, EasyNet: TCP Support for Vitis
[Download slides (PDF, 915 KB)] [Download recording (MP4, 75.4 MB)] [GitHub]
09:40 – 10:10 Lucian Petrica, ACCL: MPI Collective Offload Engine
[Download slides (PDF, 1012 KB)] [Download recording (MP4, 171.4 MB)] [GitHub]
10:10 – 10:40 Dario Korolija, Coyote: FPGA Shell and RDMA
[Download slides (PDF, 5.5 MB)] [Download recording (MP4, 71.6 MB)] [GitHub]
10:40 – 11:00 Coffee Break
Demo
11:00 – 11:20 Zhenhao He, EasyNet and ACCL Microbenchmark on HACC
[Download slides (PDF, 376 KB)] [Download recording (MP4, 68.3 MB)]
11:20 – 11:30 Dario Korolija, Coyote Microbenchmark on HACC
11:30 – 12:00 Lucian Petrica, Running ACCL applications in Simulator and Emulator
[Download slides (PDF, 971 KB)]
Applications
12:00 – 12:30 Zhenhao He, Distributed Recommendation Inference
[Download slides (PDF, 2.7 MB)] [Download recording (MP4, 31.9 MB)]
Zhenhao He is a PhD student at ETH Zurich. His research focuses on building networking infrastructure for distributed FPGA applications. He is also interested in designing customized hardware accelerators for computationally intensive tasks, such as machine learning.
Dario Korolija is a PhD student at ETH Zurich. He works mostly at the intersection of software and hardware. His main research area is creating novel abstractions for modern heterogeneous architectures. He spends most of his time hacking away in the fields of computer architecture, operating systems, and networking. He is also interested in recent compiler advancements for these novel computing systems.
Lucian Petrica is a senior researcher in the AMD AECG Research Labs in Dublin, Ireland. He received a PhD in computer engineering from the Politehnica University of Bucharest and has had research roles at TU Delft, Ixia, Xilinx, and now AMD. His research interests center on FPGA technology and applications, more specifically dataflow DNN acceleration and distributed FPGA computation. He has been involved in the FINN dataflow inference accelerator compiler, and is now a lead developer for ACCL, an MPI-like collective communication library for datacenter FPGAs.