SIGMOD'23 Tutorial: Data Processing on FPGAs with Modern Architectures
Abstract
Trends in hardware, the prevalence of the cloud, and the rise of highly demanding applications have ushered in an era of specialization that is quickly changing the way data is processed at scale. These changes are likely to continue and accelerate in the coming years as new technologies are adopted and deployed: smart NICs, smart storage, smart memory, disaggregated storage, disaggregated memory, specialized accelerators (GPUs, TPUs, FPGAs), as well as a wealth of ASICs specifically created to deal with computationally expensive tasks (e.g., cryptography or compression). In this tutorial, we focus on data processing on FPGAs, a technology that has received less attention than, e.g., TPUs or GPUs but that is increasingly being deployed in the cloud for data processing tasks due to the architectural flexibility of FPGAs, along with their ability to process data at line rate, something not possible with other types of processors or accelerators.
In the tutorial, we will cover what FPGAs are, their characteristics, their advantages and disadvantages over other design options, as well as examples from deployments in the industry and how they are used in a variety of data processing tasks. Then we will provide a brief introduction to FPGA programming with High-Level Synthesis (HLS) tools and briefly describe resources available to researchers in the form of academic clusters and open-source systems that simplify the first steps. The tutorial will also include several case studies borrowed from research done in collaboration with companies that illustrate both the potential of FPGAs in data processing and how software and hardware architectures are evolving to take advantage of the possibilities offered by FPGAs. These use cases include: (1) Approximate Nearest Neighbor Search (ANNS), to illustrate the problem of searching and processing large vector data collections, a problem relevant in both traditional data management and machine learning; (2) remote disaggregated memory, showing how the cloud architecture is evolving and demonstrating the potential for operator offloading and line-rate data processing; and (3) recommendation systems, which stand in for applications with very tight latency constraints that must, nevertheless, process vast amounts of data to provide results of sufficiently high quality.
Slides
- Introduction: Data Processing with FPGAs on Modern Architectures (PDF, 1.1 MB)
- Programming FPGAs: a Software Programmer’s Perspective (PDF, 3.4 MB)
- The HACC FPGA Cluster (PDF, 9 MB)
- Farview: Disaggregated Memory with Operator Off-loading for Database Engines (PDF, 31.4 MB)
- Co-design Hardware and Algorithm for Vector Search (PDF, 6.4 MB)
- Efficient Recommendation Inference on Heterogeneous CPU, GPU, FPGA Clusters (PDF, 3.2 MB)
References
Data Processing with FPGAs on Modern Architectures. Jiang, Wenqi, Dario Korolija, and Gustavo Alonso. Companion of the 2023 International Conference on Management of Data. 2023.
Farview: Disaggregated memory with operator off-loading for database engines. Korolija, D., Koutsoukos, D., Keeton, K., Taranov, K., Milojičić, D. and Alonso, G., CIDR 2022.
Co-design Hardware and Algorithm for Vector Search. Wenqi Jiang, Shigang Li, Yu Zhu, Johannes de Fine Licht, Zhenhao He, Runbin Shi, Cedric Renggli, Shuai Zhang, Theodoros Rekatsinas, Torsten Hoefler, and Gustavo Alonso. The International Conference for High Performance Computing, Networking, Storage and Analysis, 2023.
FleetRec: Large-scale recommendation inference on hybrid GPU-FPGA clusters. Jiang, W., He, Z., Zhang, S., Zeng, K., Feng, L., Zhang, J., Liu, T., Li, Y., Zhou, J., Zhang, C. and Alonso, G., 2021, August. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining.
MicroRec: efficient recommendation inference by hardware and data structure solutions. Jiang, W., He, Z., Zhang, S., Preußer, T.B., Zeng, K., Feng, L., Zhang, J., Liu, T., Li, Y., Zhou, J. and Zhang, C., 2021. Proceedings of Machine Learning and Systems.
Presenters
Gustavo Alonso is a professor at the Systems Group of the Department of Computer Science at ETH Zürich. His research interests include data management, distributed systems, cloud computing architecture, and hardware acceleration through reconfigurable computing. Gustavo has served as PC chair for conferences in several areas including VLDB, ICDE, EDBT, EuroSys, Middleware, and ICDCS and regularly serves on the program committees of CIDR, VLDB, SIGMOD, FPGA, ATC, EuroSys, OSDI, and MLSys. He was a member of the VLDB Endowment and the EDBT Executive Board and the Chair of EuroSys, the European Chapter of ACM SIGOPS. Gustavo has received 4 Test-of-Time Awards for his research in databases, software runtimes, middleware, and mobile computing. He is an ACM Fellow, an IEEE Fellow, and a Distinguished Alumnus of the Department of Computer Science of UC Santa Barbara. Web page: https://people.inf.ethz.ch/alonso/
Dario Korolija is a final-year doctoral student at the Systems Group of the Department of Computer Science at ETH Zürich. He obtained his MSc degree from EPFL and completed his undergraduate studies at the University of Belgrade in Serbia. He works at the intersection of software and hardware. His main research focus is creating novel abstractions for modern heterogeneous architectures, spanning the fields of computer architecture, data processing, operating systems, and networking (mostly RDMA). He is also interested in recent compiler advancements (MLIR) and their use in these novel computing systems. Dario has published at conferences such as CIDR, OSDI, ASPLOS, FPL, and FPGA. Web page: https://d-kor.github.io/
Wenqi Jiang is a third-year doctoral student at the Systems Group of the Department of Computer Science at ETH Zürich. He received his M.S. degree from Columbia University and his B.Eng. from Huazhong University of Science and Technology, both with honors. His research interests include computer architecture, data management, and machine learning. More specifically, he is interested in enabling large-scale vector retrieval (approximate nearest neighbor search) through cross-stack solutions: from efficient retrieval algorithms to distributed systems design and high-performance hardware support. He has also explored hardware-accelerated solutions for industry-scale recommender systems in collaboration with Alibaba. Wenqi has published at conferences such as KDD, MLSys, and FPL. Web page: https://wenqijiang.github.io/