Heterogeneous Systems Seminar


Overview:

The seminar covers heterogeneous systems: systems that make use of different types of compute devices (GPUs, FPGAs, ASICs, etc.) and/or memory (NVM/SCM). Our focus is on the systems and architectures that use these devices. The objective of this course is to familiarize students with important topics in heterogeneous systems, past, present, and future: the devices, the architectures, and their uses.

Format:

The seminar consists of student presentations of papers selected from a provided list. Depending on the number of students enrolled, presentations will be given individually or in teams of two. Students will be allotted a 45-minute time slot, consisting of a 30-minute presentation followed by 15 minutes for questions.

Grading:

Grading is based on the quality of the presentation, the coverage of the paper (including necessary background and follow-on work), and the ability to understand and critique the paper and technology. Because discussion is an integral part of the seminar format, students are allowed only one unexcused absence during the course of the semester.

Hours:

The Spring 2024 seminar meets on Tuesdays from 16:15 to 18:00.

It currently takes place in LFW C4.

Papers:

Nonvolatile Memory

Disaggregated Memory

  • Huaicheng Li, et al. "Pond: CXL-Based Memory Pooling Systems for Cloud Platforms". In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2 (ASPLOS 2023). Association for Computing Machinery, New York, NY, USA, 574–587. https://doi.org/10.1145/3575693.3578835
  • Zhiyuan Guo, et al. "Clio: A Hardware-Software Co-Designed Disaggregated Memory System". In Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '22). Association for Computing Machinery, New York, NY, USA, 417–433. https://doi.org/10.1145/3503222.3507762
  • Hasan Al Maruf, et al. "TPP: Transparent Page Placement for CXL-Enabled Tiered-Memory". In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3 (ASPLOS 2023). Association for Computing Machinery, New York, NY, USA, 742–755. https://doi.org/10.1145/3582016.3582063

In-Memory/Near-Memory Processing

  • Pati, et al. "T3: Transparent Tracking & Triggering for Fine-grained Overlap of Compute & Collectives". To appear in ASPLOS 2024. https://arxiv.org/abs/2401.16677

GPUs:

  • John Nickolls et al. "Scalable Parallel Programming with CUDA: Is CUDA the Parallel Programming Model That Application Developers Have Been Waiting For?". In: Queue 6.2 (Mar. 2008), pp. 40–53. https://doi.org/10.1145/1365490.1365500
  • Victor W. Lee et al. "Debunking the 100X GPU vs. CPU Myth: An Evaluation of Throughput Computing on CPU and GPU". In: Proceedings of the 37th Annual International Symposium on Computer Architecture (ISCA '10). Saint-Malo, France: Association for Computing Machinery, 2010. https://doi.org/10.1145/1815961.1816021
  • Lin Shi et al. "vCUDA: GPU-Accelerated High-Performance Computing in Virtual Machines". In: IEEE Transactions on Computers 61.6 (2012). https://ieeexplore.ieee.org/document/5928326
  • Anil Shanbhag, Samuel Madden, and Xiangyao Yu. "A Study of the Fundamental Performance Characteristics of GPUs and CPUs for Database Analytics". In: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (SIGMOD '20). Portland, OR, USA: Association for Computing Machinery, 2020. https://doi.org/10.1145/3318464.3380595

FPGAs:

Analog Computing:

In-Network Computing:

Custom Accelerators:

  • Song Han et al. "EIE: Efficient Inference Engine on Compressed Deep Neural Network". In: 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA). 2016, pp. 243–254. https://ieeexplore.ieee.org/document/7551397
  • Norman P. Jouppi et al. "In-Datacenter Performance Analysis of a Tensor Processing Unit". In: SIGARCH Comput. Archit. News 45.2 (June 2017). https://doi.org/10.1145/3140659.3080246
  • Norm Jouppi, et al. "TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings". In Proceedings of the 50th Annual International Symposium on Computer Architecture (ISCA '23). Association for Computing Machinery, New York, NY, USA, Article 82, 1–14. https://doi.org/10.1145/3579371.3589350
  • Yatish Turakhia, Gill Bejerano, and William J. Dally. "Darwin: A Genomics Co-Processor Provides up to 15,000X Acceleration on Long Read Assembly". In: SIGPLAN Not. 53.2 (Mar. 2018). https://doi.org/10.1145/3296957.3173193
  • Parthasarathy Ranganathan et al. "Warehouse-Scale Video Acceleration: Co-Design and Deployment in the Wild". In: Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2021). Virtual, USA: Association for Computing Machinery, 2021. https://doi.org/10.1145/3445814.3446723
  • Yakun Sophia Shao, et al. "Simba: Scaling Deep-Learning Inference with Multi-Chip-Module-Based Architecture". In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO '52). Association for Computing Machinery, New York, NY, USA, 14–27. https://doi.org/10.1145/3352460.3358302

Wild Card

Contact

Dr. Michael Joseph Giardino
Lecturer at the Department of Computer Science
  • STF H 319

Institut für Computing Platforms
Stampfenbachstrasse 114
8092 Zürich
Switzerland
