Computing Platforms

Content

The seminar will cover core concepts and ideas in the general area of computer systems, ranging from software and hardware architectures to system design for operating systems, data processing systems, and distributed systems. The focus will be on fundamental ideas that apply across systems and application areas, with an emphasis on those that apply to cloud platforms and hardware accelerators.

Format

The seminar will consist of student presentations based on a list of papers that will be provided at the beginning of the course. Presentations will be done in teams and arranged in slots of a 30-minute talk plus 15 minutes of questions. Grades will be assigned based on the quality of the presentation, coverage of the topic (including material not in the original papers), participation during the seminar, and the ability to understand, present, and criticize the underlying technology.

Seminar Hours

Mondays, 4-6pm, at CHN D 44. The first seminar will be on February 20th.

Lecturer

  • Prof. Gustavo Alonso

Teaching Assistants

  • Dr. Michael Giardino
  • Dr. Michal Friedman

Schedule

Papers

You may need to open the links from within the ETH network (e.g., via VPN) to access the full-text papers.

1. Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy Katz, Andy Konwinski, Gunho Lee, David Patterson, Ariel Rabkin, Ion Stoica, Matei Zaharia (2010) A View of Cloud Computing. In: CACM. [Link]

2. Neil C. Thompson, Svenja Spanuth (2021) The Decline of Computers as a General Purpose Technology. In: CACM. [Link]

3. Barroso, L., Marty, M., Patterson, D., & Ranganathan, P. (2017). Attack of the killer microseconds. In: CACM. [Link]

4. Primorac, M., Bugnion, E., & Argyraki, K. (2017). How to measure the killer microsecond. In: CCR. [Link]

5. Delimitrou, C., & Kozyrakis, C. (2018). Amdahl’s law for tail latency: Queueing theoretic models can guide design trade-offs in systems targeting tail latency, not just average performance. In: CACM. [Link]

6. Shafer, J., Rixner, S., & Cox, A. L. (2010). The Hadoop distributed file system: Balancing portability and performance. In: ISPASS. [Link1] [Link2]

7. Burrows, M. (2006). The Chubby lock service for loosely-coupled distributed systems. In: OSDI. [Link]

8. Hunt, P., Konar, M., Junqueira, F. P., & Reed, B. (2010). ZooKeeper: Wait-free coordination for internet-scale systems. In: USENIX ATC. [Link]

9. DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., et al. (2007). Dynamo: Amazon’s Highly Available Key-value Store. In: SIGOPS. [Link]

10. Lakshman, A., & Malik, P. (2010). Cassandra: a decentralized structured storage system. In: SIGOPS Review. [Link1] [Link2]

11. Beaver, D., Kumar, S., Li, H. C., Sobel, J., & Vajgel, P. (2010). Finding a needle in haystack: Facebook's photo storage. In: OSDI. [Link]

12. Corbett, J. C., Dean, J., Epstein, M., et al. (2012). Spanner: Google’s Globally-Distributed Database. In: OSDI. [Link]

13. Bacon, D. F., Bales, N., Bruno, N., et al. (2017). Spanner: Becoming a SQL system. In: SIGMOD. [Link]

14. Armbrust, M., Ghodsi, A., Zaharia, M., et al. (2015). Spark SQL: Relational Data Processing in Spark. In: SIGMOD. [Link]

15. Chen, G. J., Wiener, J. L., Iyer, S., Jaiswal, A., et al. (2016). Realtime Data Processing at Facebook. In: SIGMOD. [Link]

16. Verbitski, A., et al. (2017). Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases. In: SIGMOD. [Link]

17. Johann Schleier-Smith, Vikram Sreekanti, Anurag Khandelwal, Joao Carreira, Neeraja J. Yadwadkar, Raluca Ada Popa, Joseph E. Gonzalez, Ion Stoica, David A. Patterson (2021) What Serverless Computing Is and Should Become: The Next Phase of Cloud Computing. In: CACM. [Link]

18. Hellerstein, J. M., Faleiro, J., Gonzalez, J. E., et al. (2019). Serverless Computing: One Step Forward, Two Steps Back. In: CIDR. [Link]

19. Shankar, V., Krauth, K., Vodrahalli, K., Pu, Q., et al. (2020). Serverless linear algebra. In: SoCC. [Link]

20. Müller, I., Marroquín, R., & Alonso, G. (2020). Lambada: Interactive Data Analytics on Cold Data Using Serverless Cloud Infrastructure. In: SIGMOD. [Link]

21. Klimovic, A., Wang, Y., Stuedi, P., et al. (2018). Pocket: Elastic Ephemeral Storage for Serverless Analytics. In: OSDI. [Link]

22. Ao Wang, Jingyuan Zhang, Xiaolong Ma, Ali Anwar, Lukas Rupprecht, Dimitrios Skourtis, Vasily Tarasov, Feng Yan, Yue Cheng (2020). INFINICACHE: exploiting ephemeral serverless functions to build a cost-effective memory cache. In: FAST '20. [Link]

 

Invited Talks

Modularis: Modular Relational Analytics over Heterogeneous Distributed Platforms

Abstract: The enormous quantity of data produced every day together with advances in data analytics has led to a proliferation of data management and analysis systems. Typically, these systems are built around highly specialized monolithic operators optimized for the underlying hardware. While effective in the short term, such an approach makes the operators cumbersome to port and adapt, which is increasingly required due to the speed at which algorithms and hardware evolve. To address this limitation, we present Modularis, an execution layer for data analytics based on sub-operators, i.e., composable building blocks resembling traditional database operators but at a finer granularity. To demonstrate the feasibility and advantages of our approach, we use Modularis to build a distributed query processing system supporting relational queries running on an RDMA cluster, a serverless cloud platform, and a smart storage engine. Modularis requires minimal code changes to execute queries across these three diverse hardware platforms, showing that the sub-operator approach reduces the amount and complexity of the code to maintain. In fact, changes in the platform affect only those sub-operators that depend on the underlying hardware (in our use cases, mainly the sub-operators related to network communication). We show the end-to-end performance of Modularis by comparing it with a framework for SQL processing (Presto), a commercial cluster database (SingleStore), as well as Query-as-a-Service systems (Athena, BigQuery). Modularis outperforms all these systems, proving that the design and architectural advantages of a modular design can be achieved without degrading performance. We also compare Modularis with a hand-optimized implementation of a join for RDMA clusters. We show that Modularis has the advantage of being easily extensible to a wider range of join variants and group-by queries, none of which are supported by the hand-tuned join.

Serverless Datacenter Applications

Abstract: Serverless computing offerings such as Function-as-a-Service (FaaS) platforms provide high elasticity and simplify resource management. They address key shortcomings of conventional cloud deployments over virtual machines or containers: long start-up times, coarse-grained billing, and overprovisioned deployments to absorb load spikes and/or node failures. However, today’s FaaS platforms bundle their efficient resource management with both an event-based programming model and a constrained execution model. As a result, conventional data center applications do not run on off-the-shelf FaaS. In this paper, we show that, by decoupling the fine-grained resource allocation from the restricted programming model, general data center applications with sporadic and/or bursty request patterns can benefit from the high elasticity of FaaS without requiring changes to application code. We propose Boxer, a data center overlay system providing fine-grained elasticity to generic data center applications. Our experiments demonstrate (1) the ability of Boxer to support a wide range of applications (through deployments of the DeathStar benchmark, ZooKeeper, and Apache Drill), (2) its efficiency in addressing slow start times and overprovisioning, and (3) the minimal set of features needed to turn serverless into a general purpose, instantly deployable, short-lived datacenter.

Presentation Tips

 
