Data Management Systems

Lecturer: Gustavo Alonso

Teaching Assistants

  • Simon Kassing
  • Dario Korolija
  • Dimitris Koutsoukos
  • Fabio Maschi
  • Michal Wawrzoniak

Lectures

  • Wednesday 10:00 - 12:00 CAB G61
  • Friday 8:00 - 9:00 HG G3

Exercises

  • Friday 9:00 - 10:00 HG E 21
  • Friday 9:00 - 10:00 Online
    The Zoom link for the online group can be found on the course Moodle page.

Contact

Please use the Moodle Q&A forum on the protected page to ask questions outside of lectures and exercise sessions. If you have private questions for the instructors or TAs, please send an email to

Announcements and exercises will be handled through Moodle.

Course contents

The course will cover the implementation aspects of data management systems, using relational database engines as a starting point to introduce the basic concepts of efficient data processing and then extending those concepts to modern implementations in data centers and the cloud.

The goal of the course is to convey the fundamental aspects of efficient data management from a systems implementation perspective: storage, access, organization, indexing, consistency, concurrency, transactions, distribution, query compilation vs. interpretation, data representations, etc. Using conventional relational engines as a starting point, the course aims to provide in-depth coverage of the latest technologies used in data centers and the cloud to implement large-scale data processing in various forms.

The course will first cover fundamental concepts in data management (storage, locality, query optimization, declarative interfaces, concurrency control and recovery, buffer managers, management of the memory hierarchy), presenting them in a system-independent manner. The course will place special emphasis on understanding these basic principles, as they are key to understanding the problems that existing systems try to address. It will then explore their implementation in modern relational engines supporting SQL and then expand to the range of systems used in the cloud: key-value stores, geo-replication, query as a service, serverless, large-scale analytics engines, etc.

The main source of information for the course will be articles and research papers describing the architecture of the systems discussed. The list of papers will be provided as the materials for each chapter of the course are released.

Due to the uncertainties created by the coronavirus and the possibility that access to ETH, laboratories, classrooms, etc. might be restricted in the middle of the semester, this edition of the course will have no project or practical component. We will focus on the key architectural aspects and on surveying the literature on data management systems architecture. The time that would otherwise have been devoted to programming will instead be invested in looking more deeply at how systems are constructed and at the algorithms behind many of the optimizations used in real systems. Freed from development work, students are expected to invest the time needed to read the provided articles and books and gain a solid understanding of the material.

Syllabus

Lecture schedule

Lecture slides are available via the course Moodle site.

Reading assignments

Teaching format

  • Lectures will be recorded and streamed live (slides and voice).
  • Homework will be handled through Moodle.
  • Exam will be handled through Moodle.