Advanced Topics in Database Systems


Course Objectives
Database systems today are more ubiquitous than ever. With the explosion of information, new requirements for fast and reliable data management have emerged, and database systems support the backend of most internet applications. Managing terabytes of information is now routine, petabytes a frequent necessity, and exabytes around the corner. On top of that, database systems now operate on hardware that has intelligence of its own: it makes decisions and predicts what the software will do. All of these developments open new horizons for research on database system design and performance.

This course covers system and performance issues in today's database system design. Topics include query processing and optimization, concurrency control, smart and efficient benchmarking, modern query processing algorithms for internet applications, interaction between the database software and the underlying hardware, and other topics related to database system performance.

The course is intended for students who (a) have taken database courses before and want to know what is new and sizzling in the database community; or (b) are looking for a Ph.D. topic; or (c) are already involved in a project that needs database expertise and want to learn more about it. Each semester's incarnation of the course will differ from the others in (a) the material, which will be adapted to new topics, and (b) the structure, which will be adapted to the students' needs and the instructor's experience.

Content
In the Spring 2010 offering of the course, we will read and discuss papers from two major areas of data management systems research:

  1. Database performance and scalability issues on modern hardware platforms. We will discuss query processing algorithms for deep memory hierarchies and aggressively parallel systems based on multicore chips, as well as storage and the new, popular flash devices.
  2. Data management support for scientific applications. Beyond their sheer size, scientific datasets pose several new challenges to data management. We will discuss recent research on (a) methods for automating physical database design, i.e., schema and indexes, and (b) incorporating physical models and specialized data structures (such as tetrahedral meshes) as first-class citizens into the database.
The tentative list of topics is as follows:

We are, of course, also open to suggestions from course participants. The course will consist of papers that we will all read and that will be presented in class, either by the instructor or by the students.

A list of project ideas will be handed out, but students can also propose and work on their own ideas, provided these have the potential to advance the state of the art.

Required prior knowledge
Basic database management systems (undergraduate level, e.g., B-trees, joins, sorting), C/C++ (depending on the choice of project), and data structures and algorithms.

Course URL
Updates to the program and all course material are posted on the web page of the course at
http://moodle.epfl.ch/course/view.php?id=4541.
The (much more general) EDIC entry for the course description:
Advanced Topics in Database Systems [en]

Keywords
database management systems, chip multiprocessors, cache hierarchies, flash disks, scientific databases, automated database design