Advanced Computer Architecture (5MD00)

Course year: 2012-2013 (1st semester, 2nd quarter)

Code : 5MD00 (3 ECTS)  /  5Z033 (4 ECTS)
Lecturers : Prof. dr. Henk Corporaal
Prof. dr. ir. R.H.J.M. Otten
Email : H.Corporaal at
Phone : TU/e: +31-40-247 5195 or 3653 (secr. TU/e) / 5462 (office TU/e)
Assistance: MSc. Yifan He: y.he at
MSc. Dongrui She: d.she at
Dr. ir. Sander Stuijk (for computer infrastructure): s.stuijk at
Prerequisites: Course in computer architecture and processor design; e.g. Processor Design 5Z032 or Computation 5JJ70;
Programming experience in C, C++, or equivalent language


Information on the course:

Description and objectives

This course studies the architecture, organization and use of the newest general-purpose (micro)processors currently on the market, and the latest research developments in computer architecture. Architectures exploiting instruction-level parallelism (ILP), thread-level and task-level parallelism are treated. Starting from basic architecture concepts, we will end by discussing the latest commercial processors (e.g., Pentium 4 multi-core, EPIC processors like Itanium, and embedded processors such as the TriMedia), and academic processors (like TRIPS).
This course also treats how processors can be combined in a multiprocessing platform, e.g. by using a Network-on-Chip. Interprocessor communication issues will be dealt with. Furthermore new code generation techniques needed for exploiting ILP will be treated. Special emphasis will be on quantifying design decisions in terms of performance and cost.

The intention of the course is to give students the ability to understand the design principles and operation of new (multi-)processor architectures, and to evaluate them both qualitatively and quantitatively. Although we treat several examples, the emphasis will be on architecture concepts. Furthermore, one or more student teams will design, implement and test a Network-on-Chip.


Basic principles (like instruction set design), pipelining and its consequences; VLIW (very long instruction word) architectures, Superpipelined, Superscalar, SIMD (single instruction, multiple data, used in vector and sub-word parallel processors) and MIMD (multiple instruction, multiple data) architectures; SMT (Simultaneous Multi-Threading); Out-of-order and speculative execution; Branch prediction; Data (value) prediction; Design of advanced memory hierarchies; Memory coherency and consistency; Multi-threading; Exploiting task-level and instruction-level parallelism; Inter-processor communication models; Input and output; Network Communication Architecture; and Networks-on-Chip.

Book and Handouts

Computer Architecture: A Quantitative Approach; 5th ed.

John L. Hennessy and David A. Patterson
Morgan Kaufmann Publishers
ISBN  9780123704900

Handouts (for slides see below):


Preliminary set of treated topics.
** will be added and updated during the course period **


As part of this course you have to perform 2 lab exercises: one on the design of a single superscalar processor (using the SimpleScalar tool set), and one on multi-processor design. For each lab exercise you have to submit a report explaining the results obtained. Furthermore, you have to perform a self-study on a topic that is relevant and strongly related to the material discussed in this course.
Details will follow.

Lab exercise 1: Single General-Purpose Processors

See Lab-1 Introduction slides.
In this Lab you will learn (almost) everything about Superscalar processors, i.e. Out-of-Order, Multi-Issue processors, including a complete memory hierarchy, containing several levels of caching.
For this you will use the highly configurable SimpleScalar simulation platform. The SimpleScalar tool set is a system software infrastructure used to perform program performance analysis, detailed microarchitectural modeling, and hardware-software co-verification. Using the SimpleScalar tools, you can simulate real programs running on a range of modern processors and systems.  The tool set includes sample simulators ranging from a fast functional simulator to a detailed, dynamically scheduled processor model that supports non-blocking caches, speculative execution, and state-of-the-art branch prediction.
It can emulate e.g. ARM, Alpha and x86 instruction sets.
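
When comparing memory hierarchies with several levels of caching, a common first-order metric is the average memory access time (AMAT), as defined in the course book. The sketch below is only an illustration of that formula for a two-level cache; the function name and parameters are hypothetical and are not part of SimpleScalar, whose simulators report hit and miss statistics in their own output format.

```c
#include <assert.h>

/*
 * Average memory access time for a two-level cache hierarchy:
 *   AMAT = L1 hit time + L1 miss rate * (L2 hit time + L2 miss rate * memory penalty)
 * All times are in cycles; miss rates are local miss rates in [0, 1].
 * Hypothetical helper for illustration only.
 */
double amat_two_level(double l1_hit_time, double l1_miss_rate,
                      double l2_hit_time, double l2_miss_rate,
                      double mem_penalty)
{
    assert(l1_miss_rate >= 0.0 && l1_miss_rate <= 1.0);
    assert(l2_miss_rate >= 0.0 && l2_miss_rate <= 1.0);

    /* Cost paid by every access that misses in L1. */
    double l1_miss_penalty = l2_hit_time + l2_miss_rate * mem_penalty;

    return l1_hit_time + l1_miss_rate * l1_miss_penalty;
}
```

For example, with made-up numbers, a 1-cycle L1 with a 5% miss rate, a 10-cycle L2 with a 20% local miss rate and a 100-cycle memory penalty would be evaluated as `amat_two_level(1.0, 0.05, 10.0, 0.20, 100.0)`.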

Carefully follow the instructions, which can be found here. The instructions describe how to run the tools on our server, which you can access remotely. It is also possible to use SimpleScalar directly on your own desktop or laptop (under UNIX/Linux or Windows NT).

Lab exercise 2: Multi-Processors

State-of-the-art CPUs contain dozens of cores on a single die. The trend towards multicore poses new challenges to both computer architects and programmers. In this assignment, we will try to tackle these challenges from the viewpoint of both computer architects and programmers. The purpose of this assignment is to
  1. Get an in-depth understanding of mainstream multi-core CPU architectures.
  2. Learn how to develop parallel applications on such architectures, and how to analyze the performance in a real environment.

With the help of the gem5 full-system simulator and the McPAT modeling framework, we will look at different configurations, e.g., the number of processors and the block size and associativity of the different cache levels. The goal is to optimize the Energy-Delay-Area-Product (EDAP) of the system. You can achieve this goal by improving the original C code, using OpenMP, and/or using any other creative methods. Details about the assignment can be found here.
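
As a rough illustration of the optimization target, the EDAP of a configuration is the product of its energy, its execution time and its chip area, and lower is better. The sketch below assumes those three numbers have already been extracted by hand from the simulator and power-model reports; the `design_point` struct and both function names are hypothetical and are not part of gem5 or McPAT, and the units are an assumption that only needs to be consistent across configurations.

```c
#include <assert.h>

/* One simulated configuration (hypothetical container, filled in by hand). */
typedef struct {
    double energy_j;  /* total energy in joules */
    double delay_s;   /* execution time in seconds */
    double area_mm2;  /* chip area in mm^2 */
} design_point;

/* Energy-Delay-Area-Product: lower is better. */
double edap(design_point p)
{
    assert(p.energy_j >= 0.0 && p.delay_s >= 0.0 && p.area_mm2 >= 0.0);
    return p.energy_j * p.delay_s * p.area_mm2;
}

/* Return the index of the configuration with the lowest EDAP (n must be > 0). */
int best_design(const design_point *pts, int n)
{
    assert(pts != 0 && n > 0);

    int best = 0;
    for (int i = 1; i < n; i++) {
        if (edap(pts[i]) < edap(pts[best])) {
            best = i;
        }
    }
    return best;
}
```

Because EDAP is a plain product, a configuration that halves the execution time but more than doubles the area or the energy does not improve the metric, so adding cores or enlarging caches is not automatically a win.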

You have to send a lab report about your results to Yifan He and Dongrui She.

Self-study of relevant topic

Guidelines are as follows:


The examination will likely be oral.
The grade is based on your project results (and being able to explain and defend them) and the discussed theory (study all slides, book chapters and handouts).

Related material and other links
