Advanced Computer Architecture

Course year: 2009-2010

Code : 5MD00 (3 ECTS)  /  5Z033 (4 ECTS)
Lecturers : Prof. dr. Henk Corporaal
Prof. dr. ir. R.H.J.M. Otten
Email : H.Corporaal at tue.nl
Phone : TU/e: +31-40-247 5195 or 3653 (secr. TU/e) / 5462 (office TU/e)
Assistance: Dr. ir. Sander Stuijk (miniMIPS models + FPGA synthesis): s.stuijk at tue.nl
Ir. Akash Kumar (FPGA synthesis, SiHive lab): a.kumar at tue.nl
MSc Yifan He: y.he at tue.nl
Prerequisites: Course in computer architecture and processor design; e.g. Processor Design 5Z032 or Computation 5JJ70;
Programming experience in C, C++, or equivalent language

News

Information on the course:

Description and objectives

Studying the architecture, organization and use of the newest general purpose (micro)processors currently on the market, and the latest research developments in computer architecture. Architectures exploiting instruction-level parallelism (ILP), thread-level and task-level parallelism are treated. Starting from basic architecture concepts we will end with discussing the latest commercial processors (e.g., Pentium 4 multi-core, EPIC processors like Itanium, and embedded processors such as the TriMedia), and academic processors (like TRIPS).
This course also treats how processors can be combined in a multiprocessing platform, e.g. by using a Network-on-Chip. Interprocessor communication issues will be dealt with. Furthermore, new code generation techniques needed for exploiting ILP will be treated. Special emphasis will be on quantifying design decisions in terms of performance and cost.

The intention of the course is to give students the ability to understand the design principles and operation of new (multi-)processor architectures, and to evaluate them both qualitatively and quantitatively. Although we treat several examples, the emphasis will be on architecture concepts. Furthermore, one or more student teams will design, implement and test a Network-on-Chip.

Topics:

Basic principles (like instruction set design), pipelining and its consequences; VLIW (very long instruction word) architectures, Superpipelined, Superscalar, SIMD (single instruction, multiple data, used in vector and sub-word parallel processors) and MIMD (multiple instruction, multiple data) architectures; SMT (Simultaneous Multi-Threading); Out-of-order and speculative execution; Branch prediction; Data (value) prediction; Design of advanced memory hierarchies; Memory coherency and consistency; Multi-threading; Exploiting task-level and instruction-level parallelism; Inter-processor communication models; Input and output; Network Communication Architecture; and Networks-on-Chip.

Book and Handouts

Computer Architecture: A Quantitative Approach; 4th ed.

John L. Hennessy and David A. Patterson
Morgan Kaufmann Publishers
ISBN  9780123704900

Handouts (for slides see below):

Slides

** will be added and updated during the course period **

Project

Part of the course will be project based.

Lab exercise 1: Single General-Purpose Processors

This exercise familiarizes you with the highly configurable SimpleScalar simulation platform. The SimpleScalar tool set is a system software infrastructure used to build modeling applications for program performance analysis, detailed microarchitectural modeling, and hardware-software co-verification. Using the SimpleScalar tools, users can build modeling applications that simulate real programs running on a range of modern processors and systems. The tool set includes sample simulators ranging from a fast functional simulator to a detailed, dynamically scheduled processor model that supports non-blocking caches, speculative execution, and state-of-the-art branch prediction.
It can emulate, for example, the ARM, Alpha and x86 instruction sets.

Carefully follow the instructions, which can be found here. The instructions describe how to run the tools on our server, which you can access remotely. It is also possible to use SimpleScalar directly on your own desktop or laptop (under UNIX/Linux or Windows NT).

Lab exercise 2: Multi-Processors

The purpose of this assignment is to get familiar with multiprocessor architectures and their programming models.
In this lab you will be asked (after installing the required software) to partition (parallelize) a C program using the well-known pthread library and run it on a multiprocessor simulator.
We will look at different configurations, changing the number of processors, the level-1 and level-2 cache parameters (number of entries, block size and associativity), and the bus bandwidth (the bus connects all processors and sits between the level-1 and level-2 caches; i.e. the level-2 cache is shared between all processors).
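
To get a feeling for how quickly this design space grows, the configurations can be enumerated with a short script. This is only a sketch: all parameter values below are hypothetical placeholders rather than the values of the assignment, "number of entries" is assumed to mean the number of cache lines, and for brevity one block size and one associativity are shared by the level-1 and level-2 caches.

```python
from itertools import product

# Hypothetical parameter values for illustration only; the real ranges
# come from the assignment instructions.
NUM_CORES = [1, 2, 4]
L1_ENTRIES = [128, 256]        # assumed: cache lines per private level-1 cache
L2_ENTRIES = [2048, 4096]      # assumed: cache lines in the shared level-2 cache
BLOCK_BYTES = [32, 64]         # simplification: same block size for L1 and L2
ASSOCIATIVITY = [1, 2, 4]      # simplification: same associativity for L1 and L2
BUS_BYTES_PER_CYCLE = [8, 16]  # hypothetical measure of bus bandwidth


def enumerate_configs():
    """Return every combination of the parameter values as a list of dicts."""
    names = ["cores", "l1_entries", "l2_entries",
             "block_bytes", "assoc", "bus_width"]
    axes = [NUM_CORES, L1_ENTRIES, L2_ENTRIES,
            BLOCK_BYTES, ASSOCIATIVITY, BUS_BYTES_PER_CYCLE]
    return [dict(zip(names, values)) for values in product(*axes)]


def cache_bytes(entries, block_bytes):
    """Data capacity of one cache, ignoring tags: entries * block size."""
    return entries * block_bytes


if __name__ == "__main__":
    configs = enumerate_configs()
    print(len(configs), "configurations")
    first = configs[0]
    print(first, "L1 data bytes:",
          cache_bytes(first["l1_entries"], first["block_bytes"]))
```

Even these few values per parameter give well over a hundred simulator runs, so it pays to choose the explored values deliberately.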

We will use the m5sim simulator from the University of Michigan for the assignment, and run it on top of Linux (so that is the first thing you have to install; see the instructions).

You are asked to first go through the example program (our 'cookbook') and then perform the real assignment. You have to explore the multiprocessor architecture, changing the above-mentioned parameters, and produce performance-cost Pareto curves. Performance is determined by the total program execution time (counted in number of cycles). Cost is determined by the total area (for a certain technology). We will use a simple area model, counting only the total cache size in bytes and the number of cores. We therefore exclude costs such as tag size, bus and interconnect cost, etc. We will provide numbers for the area cost of 1 byte and for a processor core.
For the cores you have 2 options, both based on the DEC Alpha ISA (instruction-set architecture); one processor is a simple in-order processor, the other a more advanced out-of-order engine.
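
The area model and the selection of Pareto-optimal points described above could be sketched as follows. This is a minimal illustration, not the official evaluation script: the function names, the argument names and the numbers in the usage example are hypothetical, and the level-1 caches are assumed to be private per core while the level-2 cache is shared.

```python
def area(num_cores, l1_bytes_per_core, l2_bytes, area_per_byte, area_per_core):
    """Simple area model: total cache bytes plus cores, nothing else.

    Assumes one private level-1 cache per core and one shared level-2 cache.
    area_per_byte and area_per_core stand for the numbers provided with the
    assignment.
    """
    total_cache_bytes = num_cores * l1_bytes_per_core + l2_bytes
    return total_cache_bytes * area_per_byte + num_cores * area_per_core


def pareto_front(points):
    """Return the non-dominated (cost, cycles, label) points, sorted by cost.

    A point is dominated if another point is no worse in both cost and
    cycles and differs from it. Lower is better for both.
    """
    front = []
    best_cycles = float("inf")
    # Sort by cost first; ties are broken by the lower cycle count.
    for cost, cycles, label in sorted(points, key=lambda p: (p[0], p[1])):
        # Keep a point only if it is faster than every cheaper-or-equal point.
        if cycles < best_cycles:
            front.append((cost, cycles, label))
            best_cycles = cycles
    return front


if __name__ == "__main__":
    # Made-up measurements: (area, execution cycles, configuration name).
    measured = [
        (10, 100, "a"),
        (20, 80, "b"),
        (15, 120, "c"),
        (30, 80, "d"),
        (40, 50, "e"),
    ]
    for cost, cycles, label in pareto_front(measured):
        print(label, cost, cycles)
```

In this made-up data set, configurations "c" and "d" are dropped because a cheaper or equally expensive configuration is at least as fast; the remaining points form the curve to plot.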

Now get your hands dirty: go to the assignment2 page, follow the install instructions, run the example, and perform the assignment.
You may also check the other links for helpful material.

You have to send a lab report on your results to Yifan He.

Self-study of relevant topic

Guidelines are as follows:

Examination

The examination will likely be oral.
The grade is based on your project results (and being able to explain and defend them) and the discussed theory (study all slides, book chapters and handouts).

Related material and other links


