Advanced Computer Architecture
Course year: 2009-2010
Code: 5MD00 (3 ECTS) / 5Z033 (4 ECTS)
Lecturers: Prof. dr. Henk Corporaal, Prof. dr. ir. R.H.J.M. Otten
Email: H.Corporaal at tue.nl
Phone: TU/e: +31-40-247 5195 or 3653 (secr. TU/e) / 5462 (office TU/e)
Assistance:
- Dr. ir. Sander Stuijk (miniMIPS models + FPGA synthesis): s.stuijk at tue.nl
- Ir. Akash Kumar (FPGA synthesis, SiHive lab): a.kumar at tue.nl
- MSc Yifan He: y.he at tue.nl
Prerequisites: a course in computer architecture and processor design (e.g., Processor Design 5Z032 or Computation 5JJ70); programming experience in C, C++, or an equivalent language.
News
- 4 January 2010: added slides on multi-processors, part 1.
- 23 December 2009: the second lab, using M5, is now online. The deadline for finishing this lab is Monday, January 18; send your report to Yifan He.
Information on the course:
Description and objectives
This course studies the architecture, organization and use of the newest general-purpose (micro)processors currently on the market, as well as the latest research developments in computer architecture. Architectures exploiting instruction-level parallelism (ILP), thread-level parallelism and task-level parallelism are treated. Starting from basic architecture concepts, we end by discussing the latest commercial processors (e.g., Pentium 4 multi-core, EPIC processors like Itanium, and embedded processors such as the TriMedia) and academic processors (like TRIPS).
This course also treats how processors can be combined in a multiprocessing platform, e.g., by using a Network-on-Chip. Interprocessor communication issues will be dealt with. Furthermore, new code generation techniques needed for exploiting ILP will be treated. Special emphasis will be on quantifying design decisions in terms of performance and cost.
The intention of the course is to give students the ability to understand the design principles and operation of new (multi-)processor architectures, and to evaluate them both qualitatively and quantitatively. Although we treat several examples, the emphasis will be on architecture concepts. Furthermore, one or more student teams will design, implement and test a Network-on-Chip.
Topics:
Basic principles (like instruction-set design), pipelining and its consequences; VLIW (very long instruction word), superpipelined, superscalar, SIMD (single instruction, multiple data; used in vector and sub-word parallel processors) and MIMD (multiple instruction, multiple data) architectures; SMT (simultaneous multi-threading); out-of-order and speculative execution; branch prediction; data (value) prediction; design of advanced memory hierarchies; memory coherency and consistency; multi-threading; exploiting task-level and instruction-level parallelism; inter-processor communication models; input and output; network communication architecture; and Networks-on-Chip.
Book:
Computer Architecture: A Quantitative Approach, 4th ed.
John L. Hennessy and David A. Patterson
Morgan Kaufmann Publishers
ISBN 9780123704900
Handouts (for slides see below):
Slides
** will be added and updated during the course period **
- Overview slides (including preliminary schedule)
- Topic 1: Computer Systems Overview
- Topic 2: Crash course on MIPS
- Topic 3: Instruction-set design
- Topic 4: Instruction-Level Parallel (ILP) architectures
- Topic 5: Exploiting ILP with software approaches
- Topic 6: SMT: simultaneous multi-threading
- Guest lecture by Wouter van der Put: Time Predictability of a Computer System
- Topic 7: Multi-Processors
  - part 1, including Synchronization, Memory Coherence, and Memory Consistency
  - part 2, including Interconnection Networks
  - The TOP500 of supercomputing (note: this is a big file)
- Topic 8: Caches and Memory Hierarchy. Many advanced techniques to speed up the memory hierarchy are discussed. At the end of this slide set there is a recap of cache basics (which you can also find in Appendix C of the book).
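As a reminder of the cache basics recapped in Topic 8, the standard average memory access time (AMAT) relation, in the form used in Appendix C of the book, is given below; the two-level form uses the local L2 miss rate (misses in L2 divided by accesses that reach L2).

```latex
\begin{aligned}
\text{AMAT} &= \text{Hit time} + \text{Miss rate} \times \text{Miss penalty} \\
\text{AMAT}_{\text{2-level}} &= \text{Hit time}_{L1} + \text{Miss rate}_{L1} \times \left( \text{Hit time}_{L2} + \text{Miss rate}_{L2} \times \text{Miss penalty}_{L2} \right)
\end{aligned}
```

With illustrative numbers (a 1-cycle hit time, a 5% miss rate and a 100-cycle miss penalty), AMAT = 1 + 0.05 × 100 = 6 cycles, which shows why the techniques in this slide set target miss rate and miss penalty as well as hit time.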
Project
Part of the course will be project based.
Lab exercise 1: Single General-Purpose Processors
This exercise makes you familiar with the highly configurable SimpleScalar simulation
platform. The SimpleScalar tool set is a system software infrastructure
used to build modeling applications for program performance analysis,
detailed microarchitectural modeling, and hardware-software
co-verification. Using the SimpleScalar tools, users can build modeling
applications that simulate real programs running on a range of modern
processors and systems. The tool set includes sample simulators
ranging from a fast functional simulator to a detailed, dynamically
scheduled processor model that supports non-blocking caches,
speculative execution, and state-of-the-art branch prediction.
It can emulate e.g. ARM, Alpha and x86 instruction sets.
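One classic dynamic branch-prediction scheme that detailed simulators of this kind can model is the bimodal predictor: a table of 2-bit saturating counters indexed by the branch address. The C sketch below illustrates the idea only; it is not SimpleScalar code, and the table size, the PC-indexing and the demo branch address are assumptions chosen for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical table size (an assumption); a power of two so a mask can index it. */
#define BPRED_ENTRIES 2048u

/* One 2-bit saturating counter per entry:
   0 and 1 predict "not taken", 2 and 3 predict "taken". */
static uint8_t bpred_table[BPRED_ENTRIES];

void bpred_reset(void) {
    memset(bpred_table, 0, sizeof bpred_table);  /* all counters: strongly not taken */
}

static unsigned bpred_index(uint32_t pc) {
    /* Drop the 2 low bits (4-byte instructions), keep the low index bits. */
    return (unsigned)((pc >> 2) & (BPRED_ENTRIES - 1u));
}

int bpred_predict(uint32_t pc) {
    return bpred_table[bpred_index(pc)] >= 2;
}

void bpred_update(uint32_t pc, int taken) {
    uint8_t *c = &bpred_table[bpred_index(pc)];
    if (taken) {
        if (*c < 3) (*c)++;   /* saturate at 3 */
    } else {
        if (*c > 0) (*c)--;   /* saturate at 0 */
    }
}

/* Simulate one loop-closing branch at a made-up address: in each of `outer`
   runs of the loop it is taken `inner - 1` times and then falls through once.
   Returns the number of correct predictions. */
int bpred_demo_loop(int outer, int inner) {
    const uint32_t pc = 0x400120u;   /* hypothetical branch address */
    int correct = 0;
    bpred_reset();
    for (int o = 0; o < outer; o++) {
        for (int i = 0; i < inner; i++) {
            int taken = (i < inner - 1);
            if (bpred_predict(pc) == taken)
                correct++;
            bpred_update(pc, taken);
        }
    }
    return correct;
}
```

For a 10-iteration loop executed 10 times, `bpred_demo_loop(10, 10)` mispredicts twice while the counter warms up and then only once per loop exit, because the 2-bit hysteresis keeps a single not-taken outcome from flipping the prediction.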
Carefully follow the instructions, which can be found here. They describe how to run the tools on our server, which you can access remotely. It is also possible to use SimpleScalar directly on your own desktop or laptop (under UNIX/Linux or Windows NT).
Lab exercise 2: Multi-Processors
The purpose of this assignment is to get familiar with multiprocessor architectures and their programming models. In this lab you will be asked (after installing all the required software) to partition (parallelize) a C program using the well-known pthread library and run it on a parallel multiprocessor simulator.
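The program to parallelize is given on the assignment page. As a hedged illustration of one common way to partition a loop with pthreads, the sketch below block-partitions a hypothetical array sum: each thread gets a contiguous slice and writes a private partial result, so no locks are needed, and the calling thread joins the workers and combines the partials. The function and struct names and the MAX_THREADS bound are assumptions for this sketch; compile with `-pthread`.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 64  /* arbitrary upper bound for this sketch */

/* Work for one thread: the slice data[begin, end). */
struct slice {
    const double *data;
    size_t begin, end;
    double partial;     /* written by the worker, read by the caller after join */
};

static void *sum_slice(void *arg) {
    struct slice *s = arg;
    double acc = 0.0;   /* accumulate locally, store once (limits false sharing) */
    for (size_t i = s->begin; i < s->end; i++)
        acc += s->data[i];
    s->partial = acc;
    return NULL;
}

/* Block-partition data[0, n) over nthreads POSIX threads and combine the partial sums. */
double parallel_sum(const double *data, size_t n, int nthreads) {
    pthread_t tid[MAX_THREADS];
    struct slice work[MAX_THREADS];
    int created[MAX_THREADS];

    if (nthreads < 1) nthreads = 1;
    if (nthreads > MAX_THREADS) nthreads = MAX_THREADS;

    size_t chunk = n / (size_t)nthreads;
    size_t rest = n % (size_t)nthreads;
    size_t start = 0;
    for (int t = 0; t < nthreads; t++) {
        size_t len = chunk + ((size_t)t < rest ? 1 : 0);  /* first `rest` threads get one extra */
        work[t].data = data;
        work[t].begin = start;
        work[t].end = start + len;
        work[t].partial = 0.0;
        start += len;
        created[t] = (pthread_create(&tid[t], NULL, sum_slice, &work[t]) == 0);
        if (!created[t])
            sum_slice(&work[t]);   /* fallback: do this slice on the calling thread */
    }

    double total = 0.0;
    for (int t = 0; t < nthreads; t++) {
        if (created[t])
            pthread_join(tid[t], NULL);
        total += work[t].partial;  /* combine in thread order: deterministic result */
    }
    return total;
}
```

Because the partials are combined in a fixed order, the result does not depend on thread scheduling. The neighbouring `partial` fields may still share a cache line, which is one of the coherence effects you can observe when varying the L1 block size.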
We will look at different configurations, changing the number of processors, the level-1 and level-2 cache parameters (number of entries, block size and associativity), and the bus bandwidth (the bus connects all processors and sits between the level-1 and level-2 caches; i.e., the level-2 cache is shared between all processors).
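The cache parameters you vary are not independent: for a set-associative cache, the number of sets equals size / (block size × associativity), and an address splits into tag, index and block-offset fields. The small C helper below computes this split; it assumes power-of-two parameters and a caller-supplied address width, and it is a hypothetical aid, not part of M5.

```c
#include <stdint.h>

/* Derived geometry of one cache level. */
struct cache_geom {
    uint64_t sets;
    unsigned offset_bits;   /* selects the byte within a block */
    unsigned index_bits;    /* selects the set */
    unsigned tag_bits;      /* remaining address bits stored as the tag */
};

static int is_pow2(uint64_t x) { return x != 0 && (x & (x - 1)) == 0; }

static unsigned log2_pow2(uint64_t x) {   /* x must be a power of two */
    unsigned n = 0;
    while (x > 1) { x >>= 1; n++; }
    return n;
}

/* sets = size / (block size * associativity). Returns 0 on success,
   -1 if the parameters are not powers of two or do not fit together. */
int cache_geometry(uint64_t size_bytes, uint64_t block_bytes, uint64_t assoc,
                   unsigned addr_bits, struct cache_geom *g) {
    if (!is_pow2(size_bytes) || !is_pow2(block_bytes) || !is_pow2(assoc))
        return -1;
    if (size_bytes / block_bytes < assoc)     /* fewer blocks than ways */
        return -1;
    g->sets = size_bytes / block_bytes / assoc;
    g->offset_bits = log2_pow2(block_bytes);
    g->index_bits = log2_pow2(g->sets);
    if (g->offset_bits + g->index_bits > addr_bits)
        return -1;
    g->tag_bits = addr_bits - g->offset_bits - g->index_bits;
    return 0;
}
```

For example, a 32 KiB, 2-way cache with 64-byte blocks has 256 sets, so a 64-bit address splits into 6 offset bits, 8 index bits and 50 tag bits. The tag storage is exactly what the simplified area model of this lab leaves out.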
We will use the M5 simulator (m5sim) from the University of Michigan for the assignment, and run it on top of Linux (so that is the first thing you have to install; see the instructions).
You are asked to first go through the example program (our 'cookbook') and then perform the real assignment. You have to explore the multiprocessor architecture, changing the above-mentioned parameters, and produce performance-cost Pareto curves. Performance is determined by the total program execution time (counted in number of cycles). Cost is determined by the total area (for a certain technology). We will use a simple area model, counting only the total cache size in bytes and the number of cores; costs such as tag size, bus and interconnect cost, etc. are excluded. We will provide numbers for the area cost of 1 byte and of a processor core.
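The cost model and the Pareto filtering described above can be sketched in C as follows. `area_per_byte` and `area_per_core` are placeholders for the numbers provided with the lab, and the struct and function names are hypothetical. A configuration is kept on the Pareto front if no other configuration is at least as good in both cycles and area and strictly better in one of them.

```c
#include <stddef.h>

/* One explored configuration. */
struct design_point {
    unsigned long long cycles;       /* total execution time reported by the simulator */
    unsigned long long cache_bytes;  /* sum of all L1 and L2 cache sizes */
    unsigned cores;
    double area;                     /* filled in by pareto_front() */
};

/* Area model of the assignment: only cache bytes and cores are counted. */
double design_area(const struct design_point *p, double area_per_byte, double area_per_core) {
    return (double)p->cache_bytes * area_per_byte + (double)p->cores * area_per_core;
}

/* p dominates q if it is no worse in both objectives and strictly better in one. */
static int dominates(const struct design_point *p, const struct design_point *q) {
    int no_worse = p->cycles <= q->cycles && p->area <= q->area;
    int better = p->cycles < q->cycles || p->area < q->area;
    return no_worse && better;
}

/* Computes every point's area, sets on_front[i] = 1 for non-dominated points
   (O(n^2), fine for lab-sized sweeps) and returns how many there are. */
size_t pareto_front(struct design_point *pts, size_t n,
                    double area_per_byte, double area_per_core, int *on_front) {
    for (size_t i = 0; i < n; i++)
        pts[i].area = design_area(&pts[i], area_per_byte, area_per_core);

    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        on_front[i] = 1;
        for (size_t j = 0; j < n && on_front[i]; j++)
            if (j != i && dominates(&pts[j], &pts[i]))
                on_front[i] = 0;
        count += (size_t)on_front[i];
    }
    return count;
}
```

Plotting only the points with `on_front[i] == 1`, sorted by area, gives the performance-cost Pareto curve; the quadratic scan is chosen for clarity rather than speed.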
For the cores you have
2 options, both based on the DEC Alpha ISA (instruction-set
architecture); one processor is a simple in-order processor, the other
a more advanced out-of-order engine.
Now get your hands dirty: go to the assignment2 page with the install instructions, then run the example and perform the assignment. You may also check the other links for helpful material.
You have to send in a lab report about your results to Yifan He.
Self-study of a relevant topic
The guidelines are as follows:
- Choose a hot topic that interests you and that is highly related to this course.
- Select a technical research paper on this topic from the web; each student has to read and review 1 paper.
- The paper should have sufficient technical depth, i.e., it should clearly explain all the details of the proposed method or solution. You can also check whether the paper is from well-regarded journals or conferences, such as IEEE or ACM conferences and journals (see e.g. IEEE.org and ACM.org). For a list of important journals and conferences look here.
- The paper has to be presented with at most 8 slides during the oral exam.
- On the last slide, indicate both the strong and the weak points of the paper.
Examination
The examination will likely be oral.
The grade is based on your project results (and being able to explain
and defend them) and the discussed theory (study all slides, book
chapters and handouts).
Related material and other links
- Reading material
- The 8-core 45 nm Intel Xeon processor (Nehalem-EX) by Stefan Rusu et al.: see the IEEE Asian Solid-State Circuits Conference, November 2009, Taipei.
- PreMaDoNA project website, describing our Network-on-Chip activities. See also our new NEST project.
- Slides about IA-64 and Itanium:
  - IA-64 Architecture Innovations by John Crawford and Jerry Huck (ppt) (IA-64 at the Intel Developers' Forum, February '99)
  - IA-64 Overview by David Fotland (ppt) (IA-64 at the IEEE Vail Computer Elements Workshop, June '99)
  - IA-64 Register Model: Stack and Rotation by Dale Morris (ppt) (IA-64 at the IEEE Vail Computer Elements Workshop, June '99)
  - Compiling for IA-64 by Carol Thompson (ppt) (IA-64 at the IEEE Vail Computer Elements Workshop, June '99)
- Understanding the detailed Architecture of AMD's 64 bit Core by Hans de Vries