Embedded Computer Architecture
2016 - 2017 (Second Quartile)
Code : 5SIA0
Credits: 5 ECTS
Lecturers : Prof. dr. Henk Corporaal
Tel. : +31-40-247 5195 (secr.) 5462 (office)
Email: H.Corporaal at tue.nl
- Luc Waeijen (L.J.W.Waeijen at tue.nl),
- Mark Wijtvliet (M.Wijtvliet at tue.nl),
- Mohammad Tahghighi (M.Tahghighi at tue.nl), and
- Roel Jordans (R.Jordans at tue.nl)
- Jan 19: all slides are now online
- All labs are online; see our oncourse.tue.nl site.
- Dec 15: new slides, including loop transformations online.
- Dec 6: new slides online, on GPUs, their architecture and
programming model (by Zhenyu Ye)
- First slide sets are online
- First lab on Design Space Explorations using CGRAs (Coarse Grain
Reconfigurable Arrays) is online.
- November 15, 2016: start of the new lecture series in Aud 7
(Tuesdays 9,10 + Thursdays 7,8)
- Article about processor design: a 90-minute guide!
- Course schedule
Information on the course:
When looking at future embedded systems and their design, especially
(but not exclusively) in the multi-media domain, we observe several
problems:
- high performance (1 TOPS and far beyond) has to be combined
with low power (many systems are mobile);
- time-to-market (to get your design done) is constantly shrinking;
- most embedded processing systems have to be extremely low
- the applications show more dynamic behavior (resulting in
greatly varying quality and performance requirements);
- more and more the implementer requires flexible and
- reliability becomes an issue with denser silicon technologies and
more on-chip circuitry;
- there is a huge latency gap between processors and memories; and
- design productivity does not keep up with the increasing design
In order to solve these problems we foresee the use of programmable
multi-processor platforms with an advanced memory hierarchy,
together with an advanced design trajectory. These platforms may
contain different processors, ranging from general-purpose
processors to processors which are highly tuned for a specific
application or application domain. This course treats several
processor architectures, shows how to program them, and
compares their efficiency in terms of cost, power and performance.
Furthermore, the tuning of processor architectures is treated.
Several advanced multi-processor
platforms, combining the discussed processors, are treated. A set of
mandatory, very advanced lab exercises complements the course.
This course aims at getting an understanding of the processor
architectures which will be used in future multi-processor
platforms, including their memory hierarchy, especially for the
embedded domain. Treated processors range from general purpose to
highly optimized ones. Trade-offs will be made between
performance, flexibility, programmability, energy consumption and
cost. It will be shown how to tune processors in various ways.
Studying the architecture, organization and use of the newest
(micro)processors currently on the market, and the latest research
developments in computer architecture. Architectures exploiting
instruction-level parallelism (ILP), data-level parallelism (DLP),
thread-level and task-level parallelism are treated. Starting from
basic architecture concepts we will end with discussing the latest
This course also treats how processors can be combined in a
multiprocessing platform, e.g. by using a Network-on-Chip.
Inter-processor communication issues will be dealt with.
Furthermore, some code generation techniques needed for exploiting
ILP will be treated (note: code generation is treated far more
extensively in a special course on Parallelization,
Compilers and Platforms; 5LIM0).
Special emphasis will be on quantifying design decisions in terms
of energy, performance and cost. The intention of the course is to
give students the ability to understand the design principles and
operation of new (multi-)processor architectures, and evaluate
them both qualitatively and quantitatively. Although we treat
several examples, the emphasis will be on architecture concepts.
Furthermore, 3 intensive lab exercises are part of the course; they
will teach you the design space of multi- and graphics processors.
We will invite several guest lecturers to treat state-of-the-art
topics in the computer architecture area.
- Basic principles (like RISC and instruction set design),
pipelining and its consequences.
- All processor architecture varieties, including VLIW
(very long instruction word, including TTAs,
Transport Triggered Architectures), Superpipelined, Superscalar,
SIMD (single instruction, multiple data, used in vector
and sub-word parallel processors) and MIMD (multiple
instruction, multiple data) architectures; SMT (Simultaneous
Multi-Threading); CGRAs (Coarse Grain Reconfigurable
Arrays), and Accelerators.
- Concepts like: Out-Of-Order and speculative execution;
Branch prediction; Data (value) prediction; Design of advanced
memory hierarchies; Memory coherency and consistency;
Multi-threading; Exploiting task-level and instruction-level
parallelism; Inter-processor communication models; Input and
output; Network Communication Architecture; and
- In all cases it is shown how to program these
architectures. Furthermore their combination and
interconnection in MPSoCs (Multi-Processor Systems-on-Chip)
- Most of the topics will be supplemented by very elaborate
The lecture slides will be made available during the course; see
Papers and other reading material.
Book: Parallel Computer Organization and
Michel Dubois, Murali Annavaram, and Per Stenström
Cambridge University Press
In addition: Learn Chapter 2 on
Computer Architecture Trends
taken from the book "Microprocessor Architectures, from VLIW to
Corporaal, publisher John Wiley, 1998.
Slides (per topic; see also the course description)
** Slides will be updated regularly during the course.
- Overview of this lecture,
including preliminary Course Schedule
- Chapter 1:
Computer Systems Overview
- Chapter 2: Technology
- Chapter 3 part 1: Micro
- Includes the MIPS ISA (Instruction Set Architecture)
and MIPS pipelined implementation
- Chapter 3 part 2:
Out-of-Order architectures, Superscalar, Branch Prediction,
Speculative executions and limits to Instruction-Level
Parallelism (ILP limits)
- Accelerators + CGRA
- Guest Lecture by Mark Wijtvliet
- GPU: Graphics Processing Units, all
about their internals and programming model
- Guest Lecture by Zhenyu Ye (from Connecterra, Amsterdam)
- Tricks of the Trade: From Loop
Transformations to Automatic Optimization
- Guest Lecture by Maurice Peemen (from FEI, Eindhoven)
- Chapter 4: Memory hierarchy,
caches, virtual memory, main memory
- Chapter 9: Simulation and
Simulators by Luc Waeijen
- Chapter 5: Multiprocessing
- Chapter 6: Processor Interconnect
- Chapter 7:
Coherence, Consistency and Synchronization
- Neural Computer
- Guest lecture by Maurice Peemen (from FEI, Eindhoven)
- Chapter 8.3:
Slides corresponding to labs
- Lab 1 on CGRAs, Design Space
Exploration (link refers to oncourse; you need to log in)
Below a short description and link to oncourse page with all the
- Lab 2 look here
- Lab 3 using gem5 (always check the lab / oncourse pages for
the most up-to-date details):
- See further the wiki and oncourse sites for the lab
information (see below)
Hands-on lab work
To become a very good computer architect you have to practice a
lot. Therefore, as part of this course we have put a lot of
effort into preparing 3 very interesting lab assignments. For each lab
there is a website with all the required documentation and
preparation material. These lab assignments can be done (largely,
but check the wiki pages) on your own laptop, with remote access
to our server systems for certain parts.
For every lab you have to write a report, which has to be sent to
one of the course assistants, or via the oncourse web site (always
check the lab specific instructions).
Lab 1: Processor Design Space Exploration, based on the CGRA from TU/e
You will be asked to design and optimize a low-power and very
flexible CGRA (Coarse Grain Reconfigurable Array) processor for a
certain application. In particular you have to trade off
performance and energy consumption.
Details and instructions about this assignment can be found here
(link to oncourse.tue.nl website, you need to log in)
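Trading off performance against energy usually means comparing many candidate configurations and keeping only those that no other candidate beats on both metrics (the Pareto-optimal points). The following is a minimal sketch of that filtering step in C. The `design_point` type, its field names and the function names are hypothetical and are not part of the lab framework; the measurements would come from your own experiments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical design point: one candidate configuration and its cost. */
typedef struct {
    double time_ms;    /* execution time, lower is better */
    double energy_mj;  /* energy consumption, lower is better */
} design_point;

/* Returns 1 if a is at least as good as b in both metrics
   and strictly better in at least one of them. */
static int dominates(const design_point *a, const design_point *b)
{
    return a->time_ms <= b->time_ms && a->energy_mj <= b->energy_mj &&
           (a->time_ms < b->time_ms || a->energy_mj < b->energy_mj);
}

/* Sets is_pareto[i] to 1 if no other point dominates point i, else 0.
   Returns the number of Pareto-optimal points. */
size_t pareto_front(const design_point *pts, size_t n, int *is_pareto)
{
    assert(pts != NULL && is_pareto != NULL);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        is_pareto[i] = 1;
        for (size_t j = 0; j < n; j++) {
            if (j != i && dominates(&pts[j], &pts[i])) {
                is_pareto[i] = 0;
                break;
            }
        }
        count += (size_t)is_pareto[i];
    }
    return count;
}
```

The quadratic comparison is adequate for the few dozen design points a manual exploration typically produces; larger sweeps would sort by one metric first.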
Lab 2: Programming Graphics Processing Units
Graphics processing units (GPUs) can contain up to thousands of
Processing Engines (PEs). They achieve performance levels of
teraFLOPS (10^12 floating point operations per second). In the past GPUs
were very dedicated, not generally programmable, and could only
be used to speed up graphics processing. Today, they are becoming
more and more general purpose and even appear in high-end embedded
systems. The latest GPUs of ATI/AMD and NVIDIA can be programmed in
CUDA and/or OpenCL. For this lab we will use GPUs together
with the OpenCL (based on C) programming environment.
After studying the example and learning material you have to perform
your own assignment and hand in a small report. The assignment this
year is about generating money, coins, called Coinporaals. Generate
as many as possible by making your program extremely efficient.
All the details about this assignment can be found on the GPU-assignment
site, or check our oncourse.tue.nl website.
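To give a feeling for the programming model: in OpenCL a kernel is written as the work of a single work-item, and the runtime executes it once per index. The sketch below emulates that idea in plain C so that it runs without a GPU or the OpenCL library. It is not OpenCL host or kernel code; the function names are made up for illustration, and the index that OpenCL C would obtain from `get_global_id(0)` is passed as an ordinary parameter.

```c
#include <assert.h>
#include <stddef.h>

/* Body of a data-parallel "kernel": computes exactly one output element.
   In OpenCL C the index would come from get_global_id(0); here it is an
   explicit argument so the sketch compiles as ordinary C. */
static void saxpy_work_item(size_t gid, float a,
                            const float *x, const float *y, float *out)
{
    out[gid] = a * x[gid] + y[gid];
}

/* Sequential stand-in for launching n work-items. The iterations are
   independent of each other, which is what allows a GPU to spread them
   over its processing engines instead of running them one by one. */
void saxpy_launch(size_t n, float a,
                  const float *x, const float *y, float *out)
{
    assert(x != NULL && y != NULL && out != NULL);
    for (size_t gid = 0; gid < n; gid++)
        saxpy_work_item(gid, a, x, y, out);
}
```

Calling `saxpy_launch(4, 2.0f, x, y, out)` fills `out[i]` with `2 * x[i] + y[i]` for the four elements.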
Lab 3: Designing and Programming Multi-Processor systems
State-of-the-art CPUs contain dozens of cores on a single
die. The trend of going multi-core poses new challenges to both
computer architects and programmers. In this assignment, we will
try to tackle these challenges from the viewpoint of both
computer architects and programmers. The purpose of this
assignment is to:
- Get an in-depth understanding of
mainstream multi-core CPU architectures.
- Learn how to develop parallel
applications on such architectures, and how to analyze the
performance in a real environment.
In this lab assignment, you will be asked to map a C program onto a
multiprocessor system. With the help of the gem5
simulator, we will look at different configurations, e.g., the
number of processors and the block size and associativity of
different levels of caches. The goal is to optimize the performance
of the system. You can achieve this goal by improving the original C
code, using pthreads and any other creative methods.
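As an illustration of what parallelizing C code with pthreads can look like, here is a minimal sketch that splits an array sum over several POSIX threads. It is not the lab's application; the names `slice`, `parallel_sum` and `MAX_THREADS` are assumptions made for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 16  /* arbitrary cap chosen for this sketch */

/* Work description for one thread: sum data[begin..end). */
typedef struct {
    const int *data;
    size_t begin, end;
    long partial;       /* written by the worker, read after the join */
} slice;

static void *sum_slice(void *arg)
{
    slice *s = arg;
    long acc = 0;       /* accumulate locally, store once at the end */
    for (size_t i = s->begin; i < s->end; i++)
        acc += s->data[i];
    s->partial = acc;
    return NULL;
}

/* Sums n integers using nthreads POSIX threads and returns the total. */
long parallel_sum(const int *data, size_t n, int nthreads)
{
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    pthread_t tid[MAX_THREADS];
    slice work[MAX_THREADS];
    /* Ceiling division so that all n elements are covered. */
    size_t chunk = (n + (size_t)nthreads - 1) / (size_t)nthreads;

    for (int t = 0; t < nthreads; t++) {
        size_t b = (size_t)t * chunk;
        size_t e = b + chunk;
        work[t].data = data;
        work[t].begin = b < n ? b : n;   /* clamp slices past the end */
        work[t].end = e < n ? e : n;
        work[t].partial = 0;
        int rc = pthread_create(&tid[t], NULL, sum_slice, &work[t]);
        assert(rc == 0);
        (void)rc;
    }

    long total = 0;
    for (int t = 0; t < nthreads; t++) {
        pthread_join(tid[t], NULL);      /* wait before reading partial */
        total += work[t].partial;
    }
    return total;
}
```

Each thread accumulates into a local variable and writes its result once. Updating neighbouring `partial` fields inside the loop instead would make several cores write to the same cache block, an effect whose cost depends on the cache parameters explored in this lab.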
Details about the assignment can be found on our oncourse.tue.nl
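The cache parameters varied in this lab (block size and associativity) determine how addresses map onto cache sets. The sketch below shows the standard arithmetic for a set-associative cache, assuming power-of-two sizes. The `cache_config` type and the function names are hypothetical and do not correspond to gem5 options.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cache description; sizes in bytes, all powers of two. */
typedef struct {
    uint32_t size_bytes;   /* total capacity */
    uint32_t block_bytes;  /* block (line) size */
    uint32_t ways;         /* associativity */
} cache_config;

/* Number of sets = capacity / (block size * associativity). */
uint32_t cache_num_sets(const cache_config *c)
{
    assert(c->block_bytes > 0 && c->ways > 0);
    return c->size_bytes / (c->block_bytes * c->ways);
}

/* Set an address maps to: drop the block offset, wrap by the set count. */
uint32_t cache_set_index(const cache_config *c, uint64_t addr)
{
    return (uint32_t)((addr / c->block_bytes) % cache_num_sets(c));
}

/* Tag: the address bits above the block offset and the set index. */
uint64_t cache_tag(const cache_config *c, uint64_t addr)
{
    return (addr / c->block_bytes) / cache_num_sets(c);
}
```

For a 32 KiB cache with 64-byte blocks, doubling the associativity from 4 to 8 halves the number of sets from 128 to 64, so more addresses compete for each set but each set can hold more of them.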
As part of this lecture you have to study a hot topic related to
this course, and make a short slide presentation about this topic.
The slides have to be presented during the oral exam.
Guidelines are as follows:
- Choose one hot topic
which interests you and which is highly related to this
- Select one technical (in depth) research paper from the web,
based on this topic.
- The paper should have
sufficient technical depth; i.e. it should clearly
explain all the details of the proposed method or solution. So,
for example, do not choose company white papers or business papers. You can
also check whether the paper is from a well-regarded journal or
conference, like the IEEE or ACM conferences and journals
(see e.g. IEEE.org and ACM.org). E.g., have a look at
the following conferences:
Automation and Test in Europe:
Codesign) + ISSS (International Symposium on System
Architectures, and Synthesis for Embedded Systems:
Symposium on Micro Arch: www.microarch.org
Computer Architecture: www.hpcaconf.org
architectures and compilation techniques:
- A larger list can be found here.
- The paper should be published in 2014 or later.
- You should make a PowerPoint presentation on your topic; max. 5 min. per presentation
(e.g. 5 slides:
one slide introducing the problem, then the approach and results
of the paper, and a final conclusion and suggestions from your
side on this topic; add / use clear pictures to explain the approach)
- The presentation should contain at least the following:
- Summary of
the paper contributions (including technical details)
- Your evaluation
of the paper and topic
- strong points
- weak points
- applicability of proposed methodology / solution
- indicate new / future directions of research
- In order to evaluate the paper you may wish to read related material on the same
- Your presentation will be evaluated by us. This evaluation
will be taken into account for the final grading.
The examination will be oral and/or online (using your laptop),
about the treated course theory (all treated slides + corresponding
parts of the used book), the lab report(s), and studied article.
When: end of January, early February 2017. We will discuss the dates
Grading depends on your results on theory, lab exercises and your
Related material and other links
- Computer Architecture: A quantitative approach
Hennessy and Patterson
Especially check Chapters 1-5 and Appendices A-C
- (note: we used this book in earlier related courses on
'advanced computer architecture')
Interesting processor architectures:
- The Cell
architecture, made by Sony, IBM and Toshiba, and used e.g.
in the PlayStation 3
architecture, combining several types of parallelism
- The tile based RAW
architecture from MIT
a hybrid SIMD - VLIW architecture from Stanford
- Merrimac, the
successor of the Imagine
- ChipCon, check e.g. their system-on-chip: CC1110
from MAXIM, Dallas; a Transport Triggered Architecture
a Network-on-Chip from Philips