Embedded Computer Architecture
2015 - 2016 (Second Quartile)
Code : 5SIA0
Credits: 5 ECTS
Lecturers : Prof. dr. Henk Corporaal
Tel. : +31-40-247 5195 (secr.) 5462 (office)
Email: H.Corporaal at tue.nl
Project assistance: Luc Waeijen (L.J.W.Waeijen at tue.nl),
Mark Wijtvliet (M.Wijtvliet at tue.nl), and Roel Jordans (R.Jordans
at tue.nl)
News
- Oral exams: on Jan 29, Feb 1, 2 and 4
- Jan 10: updated/uploaded slides of chapters 6, 7 and 8
- Jan 4: uploaded new lectures on chapters 4 and 5
- GPU guest lecture Monday 14 December; slides are online.
- Dec 11: Lab 3 is online; have a look.
- Lab assignment 2 is already online!
- Updated schedule
- Slides Ch 3 part 1 added
- Nov 16: Lab 1 on Multi Processor design has started; see the lab
site.
- Slides of chapters 1 and 2 have been added
- November 9, 2015, start of new lecture in IPO 0.98 (Mondays
3,4) / Pav B2 (changed: was originally U46; Thursdays 7,8)
- Interesting article about processor design: a 90-minute guide!
Information on the course:
Description
When looking at future embedded systems and their design, especially
(but not exclusively) in the multi-media domain, we observe several
problems:
- high performance (1 TOPS and beyond) has to be combined with
low power (many systems are mobile);
- time-to-market (the time available to get your design done)
keeps shrinking;
- most embedded processing systems have to be extremely low
cost;
- applications show more dynamic behavior (resulting in
greatly varying quality and performance requirements);
- implementers increasingly require flexible and
programmable solutions;
- there is a huge latency gap between processors and memories; and
- design productivity does not keep pace with the increasing design
complexity.
To solve these problems we foresee the use of programmable
multi-processor platforms with an advanced memory hierarchy,
together with an advanced design trajectory. These platforms may
contain different processors, ranging from general-purpose
processors to processors that are highly tuned for a specific
application or application domain. This course treats several
processor architectures, shows how to program them, and
compares their efficiency in terms of cost, power and performance.
Furthermore, the tuning of processor architectures is treated.
Several advanced multi-processor platforms, combining the discussed
processors, are treated. A set of lab exercises complements the
course.
Purpose:
This course aims at getting an understanding of the processor
architectures which will be used in future multi-processor
platforms, including their memory hierarchy, especially for the
embedded domain. Treated processors range from general purpose to
highly optimized ones. Tradeoffs will be made between performance,
flexibility, programmability, energy consumption and cost. It will
be shown how to tune processors in various ways.
We study the architecture, organization and use of the newest
(micro)processors currently on the market, and the latest research
developments in computer architecture. Architectures exploiting
instruction-level parallelism (ILP), data-level parallelism (DLP),
thread-level and task-level parallelism are treated. Starting from
basic architecture concepts we will end with discussing the latest
commercial processors.
This course also treats how processors can be combined in a
multiprocessing platform, e.g. by using a Network-on-Chip.
Interprocessor communication issues will be dealt with.
Furthermore new code generation techniques needed for exploiting
ILP will be treated. Special emphasis will be on quantifying
design decisions in terms of performance and cost. The intention
of the course is to give students the ability to understand the
design principles and operation of new (multi-)processor
architectures, and evaluate them both qualitatively and
quantitatively. Although we treat several examples, the emphasis
will be on architecture concepts. Furthermore, three intensive lab
exercises are part of the course; they will teach you the design
space of multi-processors and graphics processors.
Topics:
Basic principles (like instruction set design), pipelining and its
consequences; VLIW (very long instruction word, including TTAs,
Transport Triggered Architectures), Superpipelined, Superscalar,
SIMD (single instruction, multiple data, used in vector and
sub-word parallel processors) and MIMD (multiple instruction,
multiple data) architectures; SMT (Simultaneous Multi-Threading);
Out-of-order and speculative execution; Branch prediction; Data
(value) prediction; Design of advanced memory hierarchies; Memory
coherency and consistency; Multi-threading; Exploiting task-level
and instruction-level parallelism; Inter-processor communication
models; Input and output; Network Communication Architecture; and
Networks-on-Chip.
In all cases it is shown how to program these architectures.
Furthermore, their combination and interconnection in MPSoCs
(Multi-Processor Systems-on-Chip) is treated.
Most of the topics will be supplemented by very elaborate hands-on
exercises.
The lecture slides will be made available during the course; see
also below.
Papers and other reading material
Book: Parallel Computer Organization and Design
Authors: Michel Dubois, Murali Annavaram, and Per Stenström
Cambridge University Press, October 2012
Study Chapter 2, on Computer Architecture Trends, from the book
"Microprocessor Architectures, from VLIW to TTA" by Henk Corporaal,
publisher John Wiley, 1998.
Slides (per topic; see also the course description)
** Slides as far as available. Slides will be updated regularly
during the course.
Slides corresponding to labs
- Lab 1 on Multi-Core programming and Design Space Exploration
Using SniperSim (Luc Waeijen; see also below and the mentioned
Wiki site)
- See further the wiki sites for the lab information (see below)
Hands-on lab work
To become a very good computer architect you have to practice a lot.
Therefore, as part of this course we have put a lot of effort into
preparing three very interesting lab assignments. For each lab there
is a website with all the required documentation and preparation
material. These lab assignments can be done on your own laptop, with
remote access to our server systems for certain parts.
For every lab you have to write a report, which has to be sent to
one of the course assistants.
** Labs will be updated and put online during the course **
Lab 1:
Designing and Programming Multi-Processor systems
State-of-the-art CPUs contain dozens of cores on a
single die. The trend of going multi-core poses new challenges to
both computer architects and programmers. In this assignment, we
will try to tackle these challenges from the viewpoint of both
computer architects and programmers. The purpose of this assignment
is to:
- Get an in-depth understanding of mainstream multi-core CPU
architectures.
- Learn how to develop parallel applications on such architectures,
and how to analyze the performance in a real environment.
In this lab assignment, you will be asked to map a C
program onto a multiprocessor system. With the help of the SniperSim
simulator, we will look at different configurations, e.g., the
number of processors, and the block size and associativity of
different levels of caches. The goal is to optimize the
Energy-Delay-Area-Product (EDAP) of the system. You can achieve this
goal by improving the original C code, using OpenMP, and/or using
any other creative methods.
Details about the assignment can be found here.
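To make the optimization goal concrete, the following is a minimal sketch of how EDAP could be compared across configurations, assuming the product is simply energy times delay times area. The configuration names and all numbers are made up for illustration; they are not SniperSim output, and the lab site defines the exact metric used for grading.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """One simulated system configuration (all values are hypothetical)."""
    name: str
    energy_j: float   # total energy in joules
    delay_s: float    # execution time in seconds
    area_mm2: float   # chip area in square millimetres


def edap(cfg):
    """Energy-Delay-Area-Product; lower is better."""
    return cfg.energy_j * cfg.delay_s * cfg.area_mm2


def best_config(configs):
    """Return the configuration with the smallest EDAP."""
    return min(configs, key=edap)


# Hypothetical results, chosen only to illustrate the comparison.
CANDIDATES = [
    Config("1 core, small caches", energy_j=2.0, delay_s=4.0, area_mm2=10.0),
    Config("4 cores, small caches", energy_j=3.0, delay_s=1.5, area_mm2=16.0),
    Config("8 cores, large caches", energy_j=5.0, delay_s=1.0, area_mm2=40.0),
]

if __name__ == "__main__":
    for cfg in CANDIDATES:
        print(f"{cfg.name}: EDAP = {edap(cfg):.1f}")
    print("best:", best_config(CANDIDATES).name)
```

With these invented numbers the fastest configuration is not the best one: the extra area and energy of the 8-core system outweigh its shorter delay.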
Lab 2:
Processor Design Space Exploration, based on the Silicon Hive
Architecture from Intel
You will be asked to design and optimize a low-power VLIW processor
for the ECG application. In particular, you have to trade off
performance and energy consumption.
Details and instructions about this assignment can be found here.
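One common way to reason about such a trade-off is to keep only the Pareto-optimal designs, i.e. those that no other design beats on both cycle count and energy. The sketch below illustrates that idea; the design names and numbers are hypothetical and do not come from the Silicon Hive tools, and the lab itself may prescribe a different evaluation.

```python
def pareto_front(points):
    """Return the names of designs that no other design dominates.

    Each point is (name, cycles, energy); lower is better for both.
    A design is dominated if another one is at least as good on both
    axes and strictly better on at least one.
    """
    front = []
    for name, cycles, energy in points:
        dominated = any(
            c <= cycles and e <= energy and (c < cycles or e < energy)
            for _, c, e in points
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical design points: (name, cycle count, energy in microjoules).
DESIGNS = [
    ("2-issue", 1000, 50.0),
    ("4-issue", 600, 70.0),
    ("4-issue, big register file", 620, 90.0),
    ("8-issue", 550, 140.0),
]

if __name__ == "__main__":
    print(pareto_front(DESIGNS))
```

In this invented data set the variant with the big register file drops out, because the plain 4-issue design is both faster and more energy efficient.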
Lab 3: Programming Graphics Processing Units
Graphics processing units (GPUs) can contain up to thousands of
Processing Engines (PEs). They achieve performance levels of
teraFLOPS (10^12 floating-point operations per second). In the past,
GPUs were very dedicated, not generally programmable, and could only
be used to speed up graphics processing. Today, they are becoming
more and more general purpose and even appear in high-end embedded
systems. The latest GPUs of ATI/AMD and NVIDIA can be programmed in
CUDA and/or OpenCL. For this lab we will use GPUs together
with the OpenCL (based on C) programming environment.
After studying the example and learning material you have to perform
your own assignment and hand in a small report. This year's
assignment is about generating coins, called Coinporaals. Generate
as many as possible by making your program extremely efficient.
All the details about this assignment can be found on the GPU-assignment
site.
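As a rough illustration of where teraFLOPS-level performance comes from, peak throughput can be estimated as the number of PEs times the clock frequency times the floating-point operations each PE completes per cycle. The sketch below assumes a hypothetical GPU with 2048 PEs at 1 GHz, where each PE issues one fused multiply-add per cycle (counted as 2 operations); none of these figures describe the GPUs used in the lab.

```python
def peak_flops(num_pes, clock_hz, flops_per_cycle=2):
    """Theoretical peak floating-point operations per second.

    flops_per_cycle=2 assumes one fused multiply-add per PE per cycle;
    real devices differ, so treat the result as an upper bound.
    """
    return num_pes * clock_hz * flops_per_cycle


if __name__ == "__main__":
    # Hypothetical device: 2048 PEs running at 1 GHz.
    flops = peak_flops(num_pes=2048, clock_hz=1_000_000_000)
    print(f"peak: {flops / 1e12:.3f} TFLOPS")
```

Real programs usually reach only a fraction of this peak, which is why the assignment rewards making the program as efficient as possible.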
Student presentation guidelines
As part of this course you have to study a hot topic related to
this course, and make a short slide presentation about this topic.
The slides have to be presented during the oral exam.
Guidelines are as follows:
- Choose one hot topic
which interests you and which is highly related to this
course.
- Select one technical (in-depth) research paper from the
web, based on this topic.
- The paper should have sufficient technical depth; i.e. it should
clearly explain all the details of the proposed method or solution.
So, e.g., do not choose company white papers or business papers. You
can also check whether the paper is from a well-regarded journal or
conference, like the IEEE or ACM conferences and journals
(see e.g. IEEE.org and ACM.org). E.g., have a look at
the following conferences:
- DATE: Design, Automation and Test in Europe: www.date-conference.com
- CODES (Hardware-Software Codesign) + ISSS (International Symposium
on System Synthesis): www.codes-isss.org
- CASES: Compilers, Architectures, and Synthesis for Embedded
Systems: www.casesconference.org
- IEEE MICRO: Symposium on Microarchitecture: www.microarch.org
- HPCA: High-Performance Computer Architecture: www.hpcaconf.org
- PACT: Parallel Architectures and Compilation Techniques:
www.eecg.toronto.edu/pact
- A larger list can be found here.
- The paper should be published in 2013 or later
(try to choose a very recent paper).
- You should make a PowerPoint presentation on your topic; max. 5
min. per presentation (e.g. 5 slides: one slide introducing the
problem, then the approach and results of the paper, and a final
conclusion with your own suggestions on this topic; use clear
pictures to explain the approach).
- The presentation should contain at least the following:
- Summary of
the paper contributions (including technical details)
- Your evaluation
of the paper and topic
- strong points
- weak points
- applicability of proposed methodology / solution
- indicate new / future directions of research
- In order to evaluate the paper you may wish to read related material on the same
topic.
- Your presentation will be evaluated by us. This evaluation
will be taken into account for the final grading.
Examination
The examination will be oral and covers the treated course theory
(all treated slides + corresponding parts of the used book), the lab
report(s), and the studied article.
When: 4th week of January 2016, or first week of February. We will
discuss the dates with you.
Grading depends on your results on theory, lab exercises and your
presentation.
Related material and other links
- Computer Architecture: A quantitative approach
Hennessy and Patterson
5th Edition
Especially check Chapters 1-5 and Appendices A-C
Interesting processor architectures:
- The Cell architecture, made by Sony, IBM and Toshiba, and used e.g.
in the PlayStation 3
- TRIPS architecture, combining several types of parallelism
- The tile-based RAW architecture from MIT
- Imagine, a hybrid SIMD-VLIW architecture from Stanford
- Merrimac, the successor of Imagine
- ChipCon, check e.g. their system-on-chip: CC1110
- MAXQ from MAXIM, Dallas; a Transport Triggered Architecture
- Aethereal, a Network-on-Chip from Philips