Platform-based Design
2008 - 2009 (1st semester)
Code : 5KK70
Credits: 5 ECTS
Lecturers: Prof. dr. Henk Corporaal, Dr. Bart Mesman
Tel.: +31-40-247 5195 / 3653 (secr.) 5462 (office)
Email: B.Mesman at tue.nl; H.Corporaal at tue.nl
Project assistance: Hamed Fatemi (H.Fatemi at tue.nl), Akash Kumar (A.Kumar at tue.nl)
News
Information on the course:
Description
When looking at future embedded systems and their design, especially
(but not exclusively) in the multi-media domain, we observe several
problems:
- high performance (10 GOPS and beyond) has to be combined with low
power (many systems are mobile);
- time-to-market (the time available to get your design done) keeps
shrinking;
- most embedded processing systems have to be extremely low cost;
- the applications show more dynamic behavior (resulting in
greatly varying quality and performance requirements);
- implementers increasingly require flexible and programmable
solutions;
- there is a huge latency gap between processors and memories; and
- design productivity does not keep pace with the increasing design
complexity.
To solve these problems we foresee the use of programmable
multi-processor platforms with an advanced memory hierarchy, together
with an advanced design trajectory. These platforms may contain
different processors, ranging from general-purpose processors to
processors that are highly tuned for a specific application or
application domain. This course treats several processor architectures,
shows how to program and generate (compile) code for them, and compares
their efficiency in terms of cost, power and performance. Furthermore,
the tuning of processor architectures is treated. Several advanced
multi-processor platforms, combining the discussed processors, are
also treated. A set of lab exercises complements the course.
Purpose:
This course aims at getting an understanding of the processor
architectures which will be used in future multi-processor platforms,
including their memory hierarchy, especially for the embedded domain.
Treated processors range from general purpose to highly optimized ones.
Tradeoffs will be made between performance, flexibility,
programmability, energy consumption and cost. It will be shown how to
tune processors in various ways.
Furthermore, this course looks into the required design trajectory,
concentrating on code generation, scheduling, and efficient data
management (exploiting the advanced memory hierarchy) for high
performance and low power.
The student will learn how to apply a methodology for a step-wise
(source code) transformation and mapping trajectory, going from an
initial specification to an efficient and highly tuned implementation
on a particular platform. The final implementation can be an order of
magnitude more efficient in terms of cost, power, and performance.
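To give an impression of what a single step in such a source-code transformation trajectory can look like, the sketch below applies loop fusion to a two-pass computation. It is not taken from the course material: the function names, the gain/offset computation and the array names are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Initial specification (hypothetical example): two passes over the data,
   communicating through an intermediate array tmp[]. Every element of
   tmp[] is written to memory and read back once. */
void scale_then_offset_baseline(const int *in, int *tmp, int *out,
                                size_t n, int gain, int offset)
{
    assert(in && tmp && out);
    for (size_t i = 0; i < n; i++)
        tmp[i] = in[i] * gain;
    for (size_t i = 0; i < n; i++)
        out[i] = tmp[i] + offset;
}

/* After one transformation step (loop fusion): the intermediate value is
   kept in a scalar, so the n writes and n reads of tmp[] disappear while
   the computed result stays the same. */
void scale_then_offset_fused(const int *in, int *out,
                             size_t n, int gain, int offset)
{
    assert(in && out);
    for (size_t i = 0; i < n; i++) {
        int t = in[i] * gain;   /* lives in a register, not in memory */
        out[i] = t + offset;
    }
}
```

Both functions produce the same output array; the fused version only removes the traffic to the intermediate array. How much this gains in practice depends on the platform and its memory hierarchy.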
Topics:
In this course we treat different processor architectures: DSPs (digital
signal processors), VLIWs (very long instruction word processors,
including Transport Triggered Architectures), ASIPs (application-specific
instruction-set processors), and highly tuned, weakly programmable
processors. In all cases it is shown how to program these architectures.
Code generation techniques, especially for VLIWs, are treated, including
methods to optimize code at source or assembly level. Furthermore, the
design of advanced data and instruction memory hierarchies is detailed,
and a methodology is discussed for the efficient use of the data memory
hierarchy.
Most of the topics will be supplemented by hands-on exercises.
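As a small illustration of a source-level optimization aimed at VLIW-style processors, the sketch below unrolls a dot-product loop by four and uses independent accumulators, so that a compiler has independent operations available to schedule in parallel. The function names and the unroll factor are assumptions made for this example; they do not come from the course material, and the actual benefit depends on the target processor and compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Straightforward version: each iteration depends on the previous one
   through the single accumulator sum. */
int dot_rolled(const int *a, const int *b, size_t n)
{
    assert(a && b);
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Unrolled by 4 with four independent accumulators: the four
   multiply-accumulate chains do not depend on each other, which exposes
   instruction-level parallelism to the scheduler. */
int dot_unrolled4(const int *a, const int *b, size_t n)
{
    assert(a && b);
    int s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    int sum = s0 + s1 + s2 + s3;
    for (; i < n; i++)          /* remainder when n is not a multiple of 4 */
        sum += a[i] * b[i];
    return sum;
}
```

Integer arithmetic is used on purpose: for floating-point data the reordering of additions could change the rounding of the result.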
For more information on course and lecture schedule see:
course description
The lecture slides will be made available during the course; see also
below.
Papers and other reading material
- Study Chapter 2 on Computer Architecture Trends
From "Microprocessor Architectures, from VLIW to TTA" by Henk
Corporaal, publisher John Wiley, 1998.
- A paper about data reuse.
Formalized methodology for data reuse exploration in hierarchical
memory mapping.
J.Ph. Diguet et al.
- Code transformations.
Code transformations for data transfer and storage exploration
preprocessing in multimedia processors.
Francky Catthoor, Nikil D. Dutt, Koen Danckaert and Sven Wuytack
IEEE Design and Test of Computers, May-June 2001
- Data storage components.
Random-access data storage components in customized architectures
Lode Nachtergaele, Francky Catthoor and Chidamber Kulkarni
IEEE Design and Test of Computers, May-June 2001
- Data optimizations.
Data memory organization and optimizations in application-specific
systems
P.R. Panda et al.
IEEE Design and Test of Computers, May-June 2001
Slides (per topic; see also the course description)
Slides as far as available (will be updated regularly during the
course).
- Overview of this lecture
- Lecture 1: Introduction + Programmable CPU / RISC cores
Detailed discussion of the MIPS architecture and implementation, based
on the book by Patterson and Hennessy, Computer Organization and Design
- Lecture 2-3: VLIW architectures (part a)
- Lecture 3-4: VLIW architectures (part b) + ILP compilation (part a)
- Lecture 5: Programmable Digital Signal Processors
- Lecture 6: SIMD
- Lecture 7: MPSoC
- Lecture 8: Silicon Hive VLIW cores; Introduction to the first lab
assignment
Guest lecture by Ir. Menno Lindwer, PDEng
- Lecture 9: MPSoC continued, Real-time Scheduling
- Lecture 10: Cell architecture
- Lecture 11: Data Memory Management (DMM): part a
- Lecture 12: Data Memory Management (DMM): part b
- Lecture 13: Data Memory Management (DMM): part c
- Lecture 14+15: Student presentations
See below for details.
Student presentations guidelines
The last two lectures will be used to let you present a topic highly
related to this course.
Guidelines are as follows:
- Choose a hot topic which interests you and which is highly related to
this course.
- Select a technical research paper from the web, based on this
topic; each student has to read and review 1 paper.
- The paper should have sufficient technical depth; i.e. it should
clearly explain all the details of the proposed method or solution. You
can also check whether the paper is from a well-regarded journal or
conference, like the IEEE or ACM conferences and journals (see e.g.
IEEE.org and ACM.org). For example, have a look at the following
conferences:
  - DATE: Design Automation and Test in Europe: www.date-conference.com
  - CODES (Hardware-Software Codesign) + ISSS (International Symposium
  on System Synthesis): www.codes-isss.org
  - CASES: Compilers, Architectures, and Synthesis for Embedded Systems:
  www.casesconference.org
  - IEEE MICRO: Symposium on Microarchitecture: www.microarch.org
  - HPCA: High-Performance Computer Architecture: www.hpcaconf.org
  - PACT: Parallel Architectures and Compilation Techniques:
  www.eecg.toronto.edu/pact
  - A larger list can be found here.
- The paper should be published in the last 5 years.
- You should make a PowerPoint presentation on your topic; max 6
min. per student.
- The presentation should contain at least the following:
  - Summary of the paper's contribution (including technical details)
  - Your evaluation of this paper
    - strong points
    - weak points
    - applicability of the proposed methodology / solution
    - new / future directions of research
- In order to evaluate the paper you may have to read related
material on the same topic.
- You are expected to prepare this in groups of 2 students (work
together with one other student, as far as possible), so that you can
discuss 2 papers about the same topic. Each student should present
part of the presentation.
- The total presentation time is max 12 minutes for 2 students,
or 15 minutes including questions.
- Your presentation will be evaluated by us. This evaluation will
be taken into account for the final grading.
- Send us (before your presentation) a copy of your slides and of
the discussed paper. Include the bibliographic information of the
paper(s) (authors, title, journal / conference, date, pages) such that
we can make a web page on this.
Hands-on lab work
Will be updated during the course!
During the course there are three lab exercises to complete (so-called
hands-on); see also the links below. They will be explained at the
corresponding lectures.
Hands-on 1:
Processor Design Space Exploration, based on the Silicon Hive
Architecture
In this exercise we explore the reconfigurable processor from Silicon
Hive.
For this exercise:
- Check the link http://www.ics.ele.tue.nl/~akash/education/5kk70/
You'll find several pdf files. Have a look at all of them first.
- Then check the start-up guide in detail.
- Thereafter start with the assignment. It also describes the
deliverables you have to send in (as a small report).
Hands-on 2:
Platform Programming
In this lab you are asked to program a multi-processor platform.
There are two options: using the CELL platform or using the WiCa
platform. This year we recommend that most students do the CELL
assignment.
a. Programming the CELL Broadband Engine
The CELL contains a PowerPC processor and 8 SPEs (Synergistic
Processing Elements), of which you can use 6 (number 7 is used by the
operating system and number 8 is not guaranteed to be functional). The
CELL processor is part of the Sony PlayStation 3, which we will use as
target. A good simulator and compiler environment is also available.
All details about the architecture, the simulation and compiler
environment, and example programs can be found at the CELL-assignment
page. Read this page carefully and follow the instructions.
b. Programming the WiCa 1.1 board
The WiCa 1.1 board was developed by Philips and NXP. The board is meant
for use in smart cameras. It contains, among others, the Xetal SIMD
image processing chip, with 320 Processing Elements, and an 8051
microcontroller. To observe the world it has two image sensors, which
even allows for stereo vision and depth calculation.
To connect to your PC it has a USB interface, but you can also attach
a ZigBee low-power interface to build a smart wireless sensor network.
The assignment is to use the image sensors to detect simple objects and
their movement and, if possible, to cooperate with other boards.
All details about the WiCa platform, the simulation and compiler
environment, and example programs can be found at the WiCa-assignment
page. Read this page carefully and follow the instructions.
Hands-on 3: Exploiting the data memory hierarchy for high performance
and low power
In this exercise you are asked to optimize a C algorithm by using the
discussed data management techniques. This should result in an
implementation with much improved memory behavior, which in turn
improves performance and energy consumption. In this exercise we mainly
concentrate on reducing energy consumption. You need to download the
following, and follow the instructions:
- Guidelines. This describes stepwise what you should do.
- The algorithm and other required files.
- Send the report to Corporaal, and bring a hardcopy + final code
to the oral exam.
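As an impression of the kind of transformation meant in Hands-on 3, the sketch below applies loop tiling to a matrix multiplication so that small blocks of data are reused while they are still in the fast levels of the memory hierarchy. This is not the algorithm of the assignment: the kernel, the function names and the tile size parameter are hypothetical, and whether and by how much such a transformation reduces energy depends on the actual memory hierarchy.

```c
#include <assert.h>
#include <stddef.h>

static size_t min_size(size_t x, size_t y) { return x < y ? x : y; }

/* Baseline: c = a * b for n x n matrices stored row by row.
   The inner loop walks down a column of b, with stride n. */
void matmul_baseline(const int *a, const int *b, int *c, size_t n)
{
    assert(a && b && c);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            int sum = 0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

/* Tiled version: the iteration space is cut into tile x tile x tile
   blocks. Within a block, the same small parts of a, b and c are
   accessed repeatedly, so they can stay in a small, cheap memory. */
void matmul_tiled(const int *a, const int *b, int *c, size_t n, size_t tile)
{
    assert(a && b && c && tile > 0);
    for (size_t i = 0; i < n * n; i++)
        c[i] = 0;
    for (size_t ii = 0; ii < n; ii += tile)
        for (size_t kk = 0; kk < n; kk += tile)
            for (size_t jj = 0; jj < n; jj += tile) {
                size_t i_end = min_size(ii + tile, n);
                size_t k_end = min_size(kk + tile, n);
                size_t j_end = min_size(jj + tile, n);
                for (size_t i = ii; i < i_end; i++)
                    for (size_t k = kk; k < k_end; k++) {
                        int a_ik = a[i * n + k]; /* reused for all j */
                        for (size_t j = jj; j < j_end; j++)
                            c[i * n + j] += a_ik * b[k * n + j];
                    }
            }
}
```

Both functions compute the same result; only the order of the memory accesses differs. The tile size would normally be chosen so that the blocks in use fit in the targeted memory layer.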
Examination
The examination is oral and covers the treated course theory and the
lab report(s).
Date: ** to be decided **.
Grading depends on your results on theory, lab exercises and your
presentation.
Related material and other links
Interesting processor architectures:
- The Cell architecture, made by Sony, IBM and Toshiba, and used e.g. in
the PlayStation 3
- TRIPS architecture, combining several types of parallelism
- The tile-based RAW architecture from MIT
- Imagine, a hybrid SIMD - VLIW architecture from Stanford
- Merrimac, the successor of Imagine
- ChipCon, check e.g. their system-on-chip: CC1110
- MAXQ from MAXIM, Dallas; a Transport Triggered Architecture
- Aethereal, a Network-on-Chip from Philips