Processor Architectures and Program Mapping
2006 - 2007
Code : 5KK10
Lecturers : Prof. Jef van Meerbergen, Dr. Bart Mesman,
Prof. Henk Corporaal
Tel. : +31-40-247 5195 / 3653 (secr.) 5462 (office)
Email: J.v.Meerbergen at tue.nl; B.Mesman at tue.nl;
H.Corporaal at tue.nl
Project assistance: Hamed Fatemi (H.Fatemi at tue.nl)
UPDATES
- 14 Feb 2007: Updated the final lab exercise
- 13 Feb 2007: Slides for final lecture 10 added
- 23 Feb 2007: Imagine lab exercise (Hands-on 2) has been updated
- 23 Feb 2007: slides of lecture 9 have been added
- 21 Feb 2007: slides up to lecture 8 have been added; note lecture
8 has 3 slide-sets
- 11 Jan 2007: slides up to lecture 5 have been added.
- 11 Jan 2007: first part of hands-on 2 has been added. You should
finish this first part by Feb 8.
Information on the course:
Description
When looking at future embedded systems and their design, especially
(but not exclusively) in the multi-media domain, we observe several
problems:
- high performance (10 GOPS and beyond) has to be combined with low
power (many systems are mobile);
- time-to-market (the time available to get your design done) keeps shrinking;
- most embedded processing systems have to be extremely low cost;
- the applications show more dynamic behavior (resulting in
greatly varying quality and performance requirements);
- implementers increasingly require flexible and programmable
solutions;
- there is a huge latency gap between processors and memories; and
- design productivity does not keep pace with the increasing design
complexity.
To solve these problems we foresee the use of programmable
multi-processor platforms with an advanced memory hierarchy, combined
with an advanced design trajectory. These platforms may
contain different processors, ranging from general purpose processors,
to processors which are highly tuned for a specific application or
application domain. This course treats several processor architectures,
shows how to program and generate (compile) code for them, and compares
their efficiency in terms of cost, power and performance. Furthermore
the tuning of processor architectures is treated.
Purpose:
This course aims at getting an understanding of the processor
architectures which will be used in future multi-processor platforms,
including their memory hierarchy. Treated processors range from general
purpose to highly optimized ones. Tradeoffs will be made between
performance, flexibility, programmability, energy consumption and cost.
It will be shown how to tune processors in various ways.
Furthermore this course looks into the required design trajectory,
concentrating on code generation, scheduling, and on efficient data
management (exploiting the advanced memory hierarchy) for high
performance and low power. The student will learn how to apply a
methodology for a step-wise (source code) transformation and mapping
trajectory, going from an initial specification to an efficient and
highly tuned implementation on a particular platform. The final implementation
can be an order of magnitude more efficient in terms of cost, power,
and performance.
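As a small illustration of what one step in such a source code transformation trajectory can look like, the sketch below applies loop fusion to a two-pass computation. Loop fusion is chosen here only as an example; the function names and the computation itself are hypothetical and not taken from the course material.

```c
#include <assert.h>
#include <stddef.h>

/* Initial specification: two passes over the data, communicating
 * through a full-size intermediate buffer tmp in memory. */
void scale_offset_initial(const int *in, int *tmp, int *out, size_t n)
{
    assert(in != NULL && tmp != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        tmp[i] = in[i] * 3;      /* pass 1: scale */
    }
    for (size_t i = 0; i < n; i++) {
        out[i] = tmp[i] + 1;     /* pass 2: offset */
    }
}

/* After loop fusion: the intermediate value is consumed right after
 * it is produced, so it can live in a scalar (typically a register)
 * and the tmp buffer with its memory accesses disappears. */
void scale_offset_fused(const int *in, int *out, size_t n)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        int t = in[i] * 3;
        out[i] = t + 1;
    }
}
```

Both versions compute the same result; the fused one needs n fewer writes and n fewer reads of intermediate data, which is the kind of saving such transformations aim for.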
Topics:
In this course we treat different processor architectures: DSPs (digital
signal processors), VLIWs (very long instruction word processors, including
Transport Triggered Architectures), ASIPs (application specific
instruction-set processors), and highly tuned, weakly programmable processors. In all
cases it is shown how to program these architectures. Code generation
techniques, especially for VLIWs, are treated, including methods to
optimize code at source or assembly level. Furthermore the design of
advanced data and instruction memory hierarchies will be detailed. A
methodology is discussed for the efficient use of the data memory
hierarchy.
Most of the topics will be supplemented by hands-on exercises.
For more information on course and lecture schedule see:
course description
The lecture slides will be made available during the course; see also
below.
Papers and other reading material
- Download the 5P520 lecture material from Jef's page: click
Education and then 5P520 (embedded multi-media systems).
Check especially chapters 1-6 (download the pdf files); chapters 3-6 you have to learn!
- Learn Chapter 2 on Computer Architecture Trends
From "Microprocessor Architectures, from VLIW to TTA" by Henk
Corporaal, publisher John Wiley, 1998.
- A paper about data reuse.
Formalized methodology for data reuse exploration in hierarchical
memory mapping.
J.Ph. Diguet et al.
- Code transformations.
Code transformations for data transfer and storage exploration
preprocessing in multimedia processors.
Francky Catthoor, Nikil D. Dutt, Koen Danckaert and Sven Wuytack
IEEE Design and Test of Computers, May-June 2001
- Data storage components.
Random-access data storage components in customized architectures
Lode Nachtergaele, Francky Catthoor and Chidamber Kulkarni
IEEE Design and Test of Computers, May-June 2001
- Data optimizations.
Data memory organization and optimizations in Application Specific
systems
P.R. Panda et al.
IEEE Design and Test of Computers, May-June 2001
Slides (per topic; see also the course description)
Slides as far as available. Slides of lectures 8 and 9 are not online.
- Introduction, including course schedule
- Lecture 1: Programmable CPU cores
- Lecture 2: DSPs
- Lecture 3: Instruction level parallelism.
Part 1: VLIW architectures
- Lecture 4: Instruction level parallelism.
Part 2: Code generation
- Lecture 5: Application domain specific processors (ADSPs or ASIPs)
- Lecture 6: Exploiting Data Level Parallelism: SIMD architectures
- Lecture 7: Data Memory Management (DMM): part a
- Lecture 8: Guest lecture 1 on:
Energy Scavenging, or Micropower Generation for Wireless Autonomous
Sensors, by Ruud Vullers, IMEC-NL
- Lecture 9: Guest lecture on Wireless Sensor Networks, by Guy Meynants from IMEC-NL
- smart sensors
- image sensors
- power converters
- Lecture 10: Data Memory Management (DMM): part b
Hands-on lab work
Will be updated during the course!
During the course there are three lab exercises to be made (so-called
hands-ons); see the detailed course
description, and also the links below. They will be explained at the
corresponding lectures. The final two hands-ons are more substantial: one
is about Design Space Exploration (i.e. fitting your processor to a certain
application or application domain), the other about DTSE (data transfer
and storage exploration, i.e. data memory management). You get several
weeks to solve these lab exercises.
We will use three C-applications during the hands-ons; code will be
made available.
- 16-tap FIR filter
- 2-tap BiQuad IIR filter
- YUV-to-RGB conversion
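To give an idea of the kind of kernel involved, here is a minimal sketch of a 16-tap FIR filter in C. It is not the code that will be handed out: the name fir16, the integer data type, and the zero-padding of samples before the start of the input are all assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_TAPS 16  /* hypothetical constant: number of filter taps */

/* Direct-form FIR filter:
 *   y[n] = sum over k = 0 .. NUM_TAPS-1 of h[k] * x[n-k]
 * Samples before x[0] are treated as zero (assumption of this sketch). */
void fir16(const int *x, const int *h, int *y, size_t len)
{
    assert(x != NULL && h != NULL && y != NULL);
    for (size_t n = 0; n < len; n++) {
        int acc = 0;
        /* k <= n skips the taps that would read before x[0] */
        for (size_t k = 0; k < NUM_TAPS && k <= n; k++) {
            acc += h[k] * x[n - k];
        }
        y[n] = acc;
    }
}
```

Feeding a unit impulse (x[0] = 1, all other samples 0) returns the coefficients h[0] .. h[15] as output, which is a quick way to check an implementation. The inner loop is a chain of multiply-accumulate operations, which is the pattern that DSPs and VLIWs are built to execute efficiently.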
Hands-on 1: Mapping to a programmable core
- Home page of the SPIM simulator
(for MIPS R2000/R3000 architectures). You need this simulator for your
first lab exercise on MIPS assembly programming.
- Study the following example: GCD
(Greatest Common Divisor), and test the
assembler code using
the SPIM simulator.
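As a C-level reference for checking the assembler version, Euclid's algorithm can be sketched as follows. This is an assumption about the algorithm: the linked GCD example may use a different variant (for instance one based on repeated subtraction), and the function name gcd is chosen here for illustration.

```c
#include <assert.h>

/* Euclid's algorithm using the remainder operation.
 * Assumes non-negative inputs; gcd(a, 0) is defined as a. */
int gcd(int a, int b)
{
    assert(a >= 0 && b >= 0);
    while (b != 0) {
        int r = a % b;  /* remainder becomes the new second operand */
        a = b;
        b = r;
    }
    return a;
}
```

For example, gcd(12, 18) yields 6. The loop maps naturally onto a few MIPS instructions: a branch on zero, a division to obtain the remainder, and two register moves.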
Hands-on 2: Mapping to a VLIW type ASIC
In this exercise we explore the Imagine processor from Stanford
University; see the imagine
website for details about this processor.
Imagine is a streaming oriented processor. It contains in its basic
realization 8 PEs (processing elements) acting in SIMD mode (i.e. each
PE executes the same instruction from an instruction controller). The
PEs themselves are VLIW type of processors, capable of performing
multiple operations per cycle.
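The SIMD behaviour described above can be modelled conceptually in plain C, as in the sketch below. This is not Imagine's own programming interface; the names NUM_PES, simd_step and stream_dot, and the choice of a multiply-accumulate as the shared operation, are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_PES 8  /* number of PEs in the basic realization */

/* One SIMD step: every PE executes the same operation (here a
 * multiply-accumulate) on its own data element and its own
 * private accumulator. */
static void simd_step(const int *a, const int *b, int *acc)
{
    for (int pe = 0; pe < NUM_PES; pe++) {
        acc[pe] += a[pe] * b[pe];
    }
}

/* Streams two arrays through the PEs, NUM_PES elements per step,
 * and combines the per-PE partial sums into a dot product.
 * This sketch assumes len is a multiple of NUM_PES. */
int stream_dot(const int *a, const int *b, size_t len)
{
    assert(a != NULL && b != NULL);
    assert(len % NUM_PES == 0);
    int acc[NUM_PES] = {0};
    for (size_t i = 0; i < len; i += NUM_PES) {
        simd_step(a + i, b + i, acc);
    }
    int sum = 0;
    for (int pe = 0; pe < NUM_PES; pe++) {
        sum += acc[pe];  /* final reduction across the PEs */
    }
    return sum;
}
```

On the real hardware the loop over pe in simd_step would not be a loop at all: the 8 PEs perform their operations in the same cycle, and each PE can additionally issue several operations per cycle because it is a VLIW.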
For this exercise
Hands-on 3: Exploiting the data memory hierarchy for high
performance and low power
In this exercise you are asked to optimize a C algorithm by using the
discussed data management techniques. This should result in an
implementation with much improved memory behavior, which in turn
improves performance and energy consumption. In this exercise we mainly
concentrate on reducing energy consumption. You need to download the
following, and follow the instructions:
Examination
The examination will be oral and covers the treated course theory, the
IDCT hands-on, and the DTSE hands-on.
Date: ** to be decided **.
Related material and other links
Interesting processor architectures:
- The Cell
architecture, made by Sony, IBM and Toshiba, and used e.g. in the
Playstation 3
- TRIPS architecture,
combining several types of parallelism
- The tile based RAW
architecture from MIT
- Imagine,
a hybrid SIMD - VLIW architecture from Stanford
- Merrimac, the
successor of the Imagine
- ChipCon, check e.g. their system-on-chip: CC1110
- MAXQ
from MAXIM, Dallas; a Transport Triggered Architecture
- Aethereal,
a Network-on-Chip from Philips