Processor Architectures and Program Mapping

2006 - 2007

Code : 5KK10
Lecturers : Prof. Jef van Meerbergen, Dr. Bart Mesman, Prof. Henk Corporaal
Tel. : +31-40-247 5195 / 3653 (secr.) 5462 (office)
Email: J.v.Meerbergen at tue.nl; B.Mesman at tue.nl; H.Corporaal at tue.nl
Project assistance: Hamed Fatemi (H.Fatemi at tue.nl)

UPDATES

14 Feb 2007: Updated the final lab exercise
13 Feb 2007: Slides for final lecture 10 added
23 Feb 2007: Imagine lab exercise (Hands-on 2) has been updated
23 Feb 2007: slides of lecture 9 have been added
21 Feb 2007: slides up to lecture 8 have been added; note lecture 8 has 3 slide-sets
11 Jan 2007: slides up to lecture 5 have been added.
11 Jan 2007: first part of hands-on 2 has been added. You should finish the first part by Feb 8.

Description

When looking at future embedded systems and their design, especially (but not exclusively) in the multi-media domain, we observe several problems:

high performance (10 GOPS and beyond) has to be combined with low power (many systems are mobile);
time-to-market (to get your design done) constantly reduces;
most embedded processing systems have to be extremely low cost;
the applications show more dynamic behavior (resulting in greatly varying quality and performance requirements);
more and more the implementer requires flexible and programmable solutions;
there is a huge latency gap between processors and memories; and
design productivity does not cope with the increasing design complexity.

In order to solve these problems we foresee the use of programmable multi-processor platforms with an advanced memory hierarchy, together with an advanced design trajectory. These platforms may contain different processors, ranging from general purpose processors to processors which are highly tuned for a specific application or application domain. This course treats several processor architectures, shows how to program and generate (compile) code for them, and compares their efficiency in terms of cost, power and performance. Furthermore the tuning of processor architectures is treated.

Purpose:
This course aims at getting an understanding of the processor architectures which will be used in future multi-processor platforms, including their memory hierarchy. Treated processors range from general purpose to highly optimized ones. Tradeoffs will be made between performance, flexibility, programmability, energy consumption and cost. It will be shown how to tune processors in various ways. Furthermore this course looks into the required design trajectory, concentrating on code generation, scheduling, and on efficient data management (exploiting the advanced memory hierarchy) for high performance and low power. The student will learn how to apply a methodology for a step-wise (source code) transformation and mapping trajectory, going from an initial specification to an efficient and highly tuned implementation on a particular platform. The final implementation can be an order of magnitude more efficient in terms of cost, power, and performance.

Topics:

In this course we treat different processor architectures: DSPs (digital signal processors), VLIWs (very long instruction word processors, including Transport Triggered Architectures), ASIPs (application specific processors), and highly tuned, weakly programmable processors. In all cases it is shown how to program these architectures. Code generation techniques, especially for VLIWs, are treated, including methods to optimize code at source or assembly level. Furthermore the design of advanced data and instruction memory hierarchies will be detailed. A methodology is discussed for the efficient use of the data memory hierarchy.
Most of the topics will be supplemented by hands-on exercises.
For more information on course and lecture schedule see: course description

Handouts

The lecture slides will be made available during the course; see also below.
Papers and other reading material

Download the 5P520 lecture material from Jef's page: click Education and then 5P520 (embedded multi-media systems). Check especially chapters 1-6 (download the PDF files); you have to learn chapters 3-6!
Learn Chapter 2 on Computer Architecture Trends
From "Microprocessor Architectures, from VLIW to TTA" by Henk Corporaal, publisher John Wiley, 1998.
A paper about data reuse. Formalized methodology for data reuse exploration in hierarchical memory mapping.
J.Ph. Diguet et al.
Code transformations. Code transformations for data transfer and storage exploration preprocessing in multimedia processors.
Francky Catthoor, Nikil D. Dutt, Koen Danckaert and Sven Wuytack
IEEE Design and Test of Computers, May-June 2001
Data storage components. Random-access data storage components in customized architectures
Lode Nachtergaele, Francky Catthoor and Chidamber Kulkarni
IEEE Design and Test of Computers, May-June 2001
Data optimizations. Data memory organization and optimizations in Application Specific systems
P.R. Panda et al.
IEEE Design and Test of Computers, May-June 2001

Slides (per topic; see also the course description)

Slides as far as available. Slides of lectures 8 and 9 are not online.

Introduction, including course schedule
Lecture 1: Programmable CPU cores
Lecture 2: DSPs
Lecture 3: Instruction level parallelism. Part 1: VLIW architectures
Lecture 4: Instruction level parallelism. Part 2: Code generation
Lecture 5: Application domain specific processors (ADSPs or ASIPs)
Lecture 6: Exploiting Data Level Parallelism: SIMD architectures
Lecture 7: Data Memory Management (DMM): part a
Lecture 8: Guest lecture 1 on: Energy Scavenging, or Micropower Generation for Wireless Autonomous Sensors, by Ruud Vullers, IMEC-NL
Lecture 9: Guest lecture on Wireless Sensor Networks, by Guy Meynants from IMEC-NL

smart sensors
image sensors
power converters

Lecture 10: Data Memory Management (DMM): part b

Loop transformation overview

Hands-on lab work

Will be updated during the course!

During the course there are three lab exercises (so-called hands-ons); see the detailed course description and the links below. They will be explained at the corresponding lectures. The final two hands-ons are more substantial: one is about Design Space Exploration (i.e. fitting your processor to a certain application or application domain), the other about DTSE (data memory management). You get several weeks to solve these lab exercises.
We will use three C applications during the hands-ons; code will be made available.

16-tap FIR filter
2-tap BiQuad IIR filter
YUV-to-RGB conversion
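
To give an idea of the kind of kernel involved, the following is a minimal sketch of a direct-form FIR filter with 16 taps in C. It is not the code that will be handed out: the function name fir16, the integer sample type, and the zero-padding of samples before the start of the input are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_TAPS 16  /* number of filter taps, as in the list above */

/* Direct-form FIR: y[i] = sum over k of coeff[k] * x[i - k].
 * Assumption: samples before the start of the input count as zero. */
void fir16(const int *x, int *y, size_t n, const int coeff[NUM_TAPS])
{
    assert(x != NULL && y != NULL && coeff != NULL);
    for (size_t i = 0; i < n; i++) {
        int acc = 0;
        /* k <= i skips the taps that would read before x[0] */
        for (size_t k = 0; k < NUM_TAPS && k <= i; k++) {
            acc += coeff[k] * x[i - k];
        }
        y[i] = acc;
    }
}
```

Feeding a unit impulse (x[0] = 1, all other samples 0) through such a filter returns the coefficients themselves, which is a quick sanity check for any mapping of the filter to a processor.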

Hands-on 1: Mapping to a programmable core

Home page of the SPIM simulator (for MIPS R2000/R3000 architectures). You need this simulator for your first lab exercise on MIPS assembly programming.
Study the following example: GCD (Greatest Common Divisor), and test the assembler code using the SPIM simulator.
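
When checking the assembler code in the simulator, a C reference can help to verify the results. The sketch below assumes Euclid's algorithm with the remainder operation; the linked example may use a different variant (for instance repeated subtraction), and the function name gcd is chosen here for illustration.

```c
/* Greatest common divisor by Euclid's algorithm.
 * By convention gcd(a, 0) == a, so gcd(0, 0) returns 0. */
unsigned gcd(unsigned a, unsigned b)
{
    while (b != 0) {
        unsigned r = a % b;  /* remainder replaces the larger operand */
        a = b;
        b = r;
    }
    return a;
}
```

The loop body maps naturally onto a handful of MIPS instructions (a division, a move of the remainder, and a conditional branch), which makes it a convenient first assembly exercise.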

Hands-on 2: Mapping to a VLIW type ASIC

In this exercise we explore the Imagine processor from Stanford University; see the Imagine website for details about this processor.
Imagine is a streaming-oriented processor. In its basic realization it contains 8 PEs (processing elements) acting in SIMD mode (i.e. each PE executes the same instruction from an instruction controller). The PEs themselves are VLIW-type processors, capable of performing multiple operations per cycle.
For this exercise:

Check the link http://www.ics.ele.tue.nl/~hfatemi/5kk10/ and first try out the tools; then design and test your own streaming program for this processor.
See this page for extra information.
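
The SIMD organization described above (8 PEs that all execute the same instruction, each on its own stream element) can be modelled in plain C as follows. This is only a conceptual sketch: it does not use the Imagine tools or programming languages, and the names NUM_PES, kernel_op and stream_scale are hypothetical.

```c
#include <stddef.h>

#define NUM_PES 8  /* number of PEs in the basic Imagine realization */

/* The operation that every PE applies to its own element (here a simple gain). */
static int kernel_op(int sample, int gain)
{
    return sample * gain;
}

/* Plain-C model of SIMD execution over a stream: each outer iteration is one
 * step issued by the instruction controller, each inner iteration is one PE. */
void stream_scale(const int *in, int *out, size_t n, int gain)
{
    for (size_t base = 0; base < n; base += NUM_PES) {
        for (size_t pe = 0; pe < NUM_PES; pe++) {
            size_t i = base + pe;
            if (i < n) {  /* PEs past the end of the stream stay idle */
                out[i] = kernel_op(in[i], gain);
            }
        }
    }
}
```

On the real processor the inner loop does not exist as a loop: the 8 iterations happen in parallel, and the VLIW nature of each PE additionally allows several operations of kernel_op to be issued in the same cycle.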

Hands-on 3: Exploiting the data memory hierarchy for high performance and low power

In this exercise you are asked to optimize a C algorithm by using the discussed data management techniques. This should result in an implementation with much improved memory behavior, which in turn improves performance and reduces energy consumption. In this exercise we mainly concentrate on reducing energy consumption. You need to download the following, and follow the instructions:

Guidelines. This describes stepwise what you should do.
The algorithm and other required files.
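
As a small illustration of the kind of transformation meant here, the sketch below shows loop fusion combined with data reuse in a scalar. It is not the algorithm of this exercise; the functions two_pass and fused_reuse and the computation itself are made up for illustration.

```c
#include <stddef.h>

/* Before: the intermediate array tmp is written completely and read back,
 * and every input element (except the first and last) is loaded twice. */
void two_pass(const int *in, int *tmp, int *out, size_t n)
{
    for (size_t i = 0; i + 1 < n; i++)
        tmp[i] = in[i] + in[i + 1];
    for (size_t i = 0; i + 1 < n; i++)
        out[i] = 2 * tmp[i];
}

/* After: the two loops are fused, so tmp disappears, and the previously
 * loaded input element is kept in a scalar, so each element is loaded once. */
void fused_reuse(const int *in, int *out, size_t n)
{
    if (n < 2)
        return;
    int cur = in[0];
    for (size_t i = 0; i + 1 < n; i++) {
        int next = in[i + 1];
        out[i] = 2 * (cur + next);
        cur = next;  /* reuse instead of reloading in[i] in the next iteration */
    }
}
```

Both versions compute the same output, but the second performs far fewer accesses to array memory, which is where the energy saving comes from.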

Examination

The examination is oral and covers the treated course theory, the IDCT hands-on, and the DTSE hands-on.
Date: ** to be decided **.