Processor Architectures and Program Mapping

2006 - 2007

Code : 5KK10
Lecturers : Prof. Jef van Meerbergen, Dr. Bart Mesman, Prof. Henk Corporaal
Tel. : +31-40-247 5195 / 3653 (secr.) 5462 (office)
Email:  J.v.Meerbergen at; B.Mesman at; H.Corporaal at
Project assistance: Hamed Fatemi (H.Fatemi at


Information on the course:


When looking at future embedded systems and their design, especially (but not exclusively) in the multi-media domain, we observe several problems. To solve these problems we foresee the use of programmable multi-processor platforms with an advanced memory hierarchy, combined with an advanced design trajectory. These platforms may contain different processors, ranging from general-purpose processors to processors that are highly tuned for a specific application or application domain. This course treats several processor architectures, shows how to program and generate (compile) code for them, and compares their efficiency in terms of cost, power and performance. The tuning of processor architectures is also treated.

This course aims to give an understanding of the processor architectures that will be used in future multi-processor platforms, including their memory hierarchy. The processors treated range from general-purpose to highly optimized ones. Trade-offs between performance, flexibility, programmability, energy consumption and cost are discussed, and it is shown how processors can be tuned in various ways. Furthermore, this course looks into the required design trajectory, concentrating on code generation, scheduling, and efficient data management (exploiting the advanced memory hierarchy) for high performance and low power. The student will learn how to apply a methodology for a step-wise (source code) transformation and mapping trajectory, going from an initial specification to an efficient and highly tuned implementation on a particular platform. The final implementation can be an order of magnitude more efficient in terms of cost, power, and performance.


In this course we treat different processor architectures: DSPs (digital signal processors), VLIWs (very long instruction word processors, including Transport Triggered Architectures), ASIPs (application-specific instruction-set processors), and highly tuned, weakly programmable processors. In all cases it is shown how to program these architectures. Code generation techniques, especially for VLIWs, are treated, including methods to optimize code at source or assembly level. Furthermore, the design of advanced data and instruction memory hierarchies is detailed, and a methodology is discussed for the efficient use of the data memory hierarchy.
Most of the topics will be supplemented by hands-on exercises.
For more information on course and lecture schedule see: course description


The lecture slides will be made available during the course; see also below.
Papers and other reading material

Slides (per topic; see also the course description)

Slides as far as available. Slides of lectures 8 and 9 are not online.

Hands-on lab work

Will be updated during the course!

During the course there are three lab exercises to complete (the so-called hands-ons); see the detailed course description, and also the links below. They will be explained at the corresponding lectures. The final two hands-ons are more substantial: one is about Design Space Exploration (i.e. fitting your processor to a certain application or application domain), the other about DTSE (data memory management). You get several weeks to solve these lab exercises.
We will use three C applications during the hands-ons; code will be made available.

Hands-on 1: Mapping to a programmable core

Hands-on 2: Mapping to a VLIW type ASIC

In this exercise we explore the Imagine processor from Stanford University; see the Imagine website for details about this processor.
Imagine is a streaming-oriented processor. In its basic realization it contains 8 PEs (processing elements) operating in SIMD mode (i.e. each PE executes the same instruction, issued by a single instruction controller). The PEs themselves are VLIW-type processors, capable of performing multiple operations per cycle.
For this exercise

Hands-on 3: Exploiting the data memory hierarchy for high performance and low power

In this exercise you are asked to optimize a C algorithm using the data management techniques discussed in the course. This should result in an implementation with much improved memory behavior, which in turn improves both performance and energy consumption. In this exercise we mainly concentrate on reducing energy consumption. You need to download the following, and follow the instructions:


The examination is oral and covers the course theory treated in the lectures, the IDCT hands-on, and the DTSE hands-on.
Date: ** to be decided **.

Related material and other links

Interesting processor architectures:
