Platform-based Design

2007 - 2008 (1st semester)

Code : 5KK70
Credits: 5 ECTS
Lecturers : Dr. Bart Mesman, Prof. dr. Henk Corporaal
Tel. : +31-40-247 5195 / 3653 (secr.) 5462 (office)
Email: B.Mesman at tue.nl; H.Corporaal at tue.nl
Project assistance: Hamed Fatemi (H.Fatemi at tue.nl), Akash Kumar (A.Kumar at tue.nl)

News

10 January 2008: added student presentations
20 December: added the final lab assignment 3 on Data Memory Management
19 December: updated the CELL assignment
5 December: updated both the WiCa and the CELL assignments.
1 December: WiCa board assignment has been added. This is a second option for lab assignment 2.
21 November: guidelines for your topic presentation have been added;
19 November: First part of Lab assignment 2 has been added: about programming the CELL architecture as used in the Sony PlayStation 3.
We are still working on an alternative for CELL, using the WiCa board.

Description

When looking at future embedded systems and their design, especially (but not exclusively) in the multi-media domain, we observe several problems:

high performance (10 GOPS and beyond) has to be combined with low power (many systems are mobile);
time-to-market (to get your design done) constantly reduces;
most embedded processing systems have to be extremely low cost;
the applications show more dynamic behavior (resulting in greatly varying quality and performance requirements);
more and more the implementer requires flexible and programmable solutions;
there is a huge latency gap between processors and memories; and
design productivity does not cope with the increasing design complexity.

In order to solve these problems we foresee the use of programmable multi-processor platforms with an advanced memory hierarchy, together with an advanced design trajectory. These platforms may contain different processors, ranging from general-purpose processors to processors that are highly tuned for a specific application or application domain. This course treats several processor architectures, shows how to program and generate (compile) code for them, and compares their efficiency in terms of cost, power and performance. Furthermore, the tuning of processor architectures is treated.

Several advanced Multi-Processor Platforms, combining discussed processors, are treated. A set of lab exercises complements the course.

Purpose:
This course aims at getting an understanding of the processor architectures which will be used in future multi-processor platforms, including their memory hierarchy, especially for the embedded domain. Treated processors range from general purpose to highly optimized ones. Tradeoffs will be made between performance, flexibility, programmability, energy consumption and cost. It will be shown how to tune processors in various ways.

Furthermore this course looks into the required design trajectory, concentrating on code generation, scheduling, and on efficient data management (exploiting the advanced memory hierarchy) for high performance and low power. The student will learn how to apply a methodology for a step-wise (source code) transformation and mapping trajectory, going from an initial specification to an efficient and highly tuned implementation on a particular platform. The final implementation can be an order of magnitude more efficient in terms of cost, power, and performance.

Topics:

In this course we treat different processor architectures: DSPs (digital signal processors), VLIWs (very long instruction word processors, including Transport Triggered Architectures), ASIPs (application-specific instruction-set processors), and highly tuned, weakly programmable processors. In all cases it is shown how to program these architectures. Code generation techniques, especially for VLIWs, are treated, including methods to optimize code at source or assembly level. Furthermore, the design of advanced data and instruction memory hierarchies will be detailed. A methodology is discussed for the efficient use of the data memory hierarchy.
Most of the topics will be supplemented by hands-on exercises.
For more information on course and lecture schedule see: course description
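
As a small illustration of optimizing code at source level for a VLIW, the sketch below unrolls a dot-product loop by four and gives each unrolled copy its own accumulator. The four multiply-accumulate chains are then independent, so a VLIW compiler is free to schedule them into parallel issue slots. The function names and the unroll factor of 4 are illustrative assumptions and are not taken from the course material.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline: every iteration depends on the previous value of sum,
 * which serializes the additions. */
int dot_simple(const int *a, const int *b, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Unrolled by 4 with independent accumulators (assumed unroll factor).
 * The four chains s0..s3 do not depend on each other, which exposes
 * instruction-level parallelism to the scheduler. */
int dot_unrolled4(const int *a, const int *b, size_t n)
{
    assert(a != NULL && b != NULL);
    int s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    /* Remainder loop for lengths that are not a multiple of 4. */
    for (; i < n; i++)
        s0 += a[i] * b[i];

    return s0 + s1 + s2 + s3;
}
```

Whether the unrolled version is actually faster depends on the number of functional units and registers of the target; on a narrow machine the extra accumulators may bring no gain.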

Handouts

The lecture slides will be made available during the course; see also below.
Papers and other reading material

Download the 5P520 lecture material on Jef's page: click Education and then 5P520 (embedded multi-media systems). Check especially chapters 1-6 (download the pdf files);
Learn Chapter 2 on Computer Architecture Trends
From "Microprocessor Architectures, from VLIW to TTA" by Henk Corporaal, publisher John Wiley, 1998.
A paper about data reuse. Formalized methodology for data reuse exploration in hierarchical memory mapping.
J.Ph. Diguet et al.
Code transformations. Code transformations for data transfer and storage exploration preprocessing multimedia processors.
Francky Catthoor, Nikil D. Dutt, Koen Danckaert and Sven Wuytack
IEEE Design and Test of Computers, May-June 2001
Data storage components. Random-access data storage components in customized architectures
Lode Nachtergaele, Francky Catthoor and Chidamber Kulkarni
IEEE Design and Test of Computers, May-June 2001
Data optimizations. Data memory organization and optimizations in Application Specific systems
P.R. Panda et al.
IEEE Design and Test of Computers, May-June 2001

Slides (per topic; see also the course description)

Slides as far as available (will be updated regularly during the course).

Lecture 1: Introduction + Programmable CPU / RISC cores
Lecture 2: VLIW architectures (part a)
Lecture 3: VLIW architectures (part b) + ILP compilation (part a)
Lecture 4: ILP compilation (part b)
Lecture 5: SIMD
Lecture 6: ASIP
Lecture 7: MPSoC (part a)
Lecture 8:

Lecture 9: Massive parallel SoCs; Scheduling; Resource Management
Lecture 10: Smart Camera Networks:
Guest lecture by Prof. dr. Hamid Aghajan from Stanford University and NXP
Lecture 11: Data Memory Management (DMM): part a
Lecture 12: Data Memory Management (DMM): part b
Lecture 13: Data Memory Management (DMM): part c

Loop transformation overview

Lecture 14+15: Student presentations
See below for details.

Student presentations guidelines

The last two lectures will be used to let you present a topic highly related to this course.
Guidelines are as follows:

Choose a hot topic which interests you and which is highly related to this course.
Select a technical research paper from the web, based on this topic.
The paper should have sufficient technical depth; i.e. it should clearly explain all the details of the proposed method or solution. You can also check whether the paper is from a well-regarded journal or conference, such as those of IEEE or ACM (see e.g. IEEE.org and ACM.org).
The paper should be published in the last 5 years.
You should make a PowerPoint presentation on your topic.
The presentation should contain at least the following:

Summary of the paper's contribution (including technical details)
Your evaluation of this paper

strong points
weak points
applicability of proposed methodology / solution
indicate new / future directions of research

In order to evaluate the paper you may have to read related material on the same topic.
You are expected to prepare this in groups of 2 students (so you work together with one other student, as far as possible). Each student should present part of the presentation.
You get 15 minutes per presentation, including questions. So present not too many slides, and make them very clear and structured.
Your presentation will be evaluated by us. This evaluation will be taken into account for the final grading.
Send us (before your presentation) a copy of your slides and of the discussed paper. Include the bibliography info of this paper (authors, title, journal / conference, date, pages) such that we can make a website on this.

Hands-on lab work

Will be updated during the course!

During the course there are three lab exercises to be completed (so-called hands-on); see also the links below. They will be explained at the corresponding lectures.

Hands-on 1: Processor Design Space Exploration, based on the Imagine architecture

In this exercise we explore the Imagine processor from Stanford University; see the imagine website for details about this processor.
Imagine is a streaming-oriented processor. In its basic realization it contains 8 PEs (processing elements) acting in SIMD mode (i.e. each PE executes the same instruction, issued by a single instruction controller). The PEs themselves are VLIW-type processors, capable of performing multiple operations per cycle.
For this exercise

Check the link http://www.ics.ele.tue.nl/~hfatemi/5JJ70/
First try out the tools, then design and test your own streaming program for this processor.
See this page for extra information, more instructions, and exam questions.
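
The SIMD behaviour of the 8 PEs described above can be modelled in plain C as follows. This is only a conceptual sketch: it is not written in the Imagine toolchain's own programming languages, and the kernel (a scale-and-offset), the function name, and the restriction to stream lengths that are a multiple of 8 are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_PES 8  /* basic Imagine configuration: 8 processing elements */

/* Apply the same operation to a stream, NUM_PES elements at a time.
 * The inner loop models the PEs working in lockstep: every PE executes
 * the identical instruction, each on its own stream element. */
void simd_scale_offset(const int *in, int *out, size_t n,
                       int scale, int offset)
{
    /* Sketch assumption: the stream length is a multiple of NUM_PES. */
    assert(n % NUM_PES == 0);

    for (size_t base = 0; base < n; base += NUM_PES) {
        for (size_t pe = 0; pe < NUM_PES; pe++)
            out[base + pe] = in[base + pe] * scale + offset;
    }
}
```

On the real processor the inner loop does not exist as code; it corresponds to the 8 PEs operating in the same cycle, while the VLIW nature of each PE allows the multiply and the add of different elements to overlap.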

Hands-on 2: Platform Programming

In this lab you are asked to program a multi-processor platform.
There are two options: using the CELL platform or using the WiCa platform. The first assignment is ready; we are still working on the second one. So if you want to start today, choose the CELL option.

a. Programming the CELL Broadband Engine

The CELL contains a PowerPC processor and 8 SPEs (Synergistic Processing Elements), of which you can use 6 (number 7 is used by the operating system and number 8 is not guaranteed to be functional). The CELL processor is part of the Sony PlayStation 3, which we will use as the target. A good simulator and compiler environment is also available.
All details about the architecture, the simulation and compiler environment, and example programs can be found at the CELL-assignment page. Read this page carefully and follow the instructions.
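
A first step in programming such a platform is usually to divide the data over the available SPEs. The sketch below computes, in plain C, which part of an array of `total` elements each of the 6 usable SPEs would process. It deliberately uses no CELL SDK calls; the function name and the even-split policy are assumptions for illustration, and real SPE code would additionally have to respect DMA alignment rules and the limited local store, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>

#define USABLE_SPES 6  /* 6 of the 8 SPEs are available, as stated above */

/* Compute the half-open range [*begin, *end) of elements assigned to
 * SPE number id (0..5). The first (total % USABLE_SPES) SPEs get one
 * extra element, so the load differs by at most one element. */
void spe_chunk(size_t total, size_t id, size_t *begin, size_t *end)
{
    assert(id < USABLE_SPES);

    size_t base  = total / USABLE_SPES;
    size_t extra = total % USABLE_SPES;

    *begin = id * base + (id < extra ? id : extra);
    *end   = *begin + base + (id < extra ? 1 : 0);
}
```

For example, 20 elements are split into chunks of 4, 4, 3, 3, 3 and 3 elements for SPEs 0 to 5.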

b. Programming the WiCa 1.1 board

The WiCa 1.1 board is developed by Philips and NXP. The board is meant for use in smart cameras. It contains, among others, the Xetal SIMD image processing chip, containing 320 Processing Elements, and an 8051 microcontroller. To observe the world it contains two image sensors; this even allows for stereo vision and depth calculation.
To connect to your PC it has a USB interface, but you can also attach a ZigBee low-power interface to make a smart wireless sensor network.
The assignment is on using the image sensors to detect simple objects, their movement, and if possible, cooperate with other boards.
All details about the WiCa platform, the simulation and compiler environment, and example programs can be found at the WiCa-assignment page. Read this page carefully and follow the instructions.

Hands-on 3: Exploiting the data memory hierarchy for high performance and low power

In this exercise you are asked to optimize a C algorithm by using the discussed data management techniques. This should result in an implementation with much improved memory behavior, which in turn improves both performance and energy consumption. In this exercise we mainly concentrate on reducing energy consumption. You need to download the following, and follow the instructions:

Guidelines. This describes stepwise what you should do.
The algorithm and other required files.
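
To give an idea of the kind of source-level transformation meant here, the sketch below shows loop fusion on a toy two-pass computation. It is not the assignment algorithm; the function names, the scale factor 3 and the clip value 100 are made up for illustration. Before the transformation, a full-size intermediate array is written and then read back; after fusion the intermediate value lives in a scalar, so n writes and n reads to the memory hierarchy disappear.

```c
#include <assert.h>
#include <stddef.h>

/* Before: two passes communicate through the full-size array tmp[],
 * costing n extra memory writes and n extra memory reads. */
void scale_then_clip(const int *in, int *tmp, int *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        tmp[i] = in[i] * 3;
    for (size_t i = 0; i < n; i++)
        out[i] = tmp[i] > 100 ? 100 : tmp[i];
}

/* After loop fusion: the intermediate value is a scalar that the
 * compiler can keep in a register, so tmp[] is no longer needed. */
void scale_then_clip_fused(const int *in, int *out, size_t n)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        int t = in[i] * 3;
        out[i] = t > 100 ? 100 : t;
    }
}
```

Both versions compute the same result; only the number of accesses to large (and therefore energy-hungry) memories differs.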

Examination

The examination is oral and covers the treated course theory and the lab report(s).
Date: ** to be decided **.