Embedded Computer Architecture

2016 - 2017 (Second Quartile)

Code : 5SIA0
Credits : 5 ECTS
Lecturers : Prof. dr. Henk Corporaal
Tel. : +31-40-247 5195 (secr.) 5462 (office)
Email:  H.Corporaal at tue.nl
Project assistance:
- Luc Waeijen (L.J.W.Waeijen at tue.nl),
- Mark Wijtvliet (M.Wijtvliet at tue.nl),
- Mohammad Tahghighi (M.Tahghighi at tue.nl), and
- Roel Jordans (R.Jordans at tue.nl)


Information on the course:


When looking at future embedded systems and their design, especially (but not exclusively) in the multi-media domain, we observe several problems. To solve these problems we foresee the use of programmable multi-processor platforms with an advanced memory hierarchy, together with an advanced design trajectory. These platforms may contain different processors, ranging from general-purpose processors to processors that are highly tuned for a specific application or application domain. This course treats several processor architectures, shows how to program them, and compares their efficiency in terms of cost, power and performance. Furthermore, the tuning of processor architectures is treated.

Several advanced multi-processor platforms, combining the discussed processors, are treated. A set of mandatory, very advanced lab exercises complements the course.

This course aims at getting an understanding of the processor architectures which will be used in future multi-processor platforms, including their memory hierarchy, especially for the embedded domain. Treated processors range from general purpose to highly optimized ones. Trade-offs will be made between performance, flexibility, programmability, energy consumption and cost. It will be shown how to tune processors in various ways.

The course studies the architecture, organization and use of the newest (micro)processors currently on the market, and the latest research developments in computer architecture. Architectures exploiting instruction-level parallelism (ILP), data-level parallelism (DLP), thread-level and task-level parallelism are treated. Starting from basic architecture concepts, we will end by discussing the latest commercial processors.

This course also treats how processors can be combined in a multiprocessing platform, e.g. by using a Network-on-Chip. Inter-processor communication issues will be dealt with. Furthermore, some code generation techniques needed for exploiting ILP will be treated (note: code generation is treated far more extensively in a special course on Parallelization, Compilers and Platforms; 5LIM0).

Special emphasis will be on quantifying design decisions in terms of energy, performance and cost. The intention of the course is to give students the ability to understand the design principles and operation of new (multi-)processor architectures, and to evaluate them both qualitatively and quantitatively. Although we treat several examples, the emphasis will be on architecture concepts. Furthermore, 3 intensive lab exercises are part of the course; they will teach you the design space of multi- and graphics processors.
We will invite several guest lecturers to treat state-of-the-art topics in the computer architecture area.


Book and Handouts

The lecture slides will be made available during the course; see also below.
Papers and other reading material.

Book: Parallel Computer Organization and Design
Authors: Michel Dubois, Murali Annavaram, and Per Stenström

Cambridge University Press
October 2012

In addition: Study Chapter 2
on Computer Architecture Trends
taken from the book "Microprocessor Architectures, from VLIW to TTA"
          by Henk Corporaal, publisher John Wiley, 1998.

Slides (per topic; see also the course description)

** Slides will be updated regularly during the course.

Slides corresponding to labs

Hands-on lab work

To become a very good computer architect you have to practice a lot. Therefore, as part of this course we have put a lot of effort into preparing 3 very interesting lab assignments. For each lab there is a website with all the required documentation and preparation material. These lab assignments can be done (largely, but check the wiki pages) on your own laptop, with remote access to our server systems for certain parts.
For every lab you have to write a report, which has to be sent to one of the course assistants or submitted via the oncourse web site (always check the lab-specific instructions).

Lab 1: Processor Design Space Exploration, based on the CGRA from TU/e

You will be asked to design and optimize a low-power and very flexible Coarse Grain Reconfigurable Array (CGRA) processor for a certain application. In particular, you have to trade off performance against energy consumption.
Details and instructions about this assignment can be found on the oncourse.tue.nl website (you need to log in).

Lab 2: Programming Graphic Processing Units

Graphic processing units (GPUs) can contain up to thousands of Processing Engines (PEs). They achieve performance levels of teraFLOPS (10^12 floating point operations per second). In the past GPUs were very dedicated, not generally programmable, and could only be used to speed up graphics processing. Today, they are becoming more and more general purpose and even appear in high-end embedded systems. The latest GPUs of ATI/AMD and NVIDIA can be programmed in CUDA and/or OpenCL. For this lab we will use GPUs together with the OpenCL (based on C) programming environment.

After studying the example and learning material you have to perform your own assignment and hand in a small report. The assignment this year is about generating money, coins, called Coinporaals. Generate as many as possible by making your program extremely efficient.

All the details about this assignment can be found on the GPU-assignment site, or check our oncourse.tue.nl website.

Lab 3: Designing and Programming Multi-Processor systems

State-of-the-art CPUs contain dozens of cores on a single die. The trend of going multi-core poses new challenges to both computer architects and programmers. In this assignment, we will try to tackle these challenges from the viewpoint of both computer architects and programmers. The purpose of this assignment is to:

  1. Get an in-depth understanding of mainstream multi-core CPU architectures.
  2. Learn how to develop parallel applications on such architectures, and how to analyze the performance in a real environment.
In this lab assignment, you will be asked to map a C program onto a multiprocessor system. With the help of the gem5 simulator, we will look at different configurations, e.g., the number of processors, and the block size and associativity of different levels of caches. The goal is to optimize the performance of the system. You can achieve this goal by improving the original C code, using pthreads and any other creative methods.
Details about the assignment can be found on our oncourse.tue.nl website.

Student presentations guidelines

As part of this course you have to study a hot topic related to the course, and make a short slide presentation about this topic.
The slides have to be presented during the oral exam.

Guidelines are as follows:


The examination will be oral and/or online (using your laptop), and covers the treated course theory (all treated slides + corresponding parts of the used book), the lab report(s), and the studied article.
When: end of January, early February 2017. We will discuss the dates with you.
Grading depends on your results on theory, lab exercises and your presentation.

Related material and other links

Interesting processor architectures:
