Embedded Computer Architecture

2015 - 2016 (Second Quartile)

Code : 5SIA0
Credits: 5 ECTS
Lecturers : Prof. dr. Henk Corporaal
Tel. : +31-40-247 5195 (secr.) 5462 (office)
Email: H.Corporaal at tue.nl
Project assistance: Luc Waeijen (L.J.W.Waeijen at tue.nl), Mark Wijtvliet (M.Wijtvliet at tue.nl), and Roel Jordans (R.Jordans at tue.nl)

News

Oral exams: on Jan 29, Feb 1, 2 and 4
Jan 10: updated/uploaded slides of chapter 6,7 and 8
Jan 4: uploaded new lectures on chapter 4 and 5
GPU guest lecture Monday 14 December; slides are online.
Dec 11: lab 3 is online; have a look.
Lab assignment 2 is already online !
Updated schedule
Slides Ch 3 part 1 added
Nov 16: Lab 1 on Multi Processor design has started; see the lab site.
Slides of chapter 1 and 2 have been added
November 9, 2015, start of new lecture in IPO 0.98 (Mondays 3,4) / Pav B2 (changed: was originally U46; Thursdays 7,8)
Interesting article about processor design: a 90-minute guide!

Information on the course:

Schedule 2015-2016
Description and objectives
Book and handouts
Topics
Slides
Hands-on exercises
Examination information
Related and other links

Description

When looking at future embedded systems and their design, especially (but not exclusively) in the multi-media domain, we observe several problems:

high performance (1 TOPS and beyond) has to be combined with low power (many systems are mobile);
time-to-market (to get your design done) constantly reduces;
most embedded processing systems have to be extremely low cost;
the applications show more dynamic behavior (resulting in greatly varying quality and performance requirements);
more and more the implementer requires flexible and programmable solutions;
there is a huge latency gap between processors and memories; and
design productivity does not keep up with the increasing design complexity.

To solve these problems we foresee the use of programmable multi-processor platforms with an advanced memory hierarchy, together with an advanced design trajectory. These platforms may contain different processors, ranging from general purpose processors to processors which are highly tuned for a specific application or application domain. This course treats several processor architectures, shows how to program them, and compares their efficiency in terms of cost, power and performance. Furthermore, the tuning of processor architectures is treated.

Several advanced Multi-Processor Platforms, combining discussed processors, are treated. A set of lab exercises complements the course.

Purpose:
This course aims at getting an understanding of the processor architectures which will be used in future multi-processor platforms, including their memory hierarchy, especially for the embedded domain. Treated processors range from general purpose to highly optimized ones. Tradeoffs will be made between performance, flexibility, programmability, energy consumption and cost. It will be shown how to tune processors in various ways.

The course studies the architecture, organization and use of the newest (micro)processors currently on the market, and the latest research developments in computer architecture. Architectures exploiting instruction-level parallelism (ILP), data-level parallelism (DLP), thread-level and task-level parallelism are treated. Starting from basic architecture concepts we will end with discussing the latest commercial processors.

This course also treats how processors can be combined in a multiprocessing platform, e.g. by using a Network-on-Chip. Interprocessor communication issues will be dealt with. Furthermore, new code generation techniques needed for exploiting ILP will be treated. Special emphasis will be on quantifying design decisions in terms of performance and cost. The intention of the course is to give students the ability to understand the design principles and operation of new (multi-)processor architectures, and evaluate them both qualitatively and quantitatively. Although we treat several examples, the emphasis will be on architecture concepts. Furthermore, 3 intensive lab exercises are part of the course; they will teach you to explore the design space of multi-processors and graphics processors.

Topics:

Basic principles (like instruction set design), pipelining and its consequences; VLIW (very long instruction word, including TTAs, Transport Triggered Architectures), Superpipelined, Superscalar, SIMD (single instruction, multiple data, used in vector and sub-word parallel processors) and MIMD (multiple instruction, multiple data) architectures; SMT (Simultaneous Multi-Threading); Out-of-order and speculative execution; Branch prediction; Data (value) prediction; Design of advanced memory hierarchies; Memory coherency and consistency; Multi-threading; Exploiting task-level and instruction-level parallelism; Inter-processor communication models; Input and output; Network Communication Architecture; and Networks-on-Chip.
In all cases it is shown how to program these architectures. Furthermore, their combination and interconnection in an MPSoC (Multi-Processor System-on-Chip) is treated.

Most of the topics will be supplemented by very elaborate hands-on exercises.

Book and Handouts

The lecture slides will be made available during the course; see also below.
Papers and other reading material

Book: Parallel Computer Organization and Design
Authors: Michel Dubois, Murali Annavaram, and Per Stenström

Cambridge University Press
October 2012

Study Chapter 2 on Computer Architecture Trends
From the book "Microprocessor Architectures, from VLIW to TTA"
by Henk Corporaal, publisher John Wiley, 1998.

Slides (per topic; see also the course description)

** Slides as far as available. Slides will be updated regularly during the course.

Overview of this lecture, including preliminary Course Schedule
Chapter 1: Computer Systems Overview
Chapter 2: Technology
Chapter 9: Simulation
Chapter 3 part 1: Micro Processor Architecture

Includes the MIPS ISA (Instruction Set Architecture) and MIPS pipelined implementation

Chapter 3 part 2: Out-of-Order architectures, Superscalar, Branch Prediction, Speculative executions and limits to Instruction-Level Parallelism (ILP limits)
Accelerators

Guest lecture by Mark Wijtvliet

Overview of the INTEL SiliconHive VLIW processor and its tool flow (Compiler, TIM, etc.)

Guest lecture by Menno Lindwer

GPU: Graphics Processing Unit. How does it operate? And how to program it?

Guest lecture by Zhenyu Ye

Chapter 4: Memory hierarchy; caches, virtual memory management
Chapter 5: Multi-Processing Systems
Chapter 6: Interconnection Networks
Chapter 7: Coherence, Synchronization and Memory Consistency
Chapter 8.3: Core Multi-Threading: Simultaneous Multi-Threading
Neural Architectures

Guest lecture by Maurice Peemen

Slides corresponding to labs

Lab 1 on Multi-Core programming and Design Space Exploration
Using SniperSim (Luc Waeijen; see also below and the mentioned Wiki site)

Introduction
Details
OpenMP programming (slides by Maurice Peemen)

See further the wiki sites for the lab information (see below)

Hands-on lab work

To become a very good computer architect you have to practice a lot. Therefore, as part of this course we have put a lot of effort into preparing 3 very interesting lab assignments. For each lab there is a website with all the required documentation and preparation material. These lab assignments can be done on your own laptop, with remote access to our server systems for certain parts.
For every lab you have to write a report, which has to be sent to one of the course assistants.
** Labs will be updated and put online during the course **

Lab 1: Designing and Programming Multi-Processor systems

State-of-the-art CPUs contain dozens of cores on a single die. The trend of going multi-core poses new challenges to both computer architects and programmers. In this assignment, we will try to tackle these challenges from the viewpoint of both computer architects and programmers. The purpose of this assignment is to

Get an in-depth understanding of mainstream multi-core CPU architectures.
Learn how to develop parallel applications on such architectures, and how to analyze the performance in a real environment.

In this lab assignment, you will be asked to map a C program onto a multiprocessor system. With the help of the SniperSim simulator, we will look at different configurations, e.g., the number of processors, and the block size and associativity of different levels of caches. The goal is to optimize the Energy-Delay-Area-Product (EDAP) of the system. You can achieve this goal by improving the original C code, using OpenMP, and/or using any other creative methods. Details about the assignment can be found here.
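As a rough illustration of the optimization target, the sketch below computes the EDAP as the product of energy, delay and area, and picks the best of several simulated configurations. It also shows one way a loop could be parallelized with OpenMP. The struct, its field names and units, and the functions edap, best_config and parallel_sum are hypothetical names introduced here for illustration; they are not taken from the lab material or from SniperSim.

```c
#include <stddef.h>

/* Hypothetical record for one simulated configuration. Field names and
   units are assumptions for illustration, not SniperSim output fields. */
typedef struct {
    double energy_j;  /* total energy in joules */
    double delay_s;   /* execution time in seconds */
    double area_mm2;  /* chip area in square millimetres */
} config_result;

/* Energy-Delay-Area-Product: the product of the three costs; lower is better. */
double edap(const config_result *r)
{
    return r->energy_j * r->delay_s * r->area_mm2;
}

/* Return the index of the configuration with the lowest EDAP.
   The caller must pass n > 0. */
size_t best_config(const config_result *results, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; ++i) {
        if (edap(&results[i]) < edap(&results[best])) {
            best = i;
        }
    }
    return best;
}

/* Sum an array with an OpenMP parallel loop. The reduction clause gives
   each thread a private partial sum that is combined at the end. When
   compiled without OpenMP support the pragma is ignored and the loop
   simply runs sequentially. */
double parallel_sum(const double *data, size_t n)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < (long)n; ++i) {
        sum += data[i];
    }
    return sum;
}
```

With GCC or Clang, OpenMP is typically enabled with the -fopenmp flag. Because the EDAP multiplies all three costs, a configuration that halves the delay only pays off if energy times area grows by less than a factor of two.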

Lab 2: Processor Design Space Exploration, based on the Silicon Hive Architecture from INTEL

You will be asked to design and optimize a low power VLIW processor for the ECG application. In particular you have to trade off performance and energy consumption.
Details and instructions about this assignment can be found here.

Lab 3: Programming Graphics Processing Units

Graphics processing units (GPUs) can contain up to thousands of Processing Engines (PEs). They achieve performance levels of TeraFLOPS (10^12 floating point operations per second). In the past GPUs were very dedicated, not generally programmable, and could only be used to speed up graphics processing. Today, they are becoming more and more general purpose and even appear in high-end embedded systems. The latest GPUs of ATI/AMD and NVIDIA can be programmed in CUDA and/or OpenCL. For this lab we will use GPUs together with the OpenCL (based on C) programming environment.
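To give a feel for the data-parallel programming model, the sketch below emulates in plain C how a simple kernel is executed once per work-item. It is only an illustration under stated assumptions: the names vec_add_work_item and launch_vec_add are hypothetical, the loop is a sequential stand-in for the hardware, and a real OpenCL program additionally needs host code to set up devices, buffers and command queues.

```c
#include <stddef.h>

/* Body of a data-parallel kernel: one invocation handles one element.
   In OpenCL C the same kernel would look roughly like this:

     __kernel void vec_add(__global const float *a,
                           __global const float *b,
                           __global float *c)
     {
         size_t gid = get_global_id(0);
         c[gid] = a[gid] + b[gid];
     }

   Here the global id is passed in explicitly instead of being queried. */
static void vec_add_work_item(size_t gid, const float *a, const float *b,
                              float *c)
{
    c[gid] = a[gid] + b[gid];
}

/* Sequential stand-in for launching global_size work-items. On a GPU the
   iterations of this loop would be distributed over the processing
   engines and run in parallel; there is no ordering guarantee between
   work-items, so each one must only write its own output element. */
void launch_vec_add(size_t global_size, const float *a, const float *b,
                    float *c)
{
    for (size_t gid = 0; gid < global_size; ++gid) {
        vec_add_work_item(gid, a, b, c);
    }
}
```

The kernel body contains no loop over the data: the parallelism comes entirely from launching many independent work-items, which is why GPUs with thousands of PEs can be kept busy.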

After studying the example and learning material you have to perform your own assignment and hand in a small report. The assignment this year is about generating money, coins, called Coinporaals. Generate as many as possible by making your program extremely efficient.

All the details about this assignment can be found on the GPU-assignment site.

Student presentations guidelines

As part of this lecture you have to study a hot topic related to this course, and make a short slide presentation about this topic.
The slides have to be presented during the oral exam.

Guidelines are as follows:

Choose one hot topic which interests you and which is highly related to this course.
Select one technical (in depth) research paper from the web, based on this topic.
The paper should have sufficient technical depth; i.e. it should clearly explain all the details of the proposed method or solution. So e.g. do not choose company white papers or business papers. You can also check whether the paper is from well-regarded journals or conferences, such as IEEE or ACM conferences and journals (see e.g. IEEE.org and ACM.org). E.g., have a look at the following conferences:

DATE: Design Automation and Test in Europe: www.date-conference.com
CODES (Hardware-Software Codesign) + ISSS (International Symposium on System Synthesis): www.codes-isss.org
CASES: Compilers, Architectures, and Synthesis for Embedded Systems: www.casesconference.org
IEEE MICRO: Symposium on Micro Arch: www.microarch.org
HPCA: High-Performance Computer Architecture: www.hpcaconf.org
PACT: parallel architectures and compilation techniques: www.eecg.toronto.edu/pact

A larger list can be found here.
The paper should be published in 2013 or later (try to choose a very recent paper).
You should make a PowerPoint presentation on your topic; max 5 min. per presentation (e.g. 5 slides: one slide introducing the problem, then the approach and results of the paper, and a final conclusion and suggestions from your side on this topic; add / use clear pictures to explain the approach).
The presentation should contain at least the following:

Summary of the paper contributions (including technical details)
Your evaluation of the paper and topic

strong points
weak points
applicability of proposed methodology / solution
indicate new / future directions of research

In order to evaluate the paper you may wish to read related material on the same topic.
Your presentation will be evaluated by us. This evaluation will be taken into account for the final grading.

Examination

The examination will be oral and covers the treated course theory (all treated slides + corresponding parts of the used book), the lab report(s), and the studied article.
When: 4th week of January 2016, or first week of February. We will discuss the dates with you.
Grading depends on your results on theory, lab exercises and your presentation.