Intelligent Architectures 5LIL0

2021 - 2022 (2nd semester, Q3)

Code : 5LIL0
Credits: 5 ECTS
Lecturer : Prof. dr. Henk Corporaal + several Guest lecturers (see the actual schedule)
Tel. : +31-40-247 5195 (secr.) 5462 (office)
Email: H.Corporaal at tue.nl
Project assistance: Berk Ulker (b.ulker at tue.nl), Kanishkan Vadivel (k.vadivel at tue.nl),
Martin Roa Villescas (m.roa.villescas at tue.nl), Wei Sun (w.sun at tue.nl),
Ali Banagozar (a.banagozar at tue.nl), and Floran de Putter (f.a.m.d.putter at tue.nl).
Material: check canvas and below.
Previous year (2021): check here

News

Oral exam will be on Wednesday, April 20 / Thursday, April 21, 2022.

Guidelines are on canvas.
All slides are now online.
Instructions on Bonus option will follow.

Course starts online on February 8, 2022, at 10.45 CET
Internal, TUE students only: videos of lectures will be put online.
Schedule (will be updated if needed)
Lectures are on Tuesdays (3rd and 4th hours) and Fridays (7th and 8th hours)

Description

Machine learning, and in particular deep learning, has dramatically improved the state of the art in object detection, speech recognition, robotics, and many other domains. Whether it is superhuman performance in object recognition or beating human players in Go, the astonishing success of deep learning is achieved by deep neural networks trained with huge amounts of training examples and massive computing resources. Although deep learning is already applied successfully in academic use cases and several consumer products (e.g. machine translation), these data and computing requirements pose challenges for further market penetration.

This course on Intelligent Architectures first treats the most important Deep Learning Networks. In particular, we treat how they operate, how they are implemented, and how they learn. We will use standard frameworks, like TensorFlow or PyTorch, for building these networks.

These networks require lots of computation and memory accesses, making them costly and very energy consuming. Therefore this Intelligent Architectures course gives an in-depth treatment of several network and implementation optimization steps, like network pruning, quantization, and loop-nest transformations, for a drastic reduction of computation and memory traffic requirements.
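As a rough illustration of two of the optimizations named above, the following sketch applies magnitude-based pruning and symmetric 8-bit quantization to a small list of weights. It is only a minimal example under stated assumptions, not the method used in the lectures or labs: the function names, the example weights, and the choice of a single per-tensor scale are all hypothetical.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    # Indices sorted from smallest to largest absolute value
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Symmetric uniform quantization to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map the integers back to approximate real values."""
    return [v * scale for v in q]


# Hypothetical example weights (not taken from any real network)
weights = [0.40, -0.02, 0.25, 0.01, -1.00, 0.03, 0.75, -0.10]

pruned = prune_by_magnitude(weights, sparsity=0.5)  # half of the weights become 0
q, scale = quantize_int8(pruned)                    # 8-bit integers plus one scale
approx = dequantize(q, scale)                       # values seen at inference time

print(pruned)
print(q)
```

Pruning removes multiplications with zero weights, and quantization replaces 32-bit floating-point values by 8-bit integers; both reduce computation and memory traffic at the cost of a small approximation error, which in this sketch is bounded by half the quantization step.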

We also treat various processing and accelerator platforms tuned for deep learning algorithms, including (embedded) GPU, Tensor Processing Unit, and TTA (Transport Triggered Architectures) tuned for Deep ANNs (Artificial Neural Networks). Tuning the architecture and/or adding specific hardware can lead to huge cost savings.

Finally, we will look into the future and hint at what other high-potential machine learning approaches, like Bayesian learning and Neuromorphic computing, can offer.

The course includes 3 lab assignments, covering the above topics. The labs give you real hands-on experience in designing and implementing ANNs.

You will learn:

- the fundamentals of deep learning, including network architectures, inference, and learning methods.

- how to design Deep Artificial Neural Networks (ANNs).

- how to implement and optimize ANNs using various optimization methods.

- state-of-the-art ANNs, including the newest type of operators.

- special processing architectures and hardware efficiently supporting Deep Learning.

- alternative approaches to the "classical" ANNs, like Bayesian learning and Neuromorphic (SNN based) computing.

Topics:

The main emphasis is on Deep Learning, in particular on ANNs (Artificial Neural Networks), their algorithms, and their efficient implementation, using custom and off-the-shelf processors and accelerators. Note that we often talk about DNNs (Deep Neural Networks); we then usually refer to Deep ANNs, and not Deep SNNs (Spiking Neural Networks).
In this course we treat among others the following topics:

CNN: Convolutional Neural Networks
Learning principles
Frameworks for designing DNNs (Deep Neural Networks)
Optimizations

Compact DNNs
Quantization of activations and weights in DNNs
Advanced mapping of DNNs exploiting data reuse for activations and weights

General architecture support for DNNs
DNN accelerators
Beyond the classic neural networks

Neuromorphic computing
Bayesian computing

Most of the topics will be supplemented by very elaborate hands-on exercises.
For a preliminary lecture overview see: schedule.
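To make the topics "CNN" and "Advanced mapping of DNNs exploiting data reuse" listed above a bit more concrete, the sketch below writes a single-channel 2D convolution as a plain loop nest, and then as a tiled loop nest in which each output tile works from a small copied input patch. This is only an illustrative sketch: the function names, the tile size, and the use of a Python list as a stand-in for a local buffer are assumptions, and real mappings also tile over channels and batches.

```python
def conv2d(inp, w):
    """Direct 'valid' 2D convolution (cross-correlation), single channel."""
    H, W, K = len(inp), len(inp[0]), len(w)
    out = [[0.0] * (W - K + 1) for _ in range(H - K + 1)]
    for y in range(H - K + 1):            # output rows
        for x in range(W - K + 1):        # output columns
            for ky in range(K):           # kernel rows
                for kx in range(K):       # kernel columns
                    out[y][x] += inp[y + ky][x + kx] * w[ky][kx]
    return out


def conv2d_tiled(inp, w, tile=2):
    """Same result, but the output loops are split into tiles.

    Each output tile only needs a (tile + K - 1) x (tile + K - 1) input
    patch, which is copied once into `buf` (a stand-in for a local buffer)
    and then reused by all outputs in the tile.
    """
    H, W, K = len(inp), len(inp[0]), len(w)
    OH, OW = H - K + 1, W - K + 1
    out = [[0.0] * OW for _ in range(OH)]
    for ty in range(0, OH, tile):
        for tx in range(0, OW, tile):
            th = min(tile, OH - ty)       # tile height (smaller at the border)
            tw = min(tile, OW - tx)       # tile width
            buf = [row[tx:tx + tw + K - 1] for row in inp[ty:ty + th + K - 1]]
            for y in range(th):
                for x in range(tw):
                    acc = 0.0
                    for ky in range(K):
                        for kx in range(K):
                            acc += buf[y + ky][x + kx] * w[ky][kx]
                    out[ty + y][tx + x] = acc
    return out


# Hypothetical 5x5 input and 3x3 horizontal-difference kernel
inp = [[float(r * 5 + c) for c in range(5)] for r in range(5)]
w = [[1.0, 0.0, -1.0] for _ in range(3)]

print(conv2d(inp, w))
print(conv2d_tiled(inp, w, tile=2))
```

Both functions perform the same multiply-accumulate operations; the tiled version only changes the order of the loops and where the input data is read from, which is the kind of transformation that lets a small on-chip buffer serve many accesses.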

Handouts

The lecture slides will be made available during the course; see also below.
Mandatory reading material:

Efficient Processing of Deep Neural Networks: A Tutorial and Survey
by Vivienne Sze, e.a.
arXiv, August 2017

Corresponding slides, from ISCA 2019

Suggested background material:
Check YouTube presentation: Design for Highly Flexible and Energy-Efficient Deep Neural Network Accelerators [Yu-Hsin Chen]

Check the Stanford courses cs230 and cs231, e.g.

http://cs231n.stanford.edu/slides/2019/cs231n_2019_lecture05.pdf
https://cs230.stanford.edu/syllabus/

The Eyeriss project of MIT
Tutorials from the Eyeriss (MIT) group, in particular the slides from their ISCA 2019 tutorial:

Overview of Deep Neural Networks [ slides ]
Popular ANNs and Datasets [ slides ]
Benchmarking Metrics [ slides ]
DNN Kernel Computation [ slides ]
DNN Accelerators (part 1) [ slides ]
DNN Accelerators (part 2) [ slides ]
DNN Model and Hardware Co-Design (precision) [ slides ]
DNN Processing Near/In Memory [ slides ]
DNN Model and Hardware Co-Design (sparsity) [ slides ]
Sparse DNN Accelerators [ slides ]
Tutorial Summary [ slides ]

UvA course on Deep Learning, by Efstratios Gavves
UvA course on Machine Learning, by Cees Snoek e.a.

Slides (per topic; will be made available during the lecture)

Overview and guidelines
Topic 1: Convolutional Neural Networks (CNNs), Inference
Topic 2: Learning Principle

Berk Ulker (TU/e)

Topic 3: Frameworks / Software Tools & Design of CNNs

Berk Ulker (TU/e)

Topic 4: Optimization 1, Design of efficient / small networks

Guest lecture by Maurice Peemen (Thermo Fisher):

Part I: Network Pruning
Part II: Efficient design of networks & filter decomposition

Topic 5: Optimization 2, Quantization

Floran de Putter (TU/e)

Topic 6: GPUs, Graphics Processing Units and CUDA programming model
Topic 7: Optimization 3, Exploiting Data Reuse by Loop Transformations and local Buffering
Topic 8: Further efficiency improvements

Topic 8a: Extreme quantization, binary and ternary architectures and deep learning networks

Floran de Putter (TU/e)

Topic 8b: Multiplier-less Deep Learning

Guest lecture by Sebastian Vogel (NXP)

Topic 9: DL Applications and Advanced DNNs

Alexios Balatsoukas-Stimming

Topic 10: Accelerators for Deep Learning

Topic 10a: NeuronFlow: An Architecture for Edge AI

Guest lecture by Orlando Moreira (GrAI Matter Labs)

Topic 11: General Architecture Processor support for Deep Learning Neural Networks

Kanishkan Vadivel

Topic 12: DSE of Networks and Mappings

Topic 12a: Neural Architecture Search

Guest lecture by Willem Sanberg (NXP)

Topic 12b: DSE with Timeloop and ZigZag

Floran de Putter (TU/e)

Topic 13: Beyond ANNs: Neuromorphic Computing and Engineering

Topic 13a: Spiking Neural Networks (SNNs)

Federico Corradi (TU/e, IMEC)

Topic 13b: SNN Learning

Sherif Eissa (TU/e)

Topic 14: Beyond ANNs: Bayesian inference

Martin Roa Villescas (TU/e)

Topic 15: Neuromorphic architectures for edge AI

Federico Corradi (TU/e, IMEC)

Topic 16: Recap

Student presentations guidelines

As part of this course you have a bonus option: you study a hot topic related to this course and make a short slide presentation about it. Details will be announced during the lecture.

Guidelines are as follows:

Choose one hot topic which interests you and which is highly related to this course!
Select one technical (in-depth) research paper from the web, based on this topic. See the lists below.
The paper should be published in 2019 or later.
The paper should have sufficient technical depth; i.e. it should clearly explain all the details of the proposed method or solution. So, for example, do not choose company white papers or business papers. You can also check whether the paper comes from a well-regarded journal or conference, like the IEEE or ACM conferences and journals (see e.g. IEEE.org and ACM.org).
Check the top conferences and top journals on Machine Learning and Artificial Intelligence.
You may also have a look at the following two lists:
Top architecture conferences, containing lots of Deep Learning Architecture and Implementation papers:

ISCA: International Symposium on Computer Architecture: iscaconf.org
IEEE MICRO: Symposium on Microprocessor Architectures: www.microarch.org
ASPLOS: Architectural support for languages and operating systems: asplos-conference.org
ICS: International Conference on Supercomputing: www.ics-conference.org
ISSCC: International Solid State Circuits Conference: isscc.org
DAC: Design Automation Conference: www.dac.com
DATE: Design Automation and Test in Europe: www.date-conference.com
CODES (Hardware-Software Codesign) + ISSS (International Symposium on System Synthesis): www.codes-isss.org
CASES: Compilers, Architectures, and Synthesis for Embedded Systems: www.casesconference.org
HPCA: High-Performance Computer Architecture: www.hpcaconf.org
PACT: parallel architectures and compilation techniques: www.eecg.toronto.edu/pact

Top conferences on Machine Learning & Artificial Intelligence, containing also Deep Learning Architecture and Implementation papers:

NeurIPS: Neural Information Processing Systems (NIPS)

ICML: International conference on Machine Learning
CVPR : IEEE/CVF Conference on Computer Vision and Pattern Recognition
ICCV : IEEE/CVF International Conference on Computer Vision
ECCV : European Conference on Computer Vision
AAAI : AAAI Conference on Artificial Intelligence

A larger list on computer architecture related conferences and journals can be found here.

PRESENTATION:
You should make a PowerPoint presentation on your topic; max 5 min. per presentation (e.g. 5 slides: one slide introducing the problem, then the approach and results of the paper, and a final conclusion with suggestions from your side on this topic; add / use clear pictures to explain the approach).
The presentation should contain at least the following:

Summary of the paper contributions (including technical details)
Your evaluation of the paper and topic

strong points
weak points
applicability of proposed methodology / solution
indicate new / future directions of research

In order to evaluate the paper you may wish to read related material on the same topic.
Your presentation will be evaluated by us. This evaluation will be taken into account for the final grading.

Hands-on lab work (will be updated)

Becoming an expert in Deep Learning and Deep Neural Networks requires that you get your hands dirty and do practical assignments. Therefore, as part of this course we have put a lot of effort into preparing 3 very interesting lab assignments. Details will be presented during the course, and material will be placed on the oncourse 5LIL0 site.
** labs will be put online during the course **

Hands-on 1: DNN design

You will design a deep Convolutional Neural Network (CNN) using one of the well-known frameworks, PyTorch. The network has to recognize spoken words.
After training, the network will be tuned using pruning, quantization, and other optimizations.

Hands-on 2: DNN implementation on GPUs

Graphics processing units (GPUs) can contain up to thousands of Processing Engines (PEs). They achieve performance levels of teraFLOPS (10^12 floating-point operations per second). In the past, GPUs were highly dedicated, not generally programmable, and could only be used to speed up graphics processing. Today, they are becoming more and more general purpose. Lately they also support Deep Neural Networks (DNNs) by supporting smaller data sizes (e.g. 16-bit floating point) and by including special units that speed up learning and inference.
In this lab you are asked to map a DNN efficiently on a GPU, using all the tricks you can play.
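To get a feeling for the numbers mentioned above (performance in the order of 10^12 floating-point operations per second, and 16-bit instead of 32-bit data), the following back-of-the-envelope sketch counts the work and weight storage of one convolution layer. The layer dimensions and the 1 teraFLOPS peak rate are hypothetical example values, not properties of the GPU or network used in the lab, and the time is an ideal lower bound that ignores memory traffic.

```python
# Hypothetical convolution layer: 3x3 kernel, 64 input and 64 output
# channels, 56x56 output feature map.
K, C_IN, C_OUT, OUT_H, OUT_W = 3, 64, 64, 56, 56

# One multiply-accumulate (MAC) per kernel element, per input channel,
# per output channel, per output pixel.
macs = OUT_H * OUT_W * C_OUT * C_IN * K * K
flops = 2 * macs  # one multiply plus one add per MAC

# Weight storage in 32-bit and 16-bit floating point.
num_weights = C_OUT * C_IN * K * K
bytes_fp32 = num_weights * 4
bytes_fp16 = num_weights * 2

# Ideal execution time at an assumed peak of 10^12 FLOP/s.
PEAK_FLOPS = 1e12
ideal_seconds = flops / PEAK_FLOPS

print(f"MACs:            {macs}")
print(f"FLOPs:           {flops}")
print(f"Weights (FP32):  {bytes_fp32} bytes")
print(f"Weights (FP16):  {bytes_fp16} bytes")
print(f"Ideal time:      {ideal_seconds * 1e6:.1f} microseconds")
```

In practice the achieved time is usually well above this bound, because the layer's inputs, weights and outputs have to be moved between memory and the processing engines; halving the data size with 16-bit values reduces exactly that traffic.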

Hands-on 3: DNN implementation on Embedded ASIP

In this lab we will map a Deep Neural Network (DNN) to an Application Specific Instruction-set Processor (ASIP).
We will use the AivoTTA from Tampere University as a target platform. You can tune the platform by adding specific function units.
See the lab3 assignment.
Further files and details are on oncourse.tue.nl

Examination

The examination will be oral and will cover the treated course theory, the lab report(s), and the studied articles.
We will discuss the dates with you.
Grading depends on your results on theory, lab exercises and defense, and your presentation.