Tom Vander Aa

Efficient use of the Instruction Memory Hierarchy through advanced Program Transformations


 

Introduction and motivation

Low energy consumption is one of the key design goals of current embedded systems for multimedia applications. Typically, the core of such a system is a Very Long Instruction Word (VLIW) processor. However, power analysis of such processors indicates that a significant amount of power is consumed in the on-chip instruction memory hierarchy. Loop buffering, in which the instructions of frequently executed loops are served from a small buffer close to the processor core, is an effective scheme to reduce energy consumption in the instruction memory hierarchy.

Our Contribution to Software-Controlled Loop Buffers

The most energy-efficient way to manage these loop buffers is through the compiler. The compiler should be responsible for mapping the appropriate parts of the application onto these loop buffers (L0 buffers).

We have extended the compiler to better support loop buffers in two ways:

Mapping Exploration

Since the loop buffer is software controlled, the compiler is responsible for using it effectively. We propose a mapping algorithm to find the optimal use of the loop buffer for a given application.

Loop Transformations

We propose loop transformations to increase the number of loops that can be mapped onto the loop buffer. Examples of such loop transformations are:

  1. loop peeling: A small number of iterations is removed from the beginning or the end of the loop. This way, a conditional that tests the loop counter can possibly be removed, resulting in a smaller loop body.

  2. factorization: Often the core of multimedia code is written as multiple similar phases. Each of these phases contains (almost) the same code. If the common code can be shared in a function, it has to be loaded into the loop buffer only once.

  3. loop unrolling + software pipelining: Loop unrolling increases the instruction-level parallelism (ILP) but also the code size. By combining loop unrolling with software pipelining, we are able to make a trade-off between the two.

  4. loop splitting: By splitting the body of a loop into two separate loops that are executed one after the other, we reduce the size requirements of the loop buffer, since each resulting loop has a smaller body.
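
As an illustration of loop peeling and loop splitting, the following is a minimal source-level sketch in C. The function names, the smoothing kernel and the constants are hypothetical and chosen only for this example; they are not taken from our compiler, which may apply these transformations at a different level of the code.

```c
#include <assert.h>
#include <stddef.h>

/* Before peeling: the body tests the loop counter in every iteration. */
void smooth_original(const int *in, int *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (i == 0)
            out[i] = in[i];                    /* boundary case, first iteration only */
        else
            out[i] = (in[i - 1] + in[i]) / 2;
    }
}

/* After peeling: the first iteration is moved in front of the loop,
 * so the remaining loop body no longer contains the conditional. */
void smooth_peeled(const int *in, int *out, size_t n)
{
    if (n == 0)
        return;
    out[0] = in[0];                            /* peeled iteration */
    for (size_t i = 1; i < n; i++)
        out[i] = (in[i - 1] + in[i]) / 2;
}

/* Before splitting: one loop whose body holds two independent statements. */
void scale_offset_original(const int *a, int *x, int *y, size_t n)
{
    assert(a != x && a != y);                  /* assumption: outputs do not overwrite the input */
    for (size_t i = 0; i < n; i++) {
        x[i] = a[i] * 3;
        y[i] = a[i] + 7;
    }
}

/* After splitting: two loops executed one after the other,
 * each with a smaller body than the original loop. */
void scale_offset_split(const int *a, int *x, int *y, size_t n)
{
    assert(a != x && a != y);                  /* same assumption as above */
    for (size_t i = 0; i < n; i++)
        x[i] = a[i] * 3;
    for (size_t i = 0; i < n; i++)
        y[i] = a[i] + 7;
}
```

In both cases the transformed version computes the same result as the original. The loop that has to fit in the loop buffer becomes smaller, at the price of some extra code outside the loop (peeling) or extra loop overhead (splitting).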



Tom Vander Aa -- 2003-08-19