Predictable Mapping of Streaming Applications on Multiprocessors

The design of new consumer electronics devices is getting more and more complex as more functionality is integrated into these devices. To manage the design complexity, a predictable design flow is needed. The result should be a system that guarantees that an application can perform its own tasks within strict timing deadlines, independent of other applications running on the system. This requires that the timing behavior of the hardware, the software, as well as their interaction can be predicted.

A heterogeneous multi-processor System-on-Chip (MP-SoC) is often mentioned as the hardware platform to be used in modern electronics systems. An MP-SoC offers good potential for computational efficiency (operations per Joule) for many applications. Networks-on-Chip (NoCs) are proposed as interconnect in these systems. A NoC offers scalability and guarantees on the timing behavior when communicating data between various processing and storage elements. Combining this with a predictable resource arbitration strategy for the processors and storage elements gives a predictable platform. To obtain a fully predictable system, also the timing behavior and resource usage of an application mapped to such an MP-SoC platform needs to be analyzable and predictable. The Synchronous Dataflow (SDF) model of computation fits well with the characteristics of streaming applications, it can capture many mapping decisions, and it allows design-time analysis of timing and resource usage. Therefore, this thesis aims to achieve predictable, streaming application behavior on NoC-based MP-SoC platforms for applications modeled as SDF graphs (SDFGs).

The most important timing requirement for streaming applications is usually related to the throughput that should be achieved. A major aspect influencing the achievable throughput, and in fact also the computational efficiency, is the storage space allocated to the data streams being processed in the system. In an SDFG model, the edges in the graph typically correspond to data streams. The storage allocation problem for an SDFG is the problem of assigning a fixed storage size to its edges. This size must be chosen such that the throughput requirement of the system is met, while minimizing the required storage space. The first major contribution of this thesis is an exact technique to explore the throughput/storage-space trade-off space. Despite the theoretical complexity of the storage allocation problem, the technique performs well in practice. By considering the entire trade-off space, it is possible to cope with situations where the precise throughput constraint is not (yet) known or might dynamically change. In multi-processor mapping, for example, scheduling decisions can influence the achievable throughput. This introduces uncertainty on the relation between the throughput and the storage requirements of an SDFG in the early phases of the trajectory when these scheduling decisions are not yet made.

In any multi-processor mapping trajectory, an application must be bound to and scheduled onto the processors and storage elements in the MP-SoC. An important contribution of this thesis is such a technique to bind and schedule an SDFG onto an MP-SoC. Existing binding and scheduling techniques can only deal with single-rate execution dependencies between tasks. SDFGs can express multi-rate dependencies between tasks. Dependencies in an SDFG can be expressed in single-rate form, but then the problem size may increase exponentially making the binding and scheduling infeasible. The binding and scheduling technique presented in this thesis works directly on SDFGs, building on an efficient technique to calculate throughput of a bound and scheduled SDFG.

When the application tasks have been bound to and scheduled onto the platform components, it remains to schedule the communication onto the MP-SoC interconnect. This thesis presents three different scheduling strategies that schedule time-constrained communications on a NoC while minimizing resource usage by exploiting all scheduling freedom offered by NoCs. It is shown that these strategies outperform existing NoC scheduling techniques. Furthermore, a technique is presented to extract the timing constraints on the communication from a bound and scheduled SDFG, connecting the NoC scheduling strategies to the binding and scheduling strategy for SDFGs mentioned earlier.

Finally, the techniques presented in this thesis are embedded into a coherent design flow. The starting point is a streaming application that is modeled with an SDFG and a NoC-based MP-SoC that offers a predictable timing behavior. The objective of the flow is to minimize the resource usage while offering guarantees on the timing behavior, in practice the throughput, of the application when mapped to the system. A case study is performed that maps a set of multimedia applications (H.263 encoder/decoder and an MP3 decoder) onto a NoC-based MP-SoC. It shows that the design flow, SDFG mapping techniques, and SDFG analysis techniques presented in this thesis enable a mapping of a streaming application onto a NoC-based architecture that has a predictable timing behavior. This makes it the first complete design flow that maps a time-constrained SDFG to a NoC-based MP-SoC while providing throughput guarantees.