Exercise 3: modeling interconnect delay

The channels in an FSM-SADF graph are used to communicate tokens between actors. Whenever an actor produces a token on a channel, this token is available for immediate consumption by the actor at the other end of the channel. Hence, sending data over a channel takes no time. On a real platform, there will obviously be a delay between the moment data is produced by one actor and the moment it reaches another actor, which might be running on a different processor. In this exercise, you will learn how to model the delay caused by interconnect communication in the FSM-SADF graph. Using this model, it becomes possible to analyze the throughput of an application while taking the interconnect delay into account.

In this exercise, you will continue with the FSM-SADF graph that you analyzed in the previous exercise. As in the previous exercise, the channel from actor R to actor Q and the channel from actor Q to actor P model buffer sizes. The number of initial tokens on these channels indicates the buffer size assigned to these channels.

Open your file browser and go to the folder '<path where you unpacked the archive>/hands-on/exercise 3'.
Open the file 'example.xml' in a text editor and study the contents of this file.
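If you are not yet familiar with SDF3's XML format, the fragment below sketches its general structure. It follows the plain-SDF flavor of the format; the FSM-SADF flavor used by 'example.xml' additionally contains scenario and FSM information, and the port names, rates, and execution time shown here are illustrative assumptions rather than the actual contents of the file.

    <?xml version="1.0"?>
    <sdf3 type="sdf" version="1.0">
      <applicationGraph name="example">
        <sdf name="example" type="Example">
          <!-- Each actor declares its input and output ports.
               Port names and rates below are illustrative. -->
          <actor name="Q" type="q">
            <port name="in" type="in" rate="1"/>
            <port name="out" type="out" rate="1"/>
          </actor>
          <!-- ... actors P and R ... -->
          <!-- Channels connect ports; initial tokens model buffer space. -->
          <channel name="q2r" srcActor="Q" srcPort="out"
                   dstActor="R" dstPort="in" initialTokens="0"/>
          <!-- ... remaining channels ... -->
        </sdf>
        <sdfProperties>
          <!-- Execution times are attached to actors here. -->
          <actorProperties actor="Q">
            <processor type="p1" default="true">
              <executionTime time="1"/>
            </processor>
          </actorProperties>
          <!-- ... properties of the other actors ... -->
        </sdfProperties>
      </applicationGraph>
    </sdf3>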
What is the throughput of this graph?
In the previous exercise, you already found that the throughput of this graph is equal to 0.25 iterations/time-unit. Obviously it did not change since we are using the same graph in this exercise.

The throughput which you measured to answer the above question can only be achieved if there is no delay in sending a token between two actors. As mentioned before, the interconnect in any real platform will incur a delay when sending data between two processors. Hence, the throughput as you just measured might not be achievable on a real platform. In fact, the throughput which you measured might be higher than the throughput which you can realize on a real platform. This is obviously undesirable if you want to build a system that provides timing guarantees.

To solve this problem, the delay introduced by the interconnect should be taken into account when computing the throughput of the graph. This can be done by modeling this delay in the FSM-SADF graph. A first, simple way to take the interconnect delay into account is to assume that the interconnect incurs a constant (worst-case) delay on each token sent across it. To simplify things even further, we can assume that the interconnect offers no pipelining opportunities. Hence, only a single token can be in transit at any time. In this situation, we can model the interconnect delay with a single actor as shown in the figure below.
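As a textual sketch of this model: in SDF3's XML format, the delay actor could be declared as below. The actor name D follows the exercise; the port names and the self-loop channel, which carries one initial token so that at most one firing of D (and hence at most one token transfer) is active at a time, are assumptions about what the figure shows.

    <!-- Delay actor D models the constant interconnect delay.
         Port and channel names below are illustrative. -->
    <actor name="D" type="d">
      <port name="in" type="in" rate="1"/>
      <port name="out" type="out" rate="1"/>
      <!-- Ports for the self-loop that serializes the firings of D. -->
      <port name="slin" type="in" rate="1"/>
      <port name="slout" type="out" rate="1"/>
    </actor>
    <!-- One initial token: only a single token can be in transit. -->
    <channel name="dsl" srcActor="D" srcPort="slout"
             dstActor="D" dstPort="slin" initialTokens="1"/>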

Whenever the source and destination actors of a channel c are bound to different processors, we can add this model to the FSM-SADF graph. The input channel in our model is then connected to the source actor of channel c and the output channel in our model is connected to the destination actor of channel c.

Assume that actors Q and R are mapped to different processors. Extend the example graph such that the delay of sending tokens across the channel q2r is taken into account. Assume further that the interconnect has a delay of 5 time-units (i.e., the execution time of actor D is equal to 5 time-units). (hint)
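One possible way to encode this change is sketched below: the channel q2r is removed and replaced by two channels that route its tokens through the delay actor D. The channel and port names are assumptions; only the execution time of 5 time-units is fixed by the exercise.

    <!-- Route the tokens of the former channel q2r through actor D.
         Channel and port names are illustrative. -->
    <channel name="q2d" srcActor="Q" srcPort="out"
             dstActor="D" dstPort="in" initialTokens="0"/>
    <channel name="d2r" srcActor="D" srcPort="out"
             dstActor="R" dstPort="in" initialTokens="0"/>
    <!-- The execution time of D models the 5 time-unit delay. -->
    <actorProperties actor="D">
      <processor type="p1" default="true">
        <executionTime time="5"/>
      </processor>
    </actorProperties>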

The resulting graph should look as follows:

What is the throughput of this graph?
The graph has a throughput of 0.1 iterations/time-unit when we take the interconnect delay of communicating tokens over channel q2r into account. Note that you computed earlier that the same graph would be able to realize a throughput of 0.25 iterations/time-unit. That result was, however, obtained while ignoring the interconnect delay. Your new result shows that the interconnect can have an impact on the achievable throughput of an FSM-SADF graph. Therefore, it is important to model the interconnect delay in the FSM-SADF graph when analyzing its achievable throughput.

The actor D and the two channels connected to it model the delay of sending tokens through the interconnect from the processor on which actor Q is executed to the processor on which actor R is executed. The buffer size of this connection is modeled with the channel from actor R to actor Q. This implies that the buffer space of the connection is shared between the actors Q and R. Whenever R reads data from the connection, this space will immediately be available for use by actor Q. In practice, this will typically not be true. In many practical platforms, the producing and consuming actors (e.g., actors Q and R in our example) will have separate buffer spaces near their own local processors. In other words, the buffer modeled with the channel from actor R to actor Q will be broken into two separate buffers. This can be modeled by adding two additional channels to our delay actor D (see figure below). These two channels model, respectively, the buffer available to actor Q and the buffer available to actor R. Note that the direct channel from actor R to actor Q is no longer needed.

Adapt the XML description of our example FSM-SADF graph such that it models the two separate buffers for actors Q and R. Both buffers should have a buffer size of two tokens. (Hence, your new XML description should match the above figure.)
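A sketch of the corresponding changes is shown below, again with assumed channel and port names (the matching ports must also be added to the actor declarations). The channel from actor R to actor Q is removed; in its place, two backward channels around actor D each carry the two initial tokens of the requested buffer size.

    <!-- Channel and port names are illustrative assumptions. -->
    <!-- Buffer at the side of actor Q: space is released as soon as
         actor D accepts a token for transfer. -->
    <channel name="d2q" srcActor="D" srcPort="bufout"
             dstActor="Q" dstPort="bufin" initialTokens="2"/>
    <!-- Buffer at the side of actor R: space is released when
         actor R consumes a token. -->
    <channel name="r2d" srcActor="R" srcPort="bufout"
             dstActor="D" dstPort="bufin" initialTokens="2"/>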
What is the throughput of this graph?
The graph has a throughput of 0.0833333 iterations/time-unit. The throughput decreased in this case because actor R is sometimes forced to wait until a new token has been transferred from actor Q through the interconnect (actor D). As a result, some executions of actor R will be delayed in the split-buffer situation compared to the situation in which the buffer is not split. This also delays the times at which actors P and Q can be executed. As a result, the throughput of the graph decreases when using a split buffer instead of a single, unified buffer. As mentioned before, most practical platforms will use split buffers. The example shows that splitting buffers has an impact on the throughput. Therefore, buffer splitting should be taken into account when modeling the interconnect delay.
Increase the buffer size on the side of actor R to 3 tokens.
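Assuming the channel names from the sketch above, this is a single attribute change:

    <!-- The buffer at the side of actor R now holds three tokens. -->
    <channel name="r2d" srcActor="R" srcPort="bufout"
             dstActor="D" dstPort="bufin" initialTokens="3"/>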
What is the throughput of this graph?
The graph has a throughput of 0.1 iterations/time-unit. This result is not very surprising, as we simply removed the bottleneck that prevented actor R from firing multiple times without having to wait until a token has been sent over the interconnect. This solution obviously comes at the expense of using additional buffer space (i.e., it increases the memory requirements of the FSM-SADF graph when implemented on a real platform).

The single-actor delay model that you have used so far to model the delay introduced by sending tokens over the interconnect is very simplistic. It assumes that only a single token can be in transit at the same time. Modern interconnects (e.g., networks-on-chip) allow pipelined communication in which multiple tokens can be in transit at the same time. These interconnects can be characterized by a bandwidth, which indicates the rate at which new tokens can be injected into the connection, and a latency, which indicates the time it takes to transfer a single token between the two end points of the interconnect. Essentially, networks-on-chip and many other modern interconnects exhibit latency-rate behavior. Such behavior can be modeled using the following two-actor model.

In this model, it takes 5 time-units to send a single token across the interconnect. This is similar to the single-actor delay model you used before. However, this new model allows you to send a new token every time-unit. Hence, up to five tokens can be in transit at the same time.
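The fragment below sketches one way to encode such a two-actor model: a rate actor Dr (execution time 1, serialized by a self-loop with one initial token, so that a new token can enter the interconnect every time-unit) followed by a latency actor Dl (execution time 4, without a self-loop, so its firings may overlap and several tokens can be in transit). The actor and port names are assumptions; the two execution times, which would be declared in the properties section as before, add up to the 5 time-units mentioned above.

    <!-- Actor, port, and channel names are illustrative assumptions. -->
    <!-- Rate actor: admits one new token per time-unit. -->
    <actor name="Dr" type="dr">
      <port name="in" type="in" rate="1"/>
      <port name="out" type="out" rate="1"/>
      <port name="slin" type="in" rate="1"/>
      <port name="slout" type="out" rate="1"/>
    </actor>
    <!-- Latency actor: adds the remaining 4 time-units of delay. -->
    <actor name="Dl" type="dl">
      <port name="in" type="in" rate="1"/>
      <port name="out" type="out" rate="1"/>
    </actor>
    <!-- Self-loop with one token: Dr fires at most once per time-unit. -->
    <channel name="drsl" srcActor="Dr" srcPort="slout"
             dstActor="Dr" dstPort="slin" initialTokens="1"/>
    <!-- Tokens flow from the rate actor into the latency actor. -->
    <channel name="dr2dl" srcActor="Dr" srcPort="out"
             dstActor="Dl" dstPort="in" initialTokens="0"/>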

Replace the single delay actor D with the above model. Note that you now have to think carefully about how to reconnect the two channels modeling the buffer space of the channel from actor Q to actor R. As you will notice when solving this problem, the two-actor model shown above does not capture the buffer capacity offered by the interconnect. You need to add one additional channel to the model shown above to capture the fact that a real interconnect can only hold a limited number of tokens at the same time. (hint)
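As a sketch of one possible solution, with assumed names and under the same reading of the model as above: the buffer channel of actor Q now ends at the rate actor Dr, the buffer channel of actor R now starts at the latency actor Dl, and one additional backward channel from Dl to Dr limits the number of tokens the interconnect itself can hold. The capacity of five tokens chosen here matches the maximal number of tokens in transit; the right value depends on the interconnect being modeled.

    <!-- Channel and port names are illustrative assumptions. -->
    <!-- Buffer of actor Q: now drained by the rate actor Dr. -->
    <channel name="dr2q" srcActor="Dr" srcPort="bufout"
             dstActor="Q" dstPort="bufin" initialTokens="2"/>
    <!-- Buffer of actor R: now filled by the latency actor Dl. -->
    <channel name="r2dl" srcActor="R" srcPort="bufout"
             dstActor="Dl" dstPort="bufin" initialTokens="3"/>
    <!-- Capacity channel: the interconnect itself can hold at most
         five tokens at the same time (initial tokens = capacity). -->
    <channel name="cap" srcActor="Dl" srcPort="capout"
             dstActor="Dr" dstPort="capin" initialTokens="5"/>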
What is the throughput of this graph?

The graph has a throughput of 0.166667 iterations/time-unit. This throughput is the maximal throughput that can be guaranteed when the application is running on a platform with an interconnect that has a latency of 4 time-units and a bandwidth of 1 token per time-unit.

Note that the single-delay model which you analyzed before provided a throughput guarantee of 0.1 iterations/time-unit when using the same actor-to-processor binding, but a much simpler interconnect model. If our real platform can be modeled with the latency-rate model, but we instead use the simpler single-delay-actor model, then we underestimate the throughput that we can achieve. Depending on the throughput constraint, this could lead to an over-allocation of resources. Obviously, this is highly undesirable. Therefore, it is always important to use accurate and tight models of the architecture components that you include in the FSM-SADF graph. These models should be conservative, but not too conservative. In many practical situations, this means that tools like SDF3 use models for architecture components that are far more complex (i.e., contain many more actors and channels) than the models used in this exercise. The main idea of capturing the timing impact of the architecture in the FSM-SADF graph of the application is, however, the same.
