NETWORK2x2 (torus network with E-cube routing)


The network that connects the mMIPS processors on the NOC is a torus network with E-cube routing. The network that comes with the package is two nodes wide and two nodes high. It is encapsulated by the SystemC module called NETWORK2x2 (see system architecture overview). The network dimensions can be changed; this has been tested successfully with a NETWORK3x2.

In the torus topology, a network node is connected to its immediate neighbors in both dimensions. At the edges of the network, the connections wrap around and connect the last router in the given dimension with the first. For a detailed discussion of the E-cube torus network and its properties, see W. Dally, "A VLSI Architecture for Concurrent Data Structures", Kluwer 1987. The sources of the torus network and its components are in ./noc/ecube.

In E-cube routing, each packet is first routed along the X dimension until it reaches a router whose X address equals the packet's destination X address. It then moves in the Y dimension until it reaches the destination router. Since the connections in the network are unidirectional, packets can only travel in the direction of increasing addresses, wrapping around at the edge of the network if necessary.
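The hop counts implied by this rule can be sketched in a few lines of plain C++. This is only an illustration of the routing rule, not code from ./noc/ecube; the names Coord, ring_hops and ecube_hops are hypothetical.

```cpp
#include <cassert>

// Hypothetical helper type; not taken from the NETWORK2x2 sources.
struct Coord {
    int x;
    int y;
};

// Hops needed on one unidirectional ring of `size` nodes when packets may
// only move towards increasing addresses and wrap from size-1 back to 0.
int ring_hops(int src, int dst, int size) {
    assert(size > 0 && src >= 0 && src < size && dst >= 0 && dst < size);
    return (dst - src + size) % size;
}

// E-cube order: all X hops are taken first, then all Y hops.
// The result holds the number of hops in each dimension.
Coord ecube_hops(Coord src, Coord dst, int width, int height) {
    return Coord{ring_hops(src.x, dst.x, width),
                 ring_hops(src.y, dst.y, height)};
}
```

For the 2x2 network, ecube_hops(Coord{1, 1}, Coord{0, 0}, 2, 2) yields one hop in X and one hop in Y, both across the wrap-around links.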

To implement deadlock-free communication, each physical link is shared by two virtual channels, numbered 0 and 1. Each packet sent in the network travels on channel 0 until it reaches its destination or wraps around. After wrapping around, it moves to channel 1 and stays on that channel. This switch breaks the circular dependencies within the network and therefore prevents deadlocks.
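The channel switch can be illustrated for a single ring (one dimension) with the sketch below. It assumes that the wrap-around link itself is still traversed on channel 0 and that the packet uses channel 1 from the next link onwards; the text does not state the exact switching point, so treat that detail as a guess. Hop and trace_ring are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// One link traversal: from node `from` to node `to` on virtual channel `vc`.
struct Hop {
    int from;
    int to;
    int vc;
};

// Trace a packet that travels `hops` links along a unidirectional ring of
// `size` nodes, starting at node `src` on virtual channel 0.
// Assumption: the switch to channel 1 takes effect after the wrap-around
// link (size-1 -> 0) has been traversed.
std::vector<Hop> trace_ring(int src, int hops, int size) {
    assert(size > 0 && src >= 0 && src < size && hops >= 0 && hops < size);
    std::vector<Hop> route;
    int node = src;
    int vc = 0;
    for (int i = 0; i < hops; ++i) {
        int next = (node + 1) % size;
        route.push_back(Hop{node, next, vc});
        if (next == 0) {
            vc = 1;  // wrapped around: stay on channel 1 from here on
        }
        node = next;
    }
    return route;
}
```

In a ring of three nodes, trace_ring(2, 2, 3) gives the link 2 -> 0 on channel 0 followed by 0 -> 1 on channel 1, while trace_ring(0, 2, 3) never wraps and stays on channel 0.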

The physical links are realized as 18-bit wide buses (16 data bits + 2 flit-type bits). Each link is accompanied by two sets of request/acknowledge signals, each of which implements a single virtual channel (see figure 1).

The mMIPS is connected to a router by a bidirectional link realized as two unidirectional 18-bit wide data lines with associated req/ack lines.
 

Figure 1: Four routers (xXyY), four miniMipses (dp_xXyY) and the data lines that connect them.

Communication in the network is based on a synchronous request/acknowledge protocol. To send data, the sender outputs the data and asserts the request signal. The receiver stores the data at the rising clock edge and acknowledges it by asserting the acknowledge signal. The sender then withdraws the request signal, and the receiver withdraws the acknowledge signal.
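The four steps of the handshake can be modelled with a small cycle-based sketch in plain C++. It is a behavioural illustration only: the Link, Sender and Receiver types are hypothetical, and the number of cycles per transfer in this model says nothing about the timing of the actual SystemC modules.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

// Signals of one virtual channel: data bus plus request/acknowledge.
struct Link {
    uint32_t data = 0;
    bool req = false;
    bool ack = false;
};

struct Sender {
    std::deque<uint32_t> pending;
    // `now` holds the signal values before the clock edge, `next` after it.
    void tick(const Link& now, Link& next) {
        if (!now.req && !now.ack && !pending.empty()) {
            next.data = pending.front();  // output data and assert request
            next.req = true;
        } else if (now.req && now.ack) {
            next.req = false;             // acknowledged: withdraw request
            pending.pop_front();
        }
    }
};

struct Receiver {
    std::vector<uint32_t> received;
    void tick(const Link& now, Link& next) {
        if (now.req && !now.ack) {
            received.push_back(now.data);  // store data, assert acknowledge
            next.ack = true;
        } else if (!now.req && now.ack) {
            next.ack = false;              // request gone: withdraw acknowledge
        }
    }
};

// Run the handshake for at most `max_cycles` clock edges and return
// everything the receiver has stored by then.
std::vector<uint32_t> transfer(const std::vector<uint32_t>& words, int max_cycles) {
    assert(max_cycles >= 0);
    Sender sender;
    sender.pending.assign(words.begin(), words.end());
    Receiver receiver;
    Link link;
    for (int cycle = 0; cycle < max_cycles; ++cycle) {
        Link next = link;  // both sides sample the same pre-edge values
        sender.tick(link, next);
        receiver.tick(link, next);
        link = next;
    }
    return receiver.received;
}
```

Because both sides only react to the values sampled before the clock edge, a new word is offered only after request and acknowledge have both returned to 0.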

The router

The symmetry of the two dimensions allows a single router to be built from two identical one-dimensional sub-routers (see figure 2). The X sub-router receives data from the network interface of the mMIPS on its d input. This data is forwarded to the x output, which is connected to the X sub-router of the neighboring router. The data travels through X sub-routers until it reaches the destination "column". There it is forwarded to the d output, which is connected to the d input of the Y sub-router. In the process, the X address is stripped off the packet and replaced with the Y address. The packet then travels along the Y dimension. When it reaches the destination address, it is again forwarded to the d output, which in the case of the Y sub-router is connected to the input of the data processor.

The current implementation uses relative addressing. The mMIPS supplies the packet with an address containing the X and Y distances to the destination (including any wrap-around). While the packet travels along one dimension, the first address component is decremented each time the packet leaves a router. When it reaches 0, it is replaced by the address for the second dimension and the packet is forwarded to the d output of the sub-router (which may be connected to a Y sub-router or to the mMIPS).
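The header handling in one sub-router might look roughly like the following sketch. Two details are assumptions rather than facts from the sources: the first (current-dimension) address component is taken to be the high byte of the 16-bit header, and the test for 0 is made before the decrement. Port, Routed and route_header are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Output of a one-dimensional sub-router: x continues along the current
// dimension, d leaves it (towards the Y sub-router or the mMIPS).
enum class Port { X, D };

struct Routed {
    Port port;
    uint16_t header;  // header as it leaves the sub-router
};

// Decide where a header goes and how it is rewritten.
// Assumed layout: high byte = distance left in the current dimension,
// low byte = distance in the second dimension.
Routed route_header(unsigned header) {
    assert(header <= 0xFFFFu);  // only the 16 data bits of the flit
    unsigned first = header >> 8;
    unsigned second = header & 0xFFu;
    if (first == 0) {
        // Done in this dimension: the second component becomes the first.
        return Routed{Port::D, static_cast<uint16_t>(second << 8)};
    }
    // Still travelling: decrement the distance on leaving the router.
    return Routed{Port::X, static_cast<uint16_t>(((first - 1) << 8) | second)};
}
```

With these assumptions, the header 0x0101 leaves the first X sub-router on x as 0x0001, leaves the next X sub-router on d as 0x0100, and is then handled in the same way by the Y sub-routers.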

Figure 2: Router with the two sub-routers

As mentioned earlier, the links in the network are 16 bits wide (actually 18 bits, but the top 2 bits are used for control purposes, as discussed below). To transfer larger amounts of data, a packet is split into several 16-bit wide flits. The network allows packets of arbitrary length. The first flit of a packet is marked as a header in the top 2 bits of the 18-bit wide data link. The 16 data bits of the header flit contain two 8-bit destination addresses (for the X and Y dimensions). On receiving the header flit, a router uses the destination address to set up a connection between the input link and the appropriate output link (x output on channel 0, x output on channel 1, or d output). Any number of following flits are forwarded along the same route. The packet is closed by a flit marked as a trailer. The data in the trailer flit is also forwarded along the route, after which the route is closed. This method of routing is called wormhole routing. Figure 3 illustrates this for a packet with 32 bits of data (0x12345678) sent from address (1,1) to (0,0) (relative address 1,1). This packet is split into three flits: a header flit with the relative destination address, a data flit with 16 bits of data, and a data flit marked as trailer with the remaining 16 bits of data.

 

Figure 3: A 32-bit packet sent from address (1,1) to (0,0) is split into three flits: relative address (0x0101) and data (0x1234 and 0x5678)
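The packet of figure 3 can be built with the sketch below. The numeric codes for the two flit-type bits (DATA = 0, HEADER = 1, TRAILER = 2) and the placement of the X distance in the high byte of the header are assumptions made for illustration; the text does not give the actual encoding. FlitType, make_flit and packetize are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical codes for the 2 flit-type bits (bits 17..16 of the link).
enum FlitType : uint32_t { DATA = 0, HEADER = 1, TRAILER = 2 };

// Combine the type bits with 16 data bits into one 18-bit flit.
uint32_t make_flit(FlitType type, uint16_t data) {
    return (static_cast<uint32_t>(type) << 16) | data;
}

// Build a packet: one header flit with the relative address, then the
// payload words, the last of which is marked as trailer.
// Assumption: X distance in the high byte, Y distance in the low byte.
std::vector<uint32_t> packetize(uint8_t rel_x, uint8_t rel_y,
                                const std::vector<uint16_t>& payload) {
    assert(!payload.empty());  // the trailer flit carries the last data word
    std::vector<uint32_t> flits;
    flits.push_back(make_flit(HEADER, static_cast<uint16_t>((rel_x << 8) | rel_y)));
    for (std::size_t i = 0; i < payload.size(); ++i) {
        bool last = (i + 1 == payload.size());
        flits.push_back(make_flit(last ? TRAILER : DATA, payload[i]));
    }
    return flits;
}
```

Calling packetize(1, 1, {0x1234, 0x5678}) produces three flits whose 16 data bits are 0x0101, 0x1234 and 0x5678, matching the figure; only the top two bits depend on the assumed type codes.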

Sub-router

A block diagram of a single sub-router is shown in figure 4.

Figure 4: Sub-router

The sub-router includes three identical input controllers, one for each virtual input channel to the router (note how x0inctrl and x1inctrl share a single physical data input link but have separate control inputs for the separate virtual channels). On detecting an active request line, the input controller registers and examines the input data. If it is a packet header, the input controller requests a route from the switch to the appropriate output queue (signals select and rqs). The requested output queue depends on the destination address of the packet.
The switch arbitrates the route requests and connects the winner's data, req and ack signals to the corresponding signals of the requested output queue.
If the output queue (a single element in the current implementation) is not full, it stores the data and issues an ack signal to the input controller, which forwards it to the data source. The following flits delivered to the input controller are forwarded along the route established through the switch until the trailer flit is detected. After the trailer flit has been acknowledged by the output queue, the input controller withdraws the rqs signal, thereby freeing the route through the switch.

Once the output queue has received and acknowledged a flit from the input controller, it forwards the flit to its output. When the flit has been acknowledged there, the queue clears its buffer and is ready to accept new flits.
Note that while the d output queue is connected directly to the physical output channel, the two x output queues associated with virtual channels 0 and 1 have to compete for access to the physical output link.

Output arbiter

The arbitration is performed by the OUTPUT_ARBITER module (figure 5). Whenever a virtual channel controller needs access to the link, it issues a request signal to the arbiter and waits for the grant signal. Request signal 0 has a higher priority than request signal 1, but after each transmission (1 clock cycle) the requesting channel controller has to remove its request signal, which allows the other waiting channel to access the link. This way, the arbitration is biased towards channel 0, but channel 1 is not starved.
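The effect of this scheme can be seen in a small model. The grant logic below (channel 0 wins whenever it requests) and the controller model (a channel that transmitted in one cycle does not request in the next) are a reading of the description above, not the OUTPUT_ARBITER source; Grant, arbitrate and simulate are hypothetical names.

```cpp
#include <cassert>
#include <vector>

struct Grant {
    bool g0;
    bool g1;
};

// Fixed-priority grant: channel 0 wins whenever it requests.
Grant arbitrate(bool req0, bool req1) {
    return Grant{req0, req1 && !req0};
}

// Simulate two channel controllers with `flits0` and `flits1` flits to send.
// A controller that transmitted in the previous cycle keeps its request low
// for one cycle. Returns the channel number of each transmitted flit.
std::vector<int> simulate(int flits0, int flits1, int max_cycles) {
    assert(flits0 >= 0 && flits1 >= 0 && max_cycles >= 0);
    std::vector<int> order;
    bool sent0 = false;  // channel 0 transmitted in the previous cycle
    bool sent1 = false;  // channel 1 transmitted in the previous cycle
    for (int c = 0; c < max_cycles && (flits0 > 0 || flits1 > 0); ++c) {
        bool req0 = flits0 > 0 && !sent0;
        bool req1 = flits1 > 0 && !sent1;
        Grant g = arbitrate(req0, req1);
        sent0 = g.g0;
        sent1 = g.g1;
        if (g.g0) { --flits0; order.push_back(0); }
        if (g.g1) { --flits1; order.push_back(1); }
    }
    return order;
}
```

With three flits waiting on each channel, simulate(3, 3, 100) returns the order 0, 1, 0, 1, 0, 1: channel 0 goes first, but channel 1 gets the link in every cycle in which channel 0 has to drop its request.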

 

Figure 5: OUTPUT_ARBITER