Supported instruction set
Supported data types and soft ops
Changing the memory layout
The mMIPS (pronounced "mini MIPS") is a simplified version of the MIPS processor. Compared to the MIPS, it has a reduced instruction set, which means that some operations need to be done in software. The mMIPS that comes with the package has been extended with a network interface. On this website the term mMIPS refers to the mMIPS with this network interface. This page also contains information about an instruction and data cache for the mMIPS. We do not use that functionality. The SystemC sources of the mMIPS are in ./noc/mmips. It is possible to change the sizes of the code and data memories, and to change the number of mMIPS processors on the NOC by changing the dimensions of the network.
These instructions are supported by the mMIPS and the compiler lcc that we use in this project:
More information is available in the text file README.mMIPS in the ./lcc directory.
Below is a summary of the sizes in bytes of the standard data types in lcc and the operations that are performed in software (called soft ops) due to the lack of a complete instruction set.
The mMIPS implementation that comes with this package assumes two separate memories of 16 kilobytes each, one for instructions/code and one for data structures. It may be desirable to change the size of these memories, e.g. to fit more processors on the FPGA. However, the memory sizes have been hard-coded in the SystemC sources of the mMIPS, in the lcc compiler and in the applications that run on them. Changing the memory sizes in one place requires changes in the other locations as well. This section explains what changes need to be made and how to make them.
Note: The address of the first byte in the data memory is 0x0 in C, so the last byte is at address 0x3FFF. It is therefore not possible to (accidentally) address bytes in the instruction memory from C.
1. C Debugging library
Change the output destination memory range of mprintf() in the C debugging library and recompile it. The memory range reserved for this purpose contains the bytes at address MPRINTF_START_ADDR up to and including MPRINTF_MAX_ADDR (these preprocessor symbols are defined in mtools.h). In the version of mtools.h that comes with the package, the data range reserved for mprintf() is 0xE00 through 0xFFF (= 512 bytes). The bytes at addresses 0x0 through 0xDFF have been reserved for the input/output image of the JPEG decoder. This means that the bytes up to and including 0xFFF will not be used by lcc. If MPRINTF_MAX_ADDR increases, then the C compiler lcc also needs to be reconfigured to prevent it from storing data structures in the area used for the JPEG image or by mprintf().
2. C compiler lcc
The following files determine the memory layout as the compiler sees it:
A recompilation of the lcc compiler is necessary only if you change crt0.s.
3. mMIPS SystemC implementation
The following files in the SystemC implementation of the mMIPS determine the memory layout:
4. Gossip and JPEG decoder
The script strings.sh writes to stdout the portion of the data segment that contains the output of mprintf(). If the output location of mprintf() was changed, then this script needs to be changed too. strings.sh is used by dolcc in ./c_libs/gossip and ./c_libs/djpeg_mmips. For the JPEG decoder it is also necessary to change the memory location where parse.c and step3.c load and store the picture, respectively. The first file expects the picture at the start of the data memory (address 0x0) and the last character at MAX_ADDR_IMAGE (address 0xDFF initially). In step3.c, sunraster_header *FrameHeader is initialized to the starting address of the picture (0x0) and unsigned char *FrameBuffer to the first byte after the header of the image (0x20). See also: choosing another input image.
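Because the same address ranges are hard-coded in several places, it can help to collect them once and let the compiler check that they are consistent. The following is a minimal sketch, not the contents of mtools.h: MPRINTF_START_ADDR, MPRINTF_MAX_ADDR and MAX_ADDR_IMAGE carry the values quoted in the text, while DATA_MEM_SIZE and the two helper functions are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Values quoted in the text for the packaged configuration. */
#define MPRINTF_START_ADDR 0xE00u  /* first byte of the mprintf() area */
#define MPRINTF_MAX_ADDR   0xFFFu  /* last byte of the mprintf() area  */
#define MAX_ADDR_IMAGE     0xDFFu  /* last byte of the JPEG image area */

/* Hypothetical name: size of the 16 kilobyte data memory. */
#define DATA_MEM_SIZE      0x4000u

/* The image area (starting at 0x0) must not overlap the mprintf() area. */
static_assert(MAX_ADDR_IMAGE < MPRINTF_START_ADDR,
              "JPEG image area overlaps the mprintf() area");
/* The mprintf() area must be a valid, non-empty range inside the data memory. */
static_assert(MPRINTF_START_ADDR <= MPRINTF_MAX_ADDR,
              "mprintf() area is empty");
static_assert(MPRINTF_MAX_ADDR < DATA_MEM_SIZE,
              "mprintf() area does not fit in the data memory");

/* Number of bytes reserved for mprintf() output. */
size_t mprintf_area_size(void)
{
    return (size_t)(MPRINTF_MAX_ADDR - MPRINTF_START_ADDR + 1u);
}

/* First data address above both reserved areas. */
size_t first_unreserved_addr(void)
{
    return (size_t)(MPRINTF_MAX_ADDR + 1u);
}
```

With the packaged values, the mprintf() area is 512 bytes and the first unreserved address is 0x1000. If one of the constants is changed inconsistently, compilation fails instead of the overlap going unnoticed.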
Since the network interface (N.I.) is memory mapped, it can be accessed through specific memory locations. The network interface is controlled in mMIPS assembler by appropriate stores/loads to/from these memory locations. A new module, MEMDEV, replaces the data memory of the original mMIPS and, based on the requested memory address, either reads/writes the RAM (for regular addresses) or performs the appropriate communication with the network interface (for device addresses). Note that the C communications library stdcomm handles all the internals of the N.I. This setup is depicted in the following figure.
Figure 1: The MEMDEV module accesses RAM or NETWORK_INTERFACE depending on the address.
The MEMDEV module recognizes two addresses assigned to the N.I.:
0x80000000 and 0x80000004 (note how the device access is indicated by the most significant bit of the
address). The first address (data word address) is associated with N.I. data, while the second address (control word address) is used for N.I. control.
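The address decoding described above can be sketched in C as follows. This is an illustrative model, not the SystemC source of MEMDEV: the two device addresses and the role of the most significant bit come from the text, while the type and function names are hypothetical, and the handling of other addresses with the most significant bit set (TARGET_UNMAPPED) is an assumption, since the text does not say what MEMDEV does with them.

```c
#include <stdint.h>

#define NI_DATA_ADDR 0x80000000u  /* N.I. data word (from the text)    */
#define NI_CTRL_ADDR 0x80000004u  /* N.I. control word (from the text) */
#define DEVICE_BIT   0x80000000u  /* most significant address bit      */

/* Hypothetical names for the possible targets of an access. */
typedef enum {
    TARGET_RAM,       /* regular address: data memory          */
    TARGET_NI_DATA,   /* device address: N.I. data word        */
    TARGET_NI_CTRL,   /* device address: N.I. control word     */
    TARGET_UNMAPPED   /* assumption: any other device address  */
} memdev_target;

/* Decide which target an address refers to. */
memdev_target memdev_decode(uint32_t addr)
{
    if ((addr & DEVICE_BIT) == 0u)
        return TARGET_RAM;
    if (addr == NI_DATA_ADDR)
        return TARGET_NI_DATA;
    if (addr == NI_CTRL_ADDR)
        return TARGET_NI_CTRL;
    return TARGET_UNMAPPED;
}
```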
Reading from and writing to the data word accesses the internal
buffers of the N.I. (these are physically separate buffers,
so writing to the data word and reading it back will not return
the same value). Read and write operations on the data word are
always non-blocking, i.e. they access the N.I. buffers regardless
of the state of the N.I. However, depending on the state of the
N.I., the data read may be invalid, or the data written may overwrite
a packet in the send buffer that has not been sent yet.
To monitor the status of the N.I., the control word of the N.I. can
be accessed at memory address 0x80000004. The meaning of the bits in the
word is explained in figure 2.
Figure 2: Meaning of the bits in the N.I. control word (device address: 0x80000004).
Note that some of the bits in the control word can only be written (0-15, 17 and 20 - they control the behavior of N.I.) while some can only be read (16, 18 and 19 - they report the status of N.I.).
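The split between write-only and read-only bits can be captured as two bit masks. The sketch below uses only the bit positions listed in the note above; the macro and function names are hypothetical, and no meaning is assigned to individual bits here.

```c
#include <stdint.h>

/* Bits 0-15, 17 and 20: write-only control bits. */
#define NI_CTRL_WRITE_MASK (0x0000FFFFu | (1u << 17) | (1u << 20))
/* Bits 16, 18 and 19: read-only status bits. */
#define NI_CTRL_READ_MASK  ((1u << 16) | (1u << 18) | (1u << 19))

/* Keep only the bits of a value that may be written to the control word. */
uint32_t ni_ctrl_writable(uint32_t value)
{
    return value & NI_CTRL_WRITE_MASK;
}

/* Keep only the status bits of a value read from the control word. */
uint32_t ni_ctrl_status(uint32_t value)
{
    return value & NI_CTRL_READ_MASK;
}
```

The two masks do not overlap, so a value read from the control word can be masked with NI_CTRL_READ_MASK before its status bits are inspected.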
The status bits include:
The control bits:
The network interface is a module used to send and receive packets on the network. It can send and receive packets whose length is an arbitrary multiple of 32 bits. Before such a packet is sent over the network, it is split into smaller parts called flits. Conversely, an arriving packet is reconstructed by collecting three flits. For each additional 32-bit word within the packet, two additional data flits need to be added to the packet. Sending and receiving are performed by two independent processes within the network interface (i.e. the N.I. is able to receive and send simultaneously). The interface of the module is shown in figure 3.
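The relation between packet length and flit count can be written as a small function. This sketch assumes that the three flits mentioned above correspond to a packet with a single 32-bit word, and that every further word adds two data flits; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of flits for a packet of `words` 32-bit words.
 * Assumption: a one-word packet takes three flits, and each
 * additional word adds two data flits. */
uint32_t flits_for_packet(uint32_t words)
{
    assert(words >= 1u);  /* a packet carries at least one word */
    return 3u + 2u * (words - 1u);
}
```

Under this assumption a four-word packet needs 3 + 2 * 3 = 9 flits.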
Figure 3: The network interface module.
On the network side, the interface is compatible with the network's router: it has two sets of data/req/ack signals, one in each direction.
On the processor side, the N.I. provides the signals needed to write the destination address and the packet word data (reg_data_in with write_addr and write_data), to read a received packet word (reg_data_out), to trigger packet sending (send) and to confirm that a packet word has been read (read), as well as the signals that report the communication status (send_rdy and data_rdy). In addition, two signals mark the last word of a packet: packet_end asserted together with send means that the last word of the packet is being sent, while rcv_packet_end active together with data_rdy means that the last word of the packet has arrived.
The interface is fully synchronous, i.e. all data and associated strobes are sampled on the rising clock edge.
The sending of a packet is performed in the following manner:
To read a received packet:
The functions sc_send() and sc_receive() in the C communications library stdcomm encapsulate the internals of the communication.
The data and instruction memories within the mMIPS were replaced with two identical cache memories. The block diagram of the cache memory is presented in figure 4.
Figure 4: Block diagram of the cache memory.
The implemented cache memory is a direct-mapped, write-through memory
with a no-allocate-on-miss write policy. It has 256 blocks (8-bit
block index), with four words per block (2-bit word offset). The
breakdown of the address is presented in figure 5.
Figure 5: Breakdown of the address.
For implementation reasons, only 15 out of the 20 bits of the block tag can be stored. Together with the 12 lower address bits (block index, word offset and byte offset), this limits the addressable memory space to 27 bits.
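The address breakdown can be illustrated with a few shifts and masks. This is a sketch under stated assumptions: addresses are 32 bits wide, the two lowest bits select the byte within a word, and the 15 stored tag bits are the lowest 15 bits of the 20-bit tag (which is what a 27-bit address space implies). The type and function names are hypothetical.

```c
#include <stdint.h>

#define BYTE_BITS        2u  /* byte within a word (assumed)       */
#define WORD_BITS        2u  /* four words per block               */
#define INDEX_BITS       8u  /* 256 blocks                         */
#define TAG_BITS_STORED 15u  /* 15 of the 20 tag bits are stored   */

typedef struct {
    uint32_t tag;    /* stored part of the block tag */
    uint32_t index;  /* block index, 0..255          */
    uint32_t word;   /* word offset, 0..3            */
    uint32_t byte;   /* byte offset, 0..3            */
} cache_addr;

/* Split an address into the fields used by the cache. */
cache_addr cache_split(uint32_t addr)
{
    cache_addr a;
    a.byte  = addr & 0x3u;
    a.word  = (addr >> BYTE_BITS) & 0x3u;
    a.index = (addr >> (BYTE_BITS + WORD_BITS)) & 0xFFu;
    a.tag   = (addr >> (BYTE_BITS + WORD_BITS + INDEX_BITS)) & 0x7FFFu;
    return a;
}

/* Number of address bits the cache can distinguish. */
unsigned cache_addressable_bits(void)
{
    return BYTE_BITS + WORD_BITS + INDEX_BITS + TAG_BITS_STORED;
}
```

For the address 0x1234 this gives byte offset 0, word offset 1, block index 0x23 and tag 1. The four field widths add up to 2 + 2 + 8 + 15 = 27 bits.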
At the heart of the cache memory is the memory module that stores the cache entries. Based on the selected block index and word offset, the memory outputs the appropriate word, together with the valid bit and the tag of the selected block. When writing, the word present at the data input is written to the memory together with the tag and valid bit. The output of the memory is connected to the byte_select module. In byte read mode, it replaces the most significant byte of the data word with the requested byte.
The inputs of the memory are connected to din_select and addr_split modules that generate appropriate memory control signals based on the input data and cache control signals.
The inputs to the cache are registered in the input register modules. These modules register the values present at their inputs (unless the disable signal is asserted) and have two outputs: dout reflects the current contents of the register, while dout_select outputs either the registered data or the current input data, depending on the select input.
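The behavior of one input register can be modeled in a few lines of C. This is a behavioral sketch, not the hardware description: the names are hypothetical, and the polarity of select (asserted means "use the registered value") is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one input register module. */
typedef struct {
    uint32_t stored;  /* current contents of the register */
} input_reg;

/* One rising clock edge: capture din unless disable is asserted. */
void input_reg_clock(input_reg *r, uint32_t din, bool disable)
{
    if (!disable)
        r->stored = din;
}

/* dout: always the registered value. */
uint32_t input_reg_dout(const input_reg *r)
{
    return r->stored;
}

/* dout_select: registered value when select is asserted (assumed
 * polarity), otherwise the value currently present at the input. */
uint32_t input_reg_dout_select(const input_reg *r, uint32_t din, bool select)
{
    return select ? r->stored : din;
}
```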
The control signals of the cache are generated by the miss_ctrl module.
The interface of the cache has two sets of signals. The cache access signals are consistent with the interface of the regular memory and therefore allow seamless replacement of a memory with the cache. The second set of signals allows communication with the external memory or another module responsible for fetching data for the cache.
The cache can perform three different types of operations:
After the memory is ready, miss_ctrl checks whether the entry is valid and whether the tag of the selected block matches the requested tag. If either condition is not met, a miss occurs and the new block contents (four words) need to be fetched. miss_ctrl signals the miss by asserting miss_wait (which should freeze the processor) and initiates the fetch from main memory. The fetched words arrive one by one and are written to the memory by asserting the we signal, instructing din_select to use the fetched data (din_fetch) instead of the regular input data, and instructing addr_split to output the appropriate word offset (fetch_word). After the block has been fetched, miss_ctrl rereads the word that caused the miss and de-asserts the miss_wait signal. If a byte read was requested, byte_select reorders the bytes of the word and generates the requested data output.
Since the cache input signals are not registered (they are sourced from before the pipeline register), the current inputs need to be stored when a miss occurs. The read/write that follows (after the miss has been handled) then uses the registered outputs of the input registers rather than the direct ones (miss_ctrl asserts the select signal).
NOTE: to enable the read-before-write option in the synthesized hardware, the RAM modules instantiated in the Verilog source need to be annotated with appropriate attributes. This is illustrated in the following Verilog code, which should be inserted in the CACHE_MEMORY.v file that contains the description of the memory module:
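The read path with miss handling can be summarized by a small software model. This is a behavioral sketch only, not the hardware: it ignores timing, miss_wait and byte reads, assumes the address split sketched from the block index and word offset widths given earlier (with two byte-offset bits), and uses a word-addressed array as a stand-in for main memory. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_BLOCKS      256u
#define WORDS_PER_BLOCK 4u

typedef struct {
    bool     valid;
    uint32_t tag;
    uint32_t words[WORDS_PER_BLOCK];
} cache_block;

typedef struct {
    cache_block     blocks[NUM_BLOCKS];
    const uint32_t *main_mem;  /* stand-in for main memory, word addressed */
    unsigned        misses;    /* number of block fetches performed        */
} cache_model;

/* Read one word. On a miss, fetch the whole block first. */
uint32_t cache_read_word(cache_model *c, uint32_t addr)
{
    uint32_t word  = (addr >> 2) & 0x3u;
    uint32_t index = (addr >> 4) & 0xFFu;
    uint32_t tag   = (addr >> 12) & 0x7FFFu;
    cache_block *b = &c->blocks[index];

    if (!b->valid || b->tag != tag) {
        /* Miss: the four words of the block arrive one by one. */
        uint32_t base = (addr >> 4) << 2;  /* word index of the block start */
        for (uint32_t w = 0; w < WORDS_PER_BLOCK; w++)
            b->words[w] = c->main_mem[base + w];
        b->tag = tag;
        b->valid = true;
        c->misses++;
    }
    /* Hit, or reread after the fetch has completed. */
    return b->words[word];
}
```

A second read from the same block does not increase the miss counter, while a read from a different block index triggers another four-word fetch.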
RAMB16_S36_S36 bram1(.DOA(DOA1), ... ); /* synopsys attribute WRITE_MODE_A "READ_FIRST" WRITE_MODE_B "READ_FIRST" */
RAMB16_S36_S36 bram0(.DOA(DOA0), ... ); /* synopsys attribute WRITE_MODE_A "READ_FIRST" WRITE_MODE_B "READ_FIRST" */
For more details on using Xilinx BlockRAM see this Application Note.
As mentioned before, the write-through transfer of the written word to the main memory is initiated together with the writing of the word. However, if the transfer cannot be performed (e.g. a network interface communicating with the main memory through the network has a full buffer), the cache stalls until the transfer can be reinitiated.
As was the case for a word write, a main memory transfer is initiated for the byte, which may also result in a stall if the main memory interface is not ready.
The memory module that implements the cache memory is built out of two physical BlockRAM modules available in Xilinx FPGAs. The setup of the blocks is presented in figure 6.
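The write policy (write-through, no allocation on a miss, stall while main memory is not ready) can be modeled for word writes as follows. This is a simplified behavioral sketch: all names are hypothetical, main memory is a plain word-addressed array, the mem_ready flag stands in for the state of the main memory interface, and the model assumes that nothing is written while the cache stalls, which the text does not specify.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool     valid;
    uint32_t tag;
    uint32_t words[4];
} wt_block;

/* Write one word. Returns true when the write completed and false
 * when the cache has to stall because main memory is not ready. */
bool cache_write_word(wt_block blocks[256], uint32_t *main_mem,
                      bool mem_ready, uint32_t addr, uint32_t data)
{
    if (!mem_ready)
        return false;  /* stall: the caller retries later */

    wt_block *b  = &blocks[(addr >> 4) & 0xFFu];
    uint32_t tag = (addr >> 12) & 0x7FFFu;

    /* Hit: update the cached copy. Miss: leave the cache untouched
     * (no allocation on a write miss). */
    if (b->valid && b->tag == tag)
        b->words[(addr >> 2) & 0x3u] = data;

    /* Write-through: main memory is always updated. */
    main_mem[addr >> 2] = data;
    return true;
}
```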
Figure 6: The memory module consisting of two physical BlockRAM modules.
Each of the blocks is configured in 512x(32+4) mode, i.e. it contains 512 32-bit words plus 4 additional parity bits per word. The four words of a cache block with index X are stored at addresses X and X+1 in the first block and X and X+1 in the second block. Since the memories are dual-port, it is possible to access two words within the same memory block simultaneously. Thus, simultaneous access to all four words within a cache block is possible.
To store the tag and the valid bit associated with each block, the additional parity bits are used (the parity is not hardware-assisted, so the bits can be freely used by the application for other purposes than parity). For each cache block there are 4x4 parity bits available (4 bits for addresses X and X+1 in each of the blocks), which gives 15 tag bits and 1 valid bit.
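Packing the tag and the valid bit into the four 4-bit parity fields of a cache block could look like the sketch below. The bit assignment is an assumption made for illustration (valid bit in the highest position, tag in the lower 15 bits); the text only states that 16 parity bits hold 15 tag bits and 1 valid bit. All names are hypothetical.

```c
#include <stdint.h>

/* The four 4-bit parity fields that belong to one cache block. */
typedef struct {
    uint8_t parity[4];  /* each entry uses only its low 4 bits */
} block_parity;

/* Pack 15 tag bits and 1 valid bit into 4 x 4 parity bits.
 * Assumed layout: bit 15 = valid, bits 14..0 = tag. */
block_parity pack_tag_valid(uint32_t tag, int valid)
{
    uint32_t bits = ((valid ? 1u : 0u) << 15) | (tag & 0x7FFFu);
    block_parity p;
    for (int i = 0; i < 4; i++)
        p.parity[i] = (uint8_t)((bits >> (4 * i)) & 0xFu);
    return p;
}

/* Recover the 15-bit tag from the parity fields. */
uint32_t unpack_tag(block_parity p)
{
    uint32_t bits = 0u;
    for (int i = 0; i < 4; i++)
        bits |= (uint32_t)(p.parity[i] & 0xFu) << (4 * i);
    return bits & 0x7FFFu;
}

/* Recover the valid bit from the parity fields. */
int unpack_valid(block_parity p)
{
    return (p.parity[3] >> 3) & 1;
}
```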
For more information about the BlockRAM memory see this Application Note.