2. Compile and run
3. Change the input image
4. Simulation results
The JPEG decoder takes a JPEG image and converts it to an uncompressed bitmap using the baseline decoding process. A short introduction to the JPEG decoding process can be found in chapter 2 and the appendix of the report "Design and implementation of a JPEG decoder" by Sander Stuijk. The multi-processor decoder is based on a single-processor decoder located in ./c_prog/djpeg_orig. Figure 1 shows the steps (blue) of the JPEG decoding process:
The yellow blocks describe the data that is sent between nodes (mMIPS processors). See the next paragraph for more information.
The JPEG decoding process has been divided over three nodes, as depicted in figure 1. Each group of steps runs on a separate node of the mMIPS network: node1 is at (X,Y) = (0,0), node2 is at (1,0), node3 is at (0,1), and the node at (1,1) remains unused. This partition was chosen because a quick investigation of the sources of the original JPEG decoder revealed that partitioning just before and just after the IDCT function was the easiest to realize. This choice also has the advantage that the Huffman decoding and dequantization tables, required by the VLD and DQ units respectively, do not need to be sent over the network. This partition leads to the distribution of the workload shown in table 1.
Table 1: Workload for nodes 1, 2 and 3 in the JPEG decoder.
|Node||Steps + workload %||Total workload %|
|1||VLD (35%) + ZZ (5%) + DQ (10%)||50%|
|3||Color conversion (15%) + Reorder (15%)||30%|
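The ZZ and DQ steps assigned to node 1 can be illustrated with a minimal sketch. It assumes the usual baseline JPEG convention that both the decoded coefficients and the quantization table arrive in zigzag order. The names zigzag_to_natural and dequantize_block are hypothetical and are not taken from the decoder sources.

```c
/* Standard JPEG zigzag order: entry k gives the row-major position
 * (row * 8 + column) of the k-th coefficient in the zigzag sequence. */
static const unsigned char zigzag_to_natural[64] = {
     0,  1,  8, 16,  9,  2,  3, 10,
    17, 24, 32, 25, 18, 11,  4,  5,
    12, 19, 26, 33, 40, 48, 41, 34,
    27, 20, 13,  6,  7, 14, 21, 28,
    35, 42, 49, 56, 57, 50, 43, 36,
    29, 22, 15, 23, 30, 37, 44, 51,
    58, 59, 52, 45, 38, 31, 39, 46,
    53, 60, 61, 54, 47, 55, 62, 63
};

/* Hypothetical helper: undo the zigzag scan and dequantize one 8x8 block.
 * zz_coef   : 64 coefficients in zigzag order (output of the VLD step)
 * qtable_zz : 64 quantization factors, also in zigzag order (assumption)
 * out       : 64 dequantized coefficients in row-major order (IDCT input) */
void dequantize_block(const short zz_coef[64],
                      const unsigned char qtable_zz[64],
                      int out[64])
{
    int k;

    for (k = 0; k < 64; k++) {
        out[zigzag_to_natural[k]] = (int)zz_coef[k] * (int)qtable_zz[k];
    }
}
```

In this sketch the table lookup reorders the coefficients and the multiplication restores their magnitude, so only finished 8x8 blocks would need to cross the network to the IDCT node.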
Table 2 describes the files and folders in ./c_prog/djpeg_mmips.
Table 2: Overview of the files and folders in ./c_prog/djpeg_mmips.
|File||Contents / purpose|
|/dumps||Contains the files mips_ram.xXyY.dump, which give the data memory contents after a simulation has ended for nodes (X,Y). These files are moved to the folder automatically if the Linux shell script dolcc is used.|
|/example||This folder contains the simulation results of the JPEG decoder for an example image.|
|/node.xXyY||If the Linux shell script dogcc is used, then a node generates files named output_to_0xADDR.bin. The file contains the data the node tried to send to the node with relative address ADDR. See the next chapter for more information on the dogcc script or check out the page on the C Communications library for more information on the output-to-file capability.|
|/test_images||Contains JPEG images that can be used to test the decoder.|
|color.*||Contains color_conversion() that converts YUV colors to their RGB equivalents.|
|Project files that describe the JPEG decoder in the Linux programming environment kdevelop.|
|dogcc||Use the script dogcc to compile and run the JPEG decoder from the Linux shell using gcc. It uses Makefile to compile and renames the output files in such a way that the communications pattern depicted in figure 1 is achieved (see paragraph 1.4 of the C communications library for more information on how communication simulation via files works). The script is for the 2-by-2 mMIPS NOC.|
|dolcc||Use the script dolcc X to compile the JPEG decoder using lcc and run a simulation for X minutes. The script is for the 2-by-2 mMIPS NOC.|
|dump||dump is used by the script dolcc to create and initialize the node data memories.|
|fast_int_idct.*||Inverse Discrete Cosine Transform (IDCT) using Integers.|
Performs the Variable length decoding (VLD), Zigzag scan (ZZ) and Dequantization (DQ), see also figure 1.
|jpeg.h||Constants, preprocessor symbols and data structures common to all nodes.|
|Makefile||Used by the script ./dogcc to compile the nodes using gcc.|
|mips||Ready-to-use hardware simulator for a 2x2 mMIPS NOC.|
|mips_ram.empty.bin||Empty data memory which is used by the script dolcc to initialize the data memories.|
|parse.*||Contains mgetc() which allows functions within step1.c access to the input image. Also contains some other functions that involve retrieving specific data from the image.|
|stepX.*||Contains the function main() for node X. See figure 1 for a description of the JPEG decoding steps performed by each node.|
|strings.sh||Dumps the output of mprintf() for each node. Since the mMIPS NOC does not have a terminal, the printf() clone mprintf() is used to output to memory.|
|sunraster.*||Extracts a viewable bitmap image from the memory dump of the last node / step in the JPEG decoding process.|
|tree_vld.*||Creates the Huffman table.|
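As an illustration of the kind of work color_conversion() in color.* performs, the following is a minimal integer-only sketch of a YUV to RGB conversion. It assumes the JFIF YCbCr convention (8-bit samples, chroma centered at 128) and 16-bit fixed-point constants. The function ycbcr_to_rgb and its helpers are hypothetical and do not reproduce the actual code in color.*.

```c
/* Clamp an intermediate result to the 8-bit range 0..255. */
static int clamp_u8(int v)
{
    if (v < 0)   return 0;
    if (v > 255) return 255;
    return v;
}

/* Divide a 16.16 fixed-point value by 65536, rounding to nearest.
 * Division is used instead of >> so negative values behave portably. */
static int descale(long v)
{
    return (int)((v >= 0 ? v + 32768L : v - 32768L) / 65536L);
}

/* Hypothetical helper: convert one JFIF YCbCr pixel to RGB.
 * Constants are the JFIF factors scaled by 65536:
 *   1.402 -> 91881, 0.344136 -> 22554, 0.714136 -> 46802, 1.772 -> 116130 */
void ycbcr_to_rgb(int y, int cb, int cr, unsigned char rgb[3])
{
    long cbc = (long)cb - 128;   /* centered blue-difference chroma */
    long crc = (long)cr - 128;   /* centered red-difference chroma  */

    rgb[0] = (unsigned char)clamp_u8(y + descale(91881L * crc));
    rgb[1] = (unsigned char)clamp_u8(y - descale(22554L * cbc + 46802L * crc));
    rgb[2] = (unsigned char)clamp_u8(y + descale(116130L * cbc));
}
```

A neutral gray pixel (Y = Cb = Cr = 128) maps to R = G = B = 128 in this sketch. The clamping matters because rounding in the encoder can push converted values slightly outside 0..255.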
The sources of the multi-processor JPEG decoder are located in ./c_prog/djpeg_mmips. The steps needed to compile the JPEG decoder are comparable to those for gossip. The compilation process and run types for gossip are discussed in the application design flow. You need to make a few modifications to the C sources and shell scripts if you want to change the input image of the JPEG decoder or change the memory layout of the mMIPSes.
A change of input image involves the following changes to the source code and scripts:
The subfolders dogcc and dolcc in the folder ./c_prog/djpeg_mmips/example contain the output of a run of dogcc and dolcc respectively for a 32x24 pixel color JPEG image of a surfer (see figure 2).
Any output written to stdout or stderr during the execution of these scripts was saved in the file output_surfer.txt. For dolcc, the dumps subfolder contains the contents of the data memories of the mMIPSes after completion. By default, the output bitmap is stored in the data memory file mips_ram.x0y1.dump, beginning at address 0x0. The hex editor khexedit (which requires an X server such as Hummingbird Exceed) can be used to verify that this bitmap is exactly the same as the file output.ras that dogcc generates; the syntax is khexedit <filename>. For dogcc, the node.xXyY subfolders contain any data that was sent, and output.ras is the resulting bitmap (in Sun raster format). If you have an X server such as Hummingbird Exceed, you can view the output file with eog (Eye of GNOME) using the command: eog output.ras &
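When inspecting the bitmap by hand, it can help to know what the first bytes should look like. The sketch below parses the 32-byte Sun raster header (eight big-endian 32-bit words, starting with the magic number 0x59a66a95) from a byte buffer. It assumes the buffer holds the raw bytes of output.ras or of the memory region starting at address 0x0; whether a mips_ram.xXyY.dump file can be read as raw bytes is an assumption that should be checked first. The names ras_header and ras_parse_header are hypothetical.

```c
#include <stddef.h>

#define RAS_MAGIC 0x59a66a95UL   /* magic number of the Sun raster format */

/* The eight header fields of a Sun raster file, in file order. */
struct ras_header {
    unsigned long magic;      /* must equal RAS_MAGIC                 */
    unsigned long width;      /* image width in pixels                */
    unsigned long height;     /* image height in pixels               */
    unsigned long depth;      /* bits per pixel (for example 24)      */
    unsigned long length;     /* size of the image data in bytes      */
    unsigned long type;       /* encoding type                        */
    unsigned long maptype;    /* colormap type (0 means no colormap)  */
    unsigned long maplength;  /* colormap size in bytes               */
};

/* Read one big-endian 32-bit word from a byte buffer. */
static unsigned long be32(const unsigned char *p)
{
    return ((unsigned long)p[0] << 24) | ((unsigned long)p[1] << 16) |
           ((unsigned long)p[2] << 8)  |  (unsigned long)p[3];
}

/* Hypothetical helper: parse the header at the start of buf.
 * Returns 0 on success, -1 if the buffer is too short or the
 * magic number does not match. */
int ras_parse_header(const unsigned char *buf, size_t len,
                     struct ras_header *h)
{
    if (len < 32) {
        return -1;
    }
    h->magic     = be32(buf);
    h->width     = be32(buf + 4);
    h->height    = be32(buf + 8);
    h->depth     = be32(buf + 12);
    h->length    = be32(buf + 16);
    h->type      = be32(buf + 20);
    h->maptype   = be32(buf + 24);
    h->maplength = be32(buf + 28);
    return (h->magic == RAS_MAGIC) ? 0 : -1;
}
```

For the surfer example, a correctly decoded image would be expected to report a width of 32 and a height of 24.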
The decoding of surfer.jpg on a 2x2 mMIPS NOC using the hardware simulator took 29 hours on a Pentium III 1GHz processor running GNU/Linux 2.4.20 with 2048 MB of RAM. The same decoding took 604 milliseconds on the actual FPGA implementation, which makes the simulation roughly 170,000 times slower than the hardware. This simulation speed may be too low for some situations. The two main reasons for the low simulation speed are the cycle-accurate RTL-level simulation and the lack of a multiply instruction. Fortunately, there are a number of things we can do to improve the performance of the simulator.
A faster simulator is one of the goals for the future.
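To illustrate why the lack of a multiply instruction is costly, the sketch below shows a generic shift-and-add multiplication in software. It is not the routine that the compiler actually emits for the mMIPS, which is not shown here; the name soft_mul is hypothetical.

```c
/* Generic shift-and-add multiplication (illustrative sketch only).
 * Each loop iteration handles one bit of b, so a single multiply can
 * cost many instructions, each of which the cycle-accurate simulator
 * has to execute one clock cycle at a time. */
unsigned long soft_mul(unsigned long a, unsigned long b)
{
    unsigned long result = 0;

    while (b != 0) {
        if (b & 1UL) {
            result += a;   /* add the shifted multiplicand for this bit */
        }
        a <<= 1;           /* next bit of b has twice the weight */
        b >>= 1;
    }
    return result;
}
```

The loop runs once per significant bit of b, so steps that multiply heavily, such as dequantization and the IDCT, are the ones most affected.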