Design Space Exploration (DSE) Assignment
After playing with the SimpleScalar and Wattch tools, here comes the real assignment. The task is to find
the most performance-energy-efficient superscalar architecture for the compress benchmark.
Wattch (sim-outorder) is used in this assignment, as it can provide both performance and power results. However, other SimpleScalar tools,
like sim-profile, sim-cache, sim-bpred, ... will be very helpful for you to efficiently find which parameter(s) to adjust and how.
The goal of this assignment is to gain a deeper understanding of the superscalar architecture.
The requirement of this assignment is to find the configurations that lead to the "optimum" Energy-Delay Product
by tuning architecture parameters.
You should justify how you perform the DSE, and evaluate and explain the intermediate and final results you get.
The metric we will use in this assignment is: CPI^2 * Energy_per_Cycle
The corresponding values from Wattch simulator are:
1) CPI: sim_CPI ;
and 2) Energy_per_Cycle: avg_total_power_cycle_cc3.
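As a minimal sketch of how the metric could be computed from a simulation log: the snippet below assumes that each statistic appears on its own line as a name followed by a numeric value (please check this against your own sim.log). The function names parse_stats and edp_metric are made up for illustration and are not part of Wattch.

```python
import re

# Assumed layout of a statistics line: "<name> <value> # <description>".
STAT_LINE = re.compile(r"^(\S+)\s+(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\b")


def parse_stats(log_text):
    """Return {stat_name: value} for every line that looks like a statistic."""
    stats = {}
    for line in log_text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match:
            stats[match.group(1)] = float(match.group(2))
    return stats


def edp_metric(stats):
    """Assignment metric: CPI^2 * Energy_per_Cycle."""
    cpi = stats["sim_CPI"]
    energy_per_cycle = stats["avg_total_power_cycle_cc3"]
    return cpi ** 2 * energy_per_cycle
```

Typical use would be edp_metric(parse_stats(open("sim.log").read())); a lower value means a better configuration.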
A sample report (courtesy Sebastian Moreno Londono) is provided, which
gives you some initial ideas of how this assignment could be done and what your final report could look like. You need to find your own approach based on a clear
understanding of the superscalar architecture.
Your comprehension of the superscalar architecture and simulator, and your interpretation
of your findings at each step are far more important than the final single "best" result you provide in the report! So spend more time on understanding
the superscalar architecture and simulator structure, and justifying the DSE approaches you applied. Running a brute-force script for hours over a large design
space won't help you get a decent grade unless you have a very good reason.
You may expect questions like "how do you do it",
"why do you do it in this way", and "what does the figure/table/result mean" during the oral exam phase.
The final report should be written in English, and defended during your oral exam.
In the final report, we would also like to see a "performance-energy" graph. Every point in this graph stands for one specific
configuration. You only need to include the interesting points/configurations (also called Pareto points) you find.
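To pick the Pareto points out of a set of simulated configurations, something like the following sketch could be used. It assumes that each configuration has been reduced to a (delay, energy) pair where lower is better for both; which two quantities you put on the axes is your own choice, and the function name pareto_points is only illustrative.

```python
def pareto_points(configs):
    """Keep the configurations that no other configuration dominates.

    `configs` maps a label to (delay, energy); lower is better for both.
    A point is dominated if another point is at least as good on both
    axes and strictly better on at least one.
    """
    front = {}
    for name, (delay, energy) in configs.items():
        dominated = any(
            d <= delay and e <= energy and (d < delay or e < energy)
            for other, (d, e) in configs.items()
            if other != name
        )
        if not dominated:
            front[name] = (delay, energy)
    return front
```

Only the configurations returned by this filter need to appear in the graph; the dominated ones are worse on both axes than some other point.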
System Default Configurations
Use the "-dumpconfig" flag to dump the default architecture configuration to a file you specify.
Try to understand the meaning of each parameter by referring to the SimpleScalar documents and the help output ("-h").
The parameters we are interested in, together with their default values, are listed in the Parameters_File
(in this assignment, only these parameters may be tuned). The bottom line is that you must set a PRACTICAL configuration, e.g., you
cannot set the branch predictor to "perfect", as this is the ideal case. Likewise, you cannot set the cache access/miss latency to "0",
as that makes no sense either.
The "-config" flag is very useful for simulating your own configuration. You can first dump the default configuration file, tune the parameters, save the result as a new file (say my.cfg), and then run the
simulation by simply calling "./sim-outorder -config my.cfg ..." instead of setting each parameter on the command line one by one.
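If you generate many configuration files, editing them by hand quickly becomes tedious. The sketch below rewrites option values in the text of a dumped configuration. It assumes that every active option sits on its own line as "-name value" and that lines starting with "#" are comments; verify this against your own dumped file. The helper name set_options is hypothetical.

```python
def set_options(cfg_text, overrides):
    """Return cfg_text with the value of each option in `overrides` replaced.

    Assumes each active option is on its own line as "-name value".
    Comment lines (starting with "#") and other options are left untouched.
    """
    remaining = dict(overrides)
    out = []
    for line in cfg_text.splitlines():
        fields = line.split()
        if fields and fields[0] in remaining:
            line = f"{fields[0]} {remaining.pop(fields[0])}"
        out.append(line)
    # Options that were not present in the dumped file are appended at the end.
    out.extend(f"{name} {value}" for name, value in remaining.items())
    return "\n".join(out) + "\n"
```

For example, set_options(default_text, {"-issue:width": 2}) returns a new configuration text that can be saved as my.cfg and passed to "-config".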
compress Benchmark from SPEC95
The benchmark we use in this assignment is compress, which is from the standard benchmark set SPEC95. The pre-compiled binary (compress.ss), its input file (test.in), and a reference output file (test.out) are
all located in the benchmark folder in your home directory.
The sim-outorder simulator of Wattch is used in this assignment to get both the CPI (sim_CPI)
and Power (avg_total_power_cycle_cc3). To run a simulation with the default settings, you can type
"$HOME/SimpleScalar/sim-wattch-1.02d/sim-outorder -redir:sim sim.log -redir:prog prog.log compress.ss < test.in". To
separate the simulator output from the program output, we redirect the simulator output to the sim.log file and the compress result
to prog.log. Each simulation takes roughly 10-20 seconds (depending on how heavily the server is loaded).
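The same command can be launched from a script. The following is only a sketch: the simulator path, flags and file names are the ones quoted above, while the function names build_command and run_simulation are invented for illustration.

```python
import os
import subprocess

# Simulator path as given in the assignment text.
SIM = os.path.expandvars("$HOME/SimpleScalar/sim-wattch-1.02d/sim-outorder")


def build_command(config=None, sim_log="sim.log", prog_log="prog.log",
                  binary="compress.ss"):
    """Build the sim-outorder argument list used in this assignment."""
    cmd = [SIM]
    if config is not None:
        cmd += ["-config", config]  # use a tuned configuration file
    cmd += ["-redir:sim", sim_log, "-redir:prog", prog_log, binary]
    return cmd


def run_simulation(config=None, stdin_path="test.in", **kwargs):
    """Run one simulation with test.in on standard input.

    Returns the exit code of the simulator process.
    """
    with open(stdin_path, "rb") as stdin:
        done = subprocess.run(build_command(config, **kwargs), stdin=stdin)
    return done.returncode
```

Giving every run its own sim_log name (for instance one per configuration file) keeps the results of different configurations from overwriting each other.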
A script will ease the pain of performing DSE on a large design space. A simple sample script will be put here.
You can also use a script to pick up the interesting data from the multiple output files, which also makes the analysis easier.
However, it is not wise to use a script that includes all the parameters and carries out a brute-force search. It could run for days with no guarantee
of a meaningful result. Moreover, with a brute-force search it is difficult to gain enough insight into the superscalar architecture,
and thus to write a good report. So we suggest tuning one parameter, or a group of related parameters, at a time, and choosing the next step based on the results/trends of the previous simulations.
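Tuning one parameter at a time could look like the sketch below. Here evaluate stands for any function of your own that turns a configuration into the metric (for instance by running the simulator and reading the log); both evaluate and sweep_parameter are placeholder names, and configurations are represented as plain dictionaries purely for illustration.

```python
def sweep_parameter(base, name, values, evaluate):
    """Vary one parameter around `base` and record the metric for each value.

    `evaluate` maps a configuration dict to the metric (lower is better).
    Returns the best value together with the full (value, metric) list,
    so the trend can be inspected before deciding on the next step.
    """
    results = []
    for value in values:
        config = {**base, name: value}  # all other parameters stay fixed
        results.append((value, evaluate(config)))
    best_value, _ = min(results, key=lambda pair: pair[1])
    return best_value, results
```

Looking at the whole results list, and not only at the best value, is what tells you whether the metric is still improving at the edge of the range you tried.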
A Few Guidelines
While doing the DSE, you may get lost, as it is a very large space to explore. There are many papers on how to apply DSE to a large space; you may bring/combine their ideas into
this assignment. A relatively simple method is to categorize the parameters into several closely related sub-groups and do the DSE group by group. Concretely, you may:
Profile the benchmark, and identify its characteristics. Set an initial configuration based on these characteristics.
Decide which parameters can/have to be grouped together.
Decide which sub-group to start with. And decide the parameter settings of other sub-groups when you are analyzing the current one.
When doing DSE in each sub-group, refer to the published papers related to this group. They may provide much useful information, such as possible choices,
practical constraints, etc., to reduce the design space (For example, I put some of Yale Patt's papers on the Parameters_File for
Use brute-force search with constraints on a relatively small space. Guide your next simulation with the "trend" you find in the previous run.
You may need to cross the borders between sub-groups and simulate parameters from different groups together for further improvement.
These are just some rough ideas; please use whichever methods you think/find are best. I am glad to be surprised.
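The group-by-group idea from the guidelines above can be sketched as follows. This is only one possible reading of it, not a prescribed method: evaluate and is_practical are placeholders for your own metric and constraint functions, and the names search_group and explore are invented here.

```python
from itertools import product


def search_group(base, group, evaluate, is_practical=lambda cfg: True):
    """Brute-force search over one small group of parameters.

    `group` maps each parameter name to its candidate values; parameters
    outside the group keep the values they have in `base`.
    """
    names = list(group)
    best_cfg, best_score = dict(base), evaluate(base)
    for combo in product(*(group[n] for n in names)):
        cfg = {**base, **dict(zip(names, combo))}
        if not is_practical(cfg):
            continue  # skip combinations that violate a constraint
        score = evaluate(cfg)
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


def explore(base, groups, evaluate, is_practical=lambda cfg: True):
    """Optimise group after group, carrying the best configuration forward."""
    cfg, score = dict(base), evaluate(base)
    for group in groups:
        cfg, score = search_group(cfg, group, evaluate, is_practical)
    return cfg, score
```

Because each group is searched with the others held fixed, the result depends on the order of the groups and on the starting configuration, which is one reason you may still need to simulate parameters from different groups together afterwards.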