Architectures and Code Generation
Literature (partly online) and suggested topics and papers for the presentations
Last updated 1998
See also recent conferences like:
IEEE MICRO (mainly on ILP architectures and code generation)
ASPLOS (Architectural Support for Programming Languages and Operating Systems)
ISCA (International symposium on computer architecture)
ICS (International conference on supercomputing)
1. Branch Prediction
- A Comparison of Dynamic Branch Predictors that use Two Levels of Branch History.
Yeh, T. and Patt, Y.
Proceedings of the 20th Annual International Symposium on Computer Architecture (ISCA 20), 1993, pages 257-266.
- Compiler Synthesized Dynamic Branch Prediction.
Mahlke, S.A. and Natarajan, B.
Proceedings of the 29th Annual International Symposium on Microarchitecture (MICRO 29), 1996, pages 153-164.
- Analysis of Branch Prediction Via Data Compression.
Chen, I.-C.K., Coffey, J.T., and Mudge, T.N.
Proceedings of the Seventh ACM Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS VII), 1996, pages 128-137.
- Target Prediction for Indirect Jumps.
Chang, P.-Y., Hao, E., and Patt, Y.N.
Proceedings of the 24th Annual International Symposium on Computer Architecture (ISCA 24), 1997.
- Path-Based Next Trace Prediction.
Jacobson, Q., Rotenberg, E., and Smith, J.E.
Proceedings of the 30th Annual International Symposium on Microarchitecture (MICRO 30), 1997.
2. VLIW Architectures
- An Architectural Overview of the Programmable Multimedia Processor, TM-1.
Rathnam, S. and Slavenburg, G.
CompCon '96 conference proceedings, 1996.
- HPL PlayDoh Architecture Specification: Version 1.0.
Kathail, V., Schlansker, M., and Rau, B.
Technical Report HPL-93-80, Hewlett Packard Computer Systems Laboratory, Palo Alto, CA, 1994.
- The VelociTI Architecture of the TMS320C6x.
Truong, L.
Presentation at HotChips IX, Stanford, 1997.
3. Superscalar Architectures
- The MIPS R10000 Superscalar Microprocessor.
Yeager, K.
IEEE Micro, April 1996, pages 28-40.
- A 600MHz Superscalar RISC Microprocessor with Out-of-Order Execution.
Gieseke, B. and others
IEEE International Solid-State Circuits Conference (ISSCC), 1996.
- Advanced Performance Features of the 64-bit PA-8000.
Hunt, D.
CompCon '95 conference proceedings, 1995.
4. Inter Basic Block Scheduling
- Trace Scheduling: A Technique for Global Microcode Compaction.
Fisher, J.A.
IEEE Transactions on Computers, 1981, C-30(7):478-490.
- The Superblock: An Effective Technique for VLIW and Superscalar Compilation.
Hwu, W.W., Mahlke, S.A., Chen, W.Y., and others
Journal of Supercomputing, 1993, 7(1/2):229-248.
- Global Instruction Scheduling for Superscalar Machines.
Bernstein, D. and Rodeh, M.
Proceedings of the ACM SIGPLAN 1991 Conference on Programming Language Design and Implementation (PLDI), 1991, pages 241-255.
5. Software Pipelining
- Iterative Modulo Scheduling.
Rau, B.R.
International Journal of Parallel Programming, 1996, 24(1):3-64.
- Modulo Scheduling of Loops in Control-Intensive Non-Numeric Programs.
Lavery, D.M. and Hwu, W.W.
Proceedings of the 29th Annual International Symposium on Microarchitecture (MICRO 29), 1996, pages 126-137.
- An Efficient Resource-Constrained Global Scheduling Technique for Superscalar and VLIW Processors.
Moon, S. and Ebcioglu, K.
Proceedings of the 25th Annual International Symposium on Microarchitecture (MICRO 25), 1992, pages 55-71.
6. Register Allocation
- Register Allocation with Instruction Scheduling: a New Approach.
Pinter, S.S.
Proceedings of the ACM SIGPLAN 1993 Conference on Programming Language Design and Implementation (PLDI), 1993, pages 248-257.
- The Interaction of Compilation Technology and Computer Architecture.
Lilja, D.J. and Bird, P.L.
Kluwer Academic Publishers.
- Exploiting Dead Value Information.
Martin, M.M., Roth, A., and Fischer, C.N.
Proceedings of the 30th Annual International Symposium on Microarchitecture (MICRO 30), 1997, pages 125-135.
7. Speculative Execution
- Value Locality and Load Value Prediction.
Lipasti, M.H., Wilkerson, C.B., and Shen, J.P.
Proceedings of the Seventh ACM Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS VII), 1996.
- Exceeding the Dataflow Limit via Value Prediction.
Lipasti, M.H. and Shen, J.P.
Proceedings of the 29th Annual International Symposium on Microarchitecture (MICRO 29), 1996, pages 226-237.
- Streamlining Inter-Operation Memory Communication via Data Dependence Prediction.
Moshovos, A. and Sohi, G.S.
Proceedings of the 30th Annual International Symposium on Microarchitecture (MICRO 30), 1997, pages 235-245.
- The Predictability of Data Values.
Sazeides, Y. and Smith, J.E.
Proceedings of the 30th Annual International Symposium on Microarchitecture (MICRO 30), 1997, pages 248-258.
- Value Profiling.
Calder, B., Feller, P., and Eustace, A.
Proceedings of the 30th Annual International Symposium on Microarchitecture (MICRO 30), 1997, pages 259-269.
- Delayed Exceptions - Speculative Execution of Trapping Instructions.
Ertl, M.A. and Krall, A.
Lecture Notes in Computer Science 786, Compiler Construction, 1994, pages 158-171.
8. Caches and Prefetching
- Trace Cache: a Low Latency Approach to High Bandwidth Instruction Fetching.
Rotenberg, E., Bennett, S., and Smith, J.E.
Proceedings of the 29th Annual International Symposium on Microarchitecture (MICRO 29), 1996, pages 24-34.
- Memory Bandwidth Limitations of Future Microprocessors.
Burger, D., Goodman, J.R., and Kagi, A.
Proceedings of the 23rd Annual International Symposium on Computer Architecture (ISCA 23), 1996, pages 78-89.
- Static Locality Analysis for Cache Management.
Sanchez, F.J., Gonzalez, A., and Valero, M.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT 97), 1997.
9. Multithreaded Architectures
- Multiscalar Processors.
Sohi, G.S., Breach, S.E., and Vijaykumar, T.
Proceedings of the 22nd Annual International Symposium on Computer Architecture (ISCA 22), 1995, pages 414-425.
- Simultaneous Multithreading: Maximizing On-Chip Parallelism.
Tullsen, D.M., Eggers, S.J., and Levy, H.M.
Proceedings of the 22nd Annual International Symposium on Computer Architecture (ISCA 22), 1995, pages 392-403.
- Trace Processors.
Rotenberg, E., Jacobson, Q., Sazeides, Y., and Smith, J.
Proceedings of the 30th Annual International Symposium on Microarchitecture (MICRO 30), 1997, pages 138-148.
- The Multicluster Architecture: Reducing Cycle Time Through Partitioning.
Farkas, K.I., Chow, P., Jouppi, N.P., and Vranesic, Z.
Proceedings of the 30th Annual International Symposium on Microarchitecture (MICRO 30), 1997, pages 149-159.
10. Java Processors and JIT Scheduling
- Java Bytecode to Native Code Translation: The Caffeine Prototype and Preliminary Results.
Hsieh, C.-H.A., Gyllenhaal, J.C., and Hwu, W.W.
Proceedings of the 29th Annual International Symposium on Microarchitecture (MICRO 29), 1996, pages 90-97.
- The Delft-Java Engine: An Introduction.
Glossner, C.J. and Vassiliadis, S.
Third International Euro-Par Conference, 1997, pages 766-770.
- A JAVA ILP Machine Based on Fast Dynamic Compilation.
Ebcioglu, K., Altman, E., and Hokenek, E.
IEEE MASCOTS International Workshop on Security and Efficiency Aspects of Java, 1997.
11. Predication and If-Conversion
- Effective Compiler Support for Predicated Execution Using the Hyperblock.
Mahlke, S.A., Lin, D.C., Chen, W.Y., Hank, R.E., and Bringmann, R.A.
Proceedings of the 25th Annual International Symposium on Microarchitecture (MICRO 25), 1992.
- Global Predicate Analysis and its Application to Register Allocation.
Gillies, D.M., Ju, D.R., Johnson, R., and Schlansker, M.
Proceedings of the 29th Annual International Symposium on Microarchitecture (MICRO 29), 1996, pages 114-125.
- Using Predicated Execution to Improve the Performance of a Dynamically Scheduled Machine with Speculative Execution.
Chang, P.Y., Hao, E., Patt, Y.N., and Chang, P.P.
International Journal of Parallel Programming, 1996, 24(3).
- Analysis Techniques for Predicated Code.
Johnson, R. and Schlansker, M.
Proceedings of the 29th Annual International Symposium on Microarchitecture (MICRO 29), 1996, pages 100-113.
- Reverse If-Conversion.
Warter, N.J., Mahlke, S.A., Hwu, W.W., and Rau, B.R.
Proceedings of the ACM SIGPLAN'93 Conference on Programming Language Design and Implementation (PLDI), 1993, pages 290-299.
- A Framework for Balancing Control Flow and Predication.
August, D.I., Hwu, W.W., and Mahlke, S.A.
Proceedings of the 30th Annual International Symposium on Microarchitecture (MICRO 30), 1997.
12. Register file design
13. Compatibility
14. IRAM
- Scalable Processors for the Billion Transistor Era: IRAM.
Kozyrakis, C.E., Perissakis, S., Patterson, D., and others
IEEE Computer, September 1997, pages 75-78.
- A Case for Intelligent RAM.
Patterson, D., Anderson, T., and others
IEEE Micro, April 1997, pages 34-44.
- Missing the Memory Wall: The Case for Processor/Memory Integration.
Saulsbury, A., Pong, F., and Nowatzyk, A.
Proceedings of the 23rd Annual International Symposium on Computer Architecture (ISCA 23), 1996.
15. Recent architectures and implementations, e.g.:
16. Code compression
Code compression is important for embedded systems, which have to be very cheap: smaller code needs less program memory.
See http://www.win.tue.nl/~rikvdw/bibl.html for an extensive literature list.
17. Cache Coherency and Memory Consistency
In multiprocessor systems it is important to keep the caches coherent with memory and with each other. In addition, a memory consistency model defines in which order memory operations may become visible to other processors; weaker models relax the constraints on the execution order of memory operations.
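
As a starting point for a presentation on this topic, the effect of relaxing the execution order can be tried out on the classic "store buffering" litmus test (thread 0: x = 1; r0 = y, thread 1: y = 1; r1 = x). The sketch below is only an illustration under stated assumptions: it enumerates all interleavings on one shared memory, and it models a "relaxed" machine simply by letting each thread perform its load before its store. The names T0, T1, interleavings, run and outcomes are made up for this example and do not come from any of the papers above.

```python
# Store buffering litmus test. Each operation is (thread, kind, variable).
T0 = [(0, "store", "x"), (0, "load", "y")]
T1 = [(1, "store", "y"), (1, "load", "x")]


def interleavings(a, b):
    """Yield every merge of sequences a and b that keeps each one's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    """Execute one global order on a single shared memory; return (r0, r1)."""
    memory = {"x": 0, "y": 0}
    regs = {}
    for thread, kind, var in schedule:
        if kind == "store":
            memory[var] = 1
        else:
            regs[thread] = memory[var]
    return regs[0], regs[1]


def outcomes(allow_store_load_reordering):
    """Collect every (r0, r1) result reachable under the chosen model."""
    orders0 = [T0]
    orders1 = [T1]
    if allow_store_load_reordering:
        # Assumption of this sketch: the relaxed machine may let a load
        # to a different address overtake an earlier store of the same thread.
        orders0.append(T0[::-1])
        orders1.append(T1[::-1])
    results = set()
    for a in orders0:
        for b in orders1:
            for schedule in interleavings(a, b):
                results.add(run(schedule))
    return results
```

Calling outcomes(False) gives the results allowed when program order is kept (sequential consistency); the result r0 = 0, r1 = 0 does not occur there. Calling outcomes(True) additionally produces (0, 0), which is the kind of behaviour a weaker consistency model has to document and a programmer has to rule out with fences or synchronization.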