References

Abnous and Bagherzadeh, 1994
Abnous, A. and Bagherzadeh, N. (1994). Pipelining and Bypassing in a VLIW Processor. IEEE Transactions on Parallel and Distributed Systems, 5(6):658-664.

Abraham et al., 1996
Abraham, S., Kathail, V., and Deitrich, B. (1996). Meld Scheduling: Relaxing Scheduling Constraints across Region Boundaries. In Proceedings of the 29th Annual International Symposium on Microarchitecture, pages 308-321, Paris, France.

Aho et al., 1985
Aho, A. V., Sethi, R., and Ullman, J. D. (1985). Compilers: Principles, Techniques and Tools. Addison-Wesley Series in Computer Science. Addison-Wesley Publishing Company, Reading, Massachusetts.

Aiken and Nicolau, 1988
Aiken, A. and Nicolau, A. (1988). Optimal Loop Parallelization. In Proceedings of the SIGPLAN'88 conference on Programming Language Design and Implementation, pages 308-317, Atlanta, Georgia.

Arnold and Corporaal, 1997
Arnold, M. and Corporaal, H. (1997). Data Transport Reduction in Move Processors. In Third Annual Conference of ASCI, The Netherlands.

Austin et al., 1995
Austin, T. M., Pnevmatikatos, D. N., and Sohi, G. S. (1995). Zero-Cycle Loads: Microarchitecture Support for Reducing Load Latency. In Proceedings of the 28th Annual International Symposium on Microarchitecture, pages 82-92, Michigan.

Ball and Larus, 1993
Ball, T. and Larus, J. (1993). Branch Prediction for Free. In Proceedings of the ACM SIGPLAN'93 Conference on Programming Language Design and Implementation, pages 300-313.

Ball and Larus, 1996
Ball, T. and Larus, J. (1996). Efficient Path Profiling. In Proceedings of the 29th Annual International Symposium on Microarchitecture, pages 46-57, Paris, France.

Beaty, 1991
Beaty, S. J. (1991). Genetic Algorithms and Instruction Scheduling. In Proceedings of the 24th Annual International Symposium on Microarchitecture, pages 206-211, Albuquerque.

Beckmann, 1994
Beckmann, C. J. (1994). Hardware and Software for Functional and Fine Grain Parallelism. PhD thesis, University of Illinois at Urbana-Champaign, Center for Supercomputing Research and Development.

Bernstein and Rodeh, 1991
Bernstein, D. and Rodeh, M. (1991). Global instruction scheduling for superscalar machines. In Proceedings of the ACM SIGPLAN 1991 conference on Programming Language Design and Implementation, pages 241-255.

Briggs, 1992
Briggs, P. (1992). Register Allocation via Graph Coloring. PhD thesis, Rice University.

Brownhill et al., 1997
Brownhill, C., Nicolau, A., Novack, S., and Polychronopoulos, C. (1997). The PROMIS Compiler Prototype. In Proceedings of the 1997 Conference on Parallel Architectures and Compilation Techniques, San Francisco, USA.

Burger et al., 1996a
Burger, D., Goodman, J. R., and Kägi, A. (1996a). Memory Bandwidth Limitations of Future Microprocessors. In Proceedings of the 23rd Annual International Symposium on Computer Architecture, pages 78-89, Philadelphia, Pennsylvania. ACM SIGARCH and IEEE Computer Society TCCA.

Burger et al., 1996b
Burger, D., Kaxiras, S., and Goodman, J. R. (1996b). DataScalar Architectures. In Proceedings of the 23rd Annual International Symposium on Computer Architecture, Philadelphia, Pennsylvania. ACM SIGARCH and IEEE Computer Society TCCA.

Burger et al., 1996c
Burger, D., Kaxiras, S., and Goodman, J. R. (1996c). DataScalar Architectures and the SPSD Execution Model. Technical Report TR 1317, University of Wisconsin-Madison Computer Sciences Department.

Calder et al., 1997
Calder, B., Feller, P., and Eustace, A. (1997). Value Profiling. In [MICRO30, 1997], pages 259-269.

Calder and Grunwald, 1995
Calder, B. and Grunwald, D. (1995). Next Cache Line and Set Prediction. In Proceedings of the 22nd Annual International Symposium on Computer Architecture, pages 287-296, Santa Margherita Ligure, Italy.

Capitanio et al., 1992
Capitanio, A., Dutt, N., and Nicolau, A. (1992). Partitioned Register Files for VLIWs: A Preliminary Analysis of Tradeoffs. In Proceedings of the 25th Annual International Symposium on Microarchitecture, pages 292-300, Portland.

Chang et al., 1995
Chang, P.-Y., Hao, E., and Patt, Y. (1995). Alternative Implementations of Hybrid Branch Predictors. In Proceedings of the 28th Annual International Symposium on Microarchitecture, pages 252-257, Michigan.

Chang et al., 1997
Chang, P.-Y., Hao, E., and Patt, Y. N. (1997). Target Prediction for Indirect Jumps. In [ISCA24, 1997].

Chang et al., 1996
Chang, P.-Y., Hao, E., Patt, Y. N., and Chang, P. P. (1996). Using Predicated Execution to Improve the Performance of a Dynamically Scheduled Machine with Speculative Execution. International Journal of Parallel Programming, 24(3).

Chekuri et al., 1996
Chekuri, C. et al. (1996). Profile-Driven Instruction Level Parallel Scheduling with Application to Super Blocks. In Proceedings of the 29th Annual International Symposium on Microarchitecture, pages 58-67, Paris, France.

Chen et al., 1996
Chen, I.-C. K., Coffey, J. T., and Mudge, T. N. (1996). Analysis of Branch Prediction Via Data Compression. In ASPLOS VII, pages 128-137, Cambridge, Massachusetts.

Chen et al., 1994
Chen, W., Mahlke, S., Warter, N., Anik, S., and Hwu, W. (1994). Profile assisted instruction scheduling. International Journal of Parallel Programming, 22(2):151-181.

Clark, 1987
Clark, D. W. (1987). Pipelining and Performance in the VAX 8800 Processor. In Proceedings of ASPLOS-II, pages 173-177, Palo Alto, California.

Cogswell, 1995
Cogswell, B. H. (1995). Timing insensitive binary-to-binary translation. PhD thesis, Carnegie Mellon University.

Colwell et al., 1987
Colwell, R. P., Nix, R. P., O'Donnell, J. J., Papworth, D. B., and Rodman, P. K. (1987). A VLIW Architecture for a Trace Scheduling Compiler. In Proceedings of the Second International Conference on Architectural Support for Programming Languages and Operating Systems, pages 180-192. ACM. SIGPLAN Notices Vol. 22, No. 10.

Conte et al., 1996
Conte, T. M. et al. (1996). Instruction Fetch Mechanisms for VLIW Architectures with Compressed Encodings. In Proceedings of the 29th Annual International Symposium on Microarchitecture, pages 201-211, Paris, France.

Conte and Sathaye, 1995
Conte, T. M. and Sathaye, S. W. (1995). Dynamic Rescheduling: A Technique for Object Code Compatibility in VLIW Architectures. In Proceedings of the 28th Annual International Symposium on Microarchitecture, pages 208-218, Ann Arbor, Michigan.

Corporaal, 1997
Corporaal, H. (1997). Microprocessor Architectures; from VLIW to TTA. John Wiley. ISBN 0-471-97157-X.

Dehnert and Towle, 1993
Dehnert, J. C. and Towle, R. A. (1993). Compiling for the Cydra 5. The Journal of Supercomputing, 7(1/2):181-228.

Diep et al., 1995
Diep, T. A., Nelson, C., and Shen, J. P. (1995). Performance Evaluation of the PowerPC 620 Microarchitecture. In The 22nd Annual International Symposium on Computer Architecture, pages 163-174.

Dubey, 1997
Dubey, P. K. (1997). Architectural and Design Implications of Mediaprocessing. Tutorial, Hot Chips IX, Stanford.

Dubey et al., 1995
Dubey, P. K., O'Brien, K., O'Brien, K., and Barton, C. (1995). Single-Program Speculative Multithreading (SPSM) Architecture: Compiler-Assisted Fine-Grained Multithreading. In International Conference on Parallel Architectures and Compilation Techniques, pages 109-121.

Dunn and Hsu, 1996
Dunn, D. A. and Hsu, W.-C. (1996). Instruction Scheduling for the HP PA-8000. In Proceedings of the 29th Annual International Symposium on Microarchitecture, pages 298-307, Paris, France.

Dwyer and Torng, 1992
Dwyer, H. and Torng, H. (1992). An Out-of-Order Superscalar Processor with Speculative Execution and Fast, Precise Interrupts. In Proceedings of the 25th Annual International Workshop on Microprogramming, pages 272-281, Portland, Oregon.

Ebcioğlu, 1987
Ebcioğlu, K. (1987). A Compilation Technique for Software Pipelining of Loops with Conditional Jumps. In Proceedings of the 20th Annual Workshop on Microprogramming.

Ebcioğlu and Altman, 1997
Ebcioğlu, K. and Altman, E. (1997). DAISY: Dynamic Compilation for 100% Architectural Compatibility. In Proceedings of the 24th Annual International Symposium on Computer Architecture, Denver, Colorado.

Ebcioğlu et al., 1997
Ebcioğlu, K., Altman, E., and Hokenek, E. (1997). A JAVA ILP Machine Based on Fast Dynamic Compilation. In IEEE MASCOTS International Workshop on Security and Efficiency Aspects of Java, Eilat, Israel.

Ebcioğlu et al., 1994
Ebcioğlu, K., Groves, R., Kim, K., Silberman, G., and Ziv, I. (1994). VLIW Compilation Techniques in a Superscalar Environment. ACM SIGPLAN Notices, (PLDI'94), 29(6):36-48.

Ebcioğlu and Nakatani, 1989
Ebcioğlu, K. and Nakatani, T. (1989). A New Compilation Technique for Parallelizing Loops with Unpredictable Branches on a VLIW Architecture. In Proceedings of the Second Workshop on Programming Languages and Compilers for Parallel Computing, University of Illinois at Urbana-Champaign.

Eickemeyer and Vassiliadis, 1993
Eickemeyer, R. J. and Vassiliadis, S. (1993). A load-instruction unit for pipelined processors. IBM Journal of Research and Development, 37(4):547-564.

Ellis, 1986
Ellis, J. R. (1986). Bulldog: A Compiler for VLIW Architectures. ACM Doctoral Dissertation Awards. MIT Press, Cambridge, Massachusetts.

Emer and Gloy, 1997
Emer, J. and Gloy, N. (1997). A language for describing predictors and its application to automatic synthesis. In [ISCA24, 1997].

Ertl and Krall, 1994
Ertl, M. A. and Krall, A. (1994). Delayed Exceptions - Speculative Execution of Trapping Instructions. In Lecture Notes in Computer Science 786, Compiler Construction, pages 158-171. Springer-Verlag.

Chaitin et al., 1981
Chaitin, G. J. et al. (1981). Register Allocation via Coloring. Computer Languages, 6:47-57.

Farkas et al., 1997a
Farkas, K. I., Chow, P., Jouppi, N. P., and Vranesic, Z. (1997a). The Multicluster Architecture: Reducing Cycle Time Through Partitioning. In [MICRO30, 1997], pages 149-159.

Farkas et al., 1997b
Farkas, K. I., Jouppi, N. P., and Chow, P. (1997b). Register File Design Considerations in Dynamically Scheduled Processors. Research Report 95/10, Digital Western Research Laboratory.

Fisher et al., 1996
Fisher, J., Faraboschi, P., and Desoli, G. (1996). Custom-Fit Processors: Letting Applications Define Architectures. In Proceedings of the 29th Annual International Symposium on Microarchitecture, pages 324-335, Paris, France.

Fisher, 1981
Fisher, J. A. (1981). Trace Scheduling: A Technique for Global Microcode Compaction. IEEE Transactions on Computers, C-30(7):478-490.

Fisher and Freudenberger, 1992
Fisher, J. A. and Freudenberger, S. M. (1992). Predicting conditional branch directions from previous runs of a program. In Proceedings of ASPLOS-V, pages 85-97, Boston.

Foley, 1996
Foley, P. (1996). The Mpact Media Processor Redefines the Multimedia PC. In CompCon '96 conference proceedings, Santa Clara.

Gabbay and Mendelson, 1997
Gabbay, F. and Mendelson, A. (1997). Can Program Profiling Support Value Prediction? In [MICRO30, 1997], pages 270-280.

Gibbons and Muchnick, 1986
Gibbons, P. B. and Muchnick, S. S. (1986). Efficient Instruction Scheduling for a Pipelined Architecture. In Proceedings of the SIGPLAN Symposium on Compiler Construction, pages 11-16.

Gieseke, 1997
Gieseke, B. (1997). A 600MHz Superscalar RISC Microprocessor with Out-of-Order Execution. In IEEE International Solid-State Circuits Conference.

Gillies et al., 1996
Gillies, D. M. et al. (1996). Global Predicate Analysis and its Application to Register Allocation. In Proceedings of the 29th Annual International Symposium on Microarchitecture, pages 114-125, Paris, France.

Girkar and Polychronopoulos, 1994
Girkar, M. and Polychronopoulos, C. D. (1994). The Hierarchical Task Graph as a Universal Intermediate Representation. International Journal of Parallel Programming, 22(5):519-551.

Glossner and Vassiliadis, 1997
Glossner, C. J. and Vassiliadis, S. (1997). The DELFT-JAVA Engine: An Introduction. In Third Int. Euro-Par Conference, pages 766-770, Passau, Germany.

Gloy et al., 1996
Gloy, N., Young, C., Chen, J. B., and Smith, M. D. (1996). An Analysis of Dynamic Branch Prediction Schemes on System Workloads. In ISCA-23.

González et al., 1997
González, A., Valero, M., Topham, N., and Parcerisa, J. M. (1997). Eliminating Cache Conflict Misses Through XOR-Based Placement Functions. In Proc. of the ACM Int. Conf. on Supercomputing, pages 76-83, Vienna, Austria.

Granlund and Kenner, 1992
Granlund, T. and Kenner, R. (1992). Eliminating Branches using a Superoptimizer and the GNU C Compiler. In ACM SIGPLAN, pages 341-352.

Grunwald et al., 1995
Grunwald, D. et al. (1995). Corpus-Based Static Branch Prediction. In Proceedings of the ACM SIGPLAN'95 Conference on Programming Language Design and Implementation.

Hank et al., 1995
Hank, R. E., Hwu, W. W., and Rau, B. R. (1995). Region-based compilation: an introduction and motivation. In Proceedings of the 28th Annual International Symposium on Microarchitecture, pages 158-168, Michigan.

Hennessy and Patterson, 1996
Hennessy, J. L. and Patterson, D. A. (1996). Computer Architecture, a Quantitative Approach, Second Edition. Morgan Kaufmann publishers.

Holler, 1996
Holler, A. M. (1996). Optimization for a Superscalar Out-of-Order Machine. In Proceedings of the 29th Annual International Symposium on Microarchitecture, pages 336-348, Paris, France.

Hoogerbrugge, 1996
Hoogerbrugge, J. (1996). Code generation for Transport Triggered Architectures. PhD thesis, Delft Univ. of Technology. ISBN 90-9009002-9.

Hordijk and Corporaal, 1997a
Hordijk, J. and Corporaal, H. (1997a). A Comparison of Different Multithreading Architectures. Technical Report 1-68340-44(1997)11, Department of Electrical Engineering, Delft University of Technology.

Hordijk and Corporaal, 1997b
Hordijk, J. and Corporaal, H. (1997b). The Potential of Exploiting Coarse-Grain Task Parallelism from Sequential Programs. In HPCN Europe '97, The International Conference and Exhibition on High-Performance Computing and Networking, Vienna, Austria.

Hsieh et al., 1996
Hsieh, C.-H. A., Gyllenhaal, J. C., and Hwu, W. W. (1996). Java Bytecode to Native Code Translation: The Caffeine Prototype and Preliminary Results. In Proceedings of the 29th Annual International Symposium on Microarchitecture, pages 90-97, Paris, France.

Hsu and Davidson, 1986
Hsu, P. Y. T. and Davidson, E. S. (1986). Highly Concurrent Scalar Processing. In Proceedings of ISCA-13, pages 386-395.

Huff, 1993
Huff, R. A. (1993). Lifetime-Sensitive Modulo Scheduling. In Proceedings of the SIGPLAN '93 Conference on Programming Language Design and Implementation, pages 258-267.

Hunt, 1995
Hunt, D. (1995). Advanced Performance Features of the 64-bit PA-8000. In COMPCON 1995 Digest of Papers, pages 123-128.

Hwu et al., 1993
Hwu, W. W. et al. (1993). The Superblock: An Effective Technique for VLIW and Superscalar Compilation. The Journal of Supercomputing, 7(1/2):229-248.

Hwu and Patt, 1987
Hwu, W. W. and Patt, Y. N. (1987). Checkpoint Repair for High-Performance Out-of-Order Execution Machines. IEEE Transactions on Computers, C-36(12).

ISCA24, 1997
ISCA24 (1997). Proceedings of the 24th Annual International Symposium on Computer Architecture, Denver, Colorado. ACM SIGARCH and IEEE Computer Society TCCA.

Jacobson et al., 1997
Jacobson, Q., Rotenberg, E., and Smith, J. E. (1997). Path-Based Next Trace Prediction. In [MICRO30, 1997], pages 14-23.

Janssen and Corporaal, 1995
Janssen, J. and Corporaal, H. (1995). Partitioned Register Files for TTAs. In Proceedings of the 28th Annual International Symposium on Microarchitecture, pages 303-312, Michigan.

Janssen and Corporaal, 1997
Janssen, J. and Corporaal, H. (1997). Registers On Demand, an integrated region scheduler and register allocator. In Submitted paper.

Lo et al., 1997
Lo, J. L., Eggers, S., Emer, J., Levy, H., Stamm, R., and Tullsen, D. (1997). Converting Thread-Level Parallelism Into Instruction-Level Parallelism via Simultaneous Multithreading. ACM Transactions on Computer Systems.

Johnson and Schlansker, 1996
Johnson, R. and Schlansker, M. (1996). Analysis Techniques for Predicated Code. In Proceedings of the 29th Annual International Symposium on Microarchitecture, pages 100-113, Paris, France.

Johnson, 1991
Johnson, W. M. (1991). Superscalar Microprocessor Design. Prentice Hall.

Jouppi and Wall, 1989
Jouppi, N. P. and Wall, D. W. (1989). Available Instruction-Level Parallelism for Superscalar and Superpipelined Machines. In Proceedings of the 3rd International Conference on Architectural Support for Programming Languages and Operating Systems, pages 272-282.

Karkowski and Corporaal, 1997
Karkowski, I. and Corporaal, H. (1997). Overcoming the Limitations of the Traditional Loop Parallelization. In HPCN Europe '97, The International Conference and Exhibition on High-Performance Computing and Networking, Vienna, Austria.

Kathail et al., 1994
Kathail, V., Schlansker, M., and Rau, B. (1994). HPL PlayDoh Architecture Specification: Version 1.0. Technical Report HPL-93-80, Hewlett Packard Computer Systems Laboratory, Palo Alto, CA.

Kunkel and Smith, 1986
Kunkel, S. R. and Smith, J. E. (1986). Optimal Pipelining in Supercomputers. In ISCA-13, pages 404-414, Tokyo, Japan.

Lam, 1988
Lam, M. (1988). Software Pipelining: An Effective Scheduling Technique for VLIW Machines. In Proceedings of the SIGPLAN '88 Conference on Programming Language Design and Implementation, pages 318-328.

Lam and Wilson, 1992
Lam, M. S. and Wilson, R. P. (1992). Limits of control flow on parallelism. In ISCA-19, pages 46-57, Australia.

Larus, 1990
Larus, J. R. (1990). Parallelism in Numeric and Symbolic Programs. In Proceedings of International Workshop on Compilers for Parallel Computers, pages 157-170.

Lavery and Hwu, 1996
Lavery, D. M. and Hwu, W. W. (1996). Modulo Scheduling of Loops in Control-Intensive Non-Numeric Programs. In Proceedings of the 29th Annual International Symposium on Microarchitecture, pages 126-137, Paris, France.

Lee et al., 1997a
Lee, C.-C., Chen, I.-C. K., and Mudge, T. N. (1997a). The Bi-Mode Branch Predictor. In [MICRO30, 1997], pages 4-13.

Lee et al., 1995
Lee, D., Baer, J.-L., Calder, B., and Grunwald, D. (1995). Instruction Cache Fetch Policies for Speculative Execution. In ISCA-22 proceedings, pages 357-367, Santa Margherita Ligure, Italy.

Lee and Smith, 1984
Lee, J. and Smith, A. (1984). Branch Prediction Strategies and Branch Target Buffer Design. In IEEE Computer, pages 6-22.

Lee et al., 1997b
Lee, R. B. et al. (1997b). MAX-2 Multimedia Extensions for PA-RISC 2.0 Processors. Presentation, Hot Chips IX, Stanford.

Liao, 1996
Liao, S. Y.-H. (1996). Code Generation and Optimization for Embedded Digital Signal Processors. PhD thesis, MIT.

Lilja and Bird, 1994
Lilja, D. J. and Bird, P. L. (1994). The interaction of compilation technology and computer architecture. Kluwer Academic Publishers.

Lipasti and Shen, 1996
Lipasti, M. H. and Shen, J. P. (1996). Exceeding the Dataflow Limit via Value Prediction. In Proceedings of the 29th Annual International Symposium on Microarchitecture, pages 226-237, Paris, France.

Lipasti et al., 1996
Lipasti, M. H., Wilkerson, C. B., and Shen, J. P. (1996). Value Locality and Load Value Prediction. In Proceedings of the Seventh ACM Conference on Architectural Support for Programming Languages and Operating Systems, Cambridge, Massachusetts.

Lowney et al., 1993
Lowney, P. G. et al. (1993). The Multiflow Trace Scheduling Compiler. The Journal of Supercomputing, 7(1/2):51-142.

Luk and Mowry, 1996
Luk, C.-K. and Mowry, T. C. (1996). Compiler-Based Prefetching for Recursive Data Structures. In Proceedings of the Seventh International Conference on Architectural Support for Programming Languages and Operating Systems, pages 222-233, Cambridge, Massachusetts.

Mahadevan and Ramakrishnan, 1994
Mahadevan, U. and Ramakrishnan, S. (1994). Instruction Scheduling over Regions: A Framework for Scheduling Across Basic Blocks. In Proceedings of the International Conference on Compiler Construction, pages 419-434, Edinburgh, Scotland.

Mahlke et al., 1992a
Mahlke, S. A. et al. (1992a). Sentinel scheduling for VLIW and Superscalar Processors. In Proceedings of ASPLOS-V, pages 238-247, Boston.

Mahlke et al., 1995
Mahlke, S. A. et al. (1995). A Comparison of Full and Partial Predicated Execution Support for ILP Processors. In The 22nd Annual International Symposium on Computer Architecture, pages 138-149.

Mahlke et al., 1992b
Mahlke, S. A., Lin, D. C., Chen, W. Y., Hank, R. E., and Bringmann, R. A. (1992b). Effective compiler support for predicated execution using the hyperblock. In Proceedings of the 25th Annual International Symposium on Microarchitecture.

Mahlke and Natarajan, 1996
Mahlke, S. A. and Natarajan, B. (1996). Compiler Synthesized Dynamic Branch Prediction. In Proceedings of the 29th Annual International Symposium on Microarchitecture, pages 153-164.

Martin et al., 1997
Martin, M. M., Roth, A., and Fischer, C. N. (1997). Exploiting Dead Value Information. In [MICRO30, 1997], pages 125-135.

Maydan et al., 1995
Maydan, D. E., Hennessy, J. L., and Lam, M. S. (1995). Effectiveness of Data Dependence Analysis. International Journal of Parallel Programming, 23(1).

McFarling, 1993
McFarling, S. (1993). Combining Branch Predictors. WRL Technical Note TN-36.

McFarling and Hennessy, 1986
McFarling, S. and Hennessy, J. (1986). Reducing the Cost of Branches. In Proc. 13th Ann. Int'l Symp. on Computer Architecture, pages 396-403.

MICRO30, 1997
MICRO30 (1997). Proceedings of the 30th Annual International Symposium on Microarchitecture, Research Triangle Park, North Carolina. IEEE Computer Society TC-MICRO and ACM SIGMICRO.

Moon and Ebcioğlu, 1992
Moon, S. and Ebcioğlu, K. (1992). An efficient resource-constrained global scheduling technique for superscalar and VLIW processors. In Proceedings of the 25th Annual International Symposium on Microarchitecture, Portland.

Moshovos et al., 1997
Moshovos, A., Breach, S. E., Vijaykumar, T., and Sohi, G. S. (1997). Dynamic Speculation and Synchronization of Data Dependences. In [ISCA24, 1997].

Moshovos and Sohi, 1997
Moshovos, A. and Sohi, G. S. (1997). Streamlining Inter-Operation Memory Communication via Data Dependence Prediction. In [MICRO30, 1997], pages 235-245.

Moudgill, 1994
Moudgill, M. (1994). Implementing and exploiting static speculation on multiple instruction issue processors. PhD thesis, Cornell University.

Moudgill and Vassiliadis, 1996
Moudgill, M. and Vassiliadis, S. (1996). On Precise Interrupts. IEEE Micro, pages 58-67.

Muchnick, 1997
Muchnick, S. (1997). Advanced Compiler Design and Implementation. Morgan Kaufmann Publishers. ISBN 1-55860-320-4.

Nair, 1995
Nair, R. (1995). Dynamic Path-Based Branch Correlation. In Proceedings of the 28th Annual International Symposium on Microarchitecture, pages 15-23, Michigan.

Nicolau, 1985
Nicolau, A. (1985). Percolation Scheduling: A Parallel Compilation Technique. Technical Report TR 85-678, Department of Computer Science, Cornell University, Ithaca, NY 14853, USA.

Nicolau, 1989
Nicolau, A. (1989). Run-Time Disambiguation: Coping with Statically Unpredictable Dependencies. IEEE Transactions on Computers, 38(5).

Nicolau and Fisher, 1984
Nicolau, A. and Fisher, J. (1984). Measuring the Parallelism Available for Very Long Instruction Word Architectures. IEEE Transactions on Computers, C-33(11):968-976.

Normyle and Csoppenszky, 1997
Normyle, K. and Csoppenszky, M. (1997). UltraSPARC IIi - A highly integrated 300 MHz 64-bit SPARC V9 CPU. Presentation, Hot Chips IX, Stanford.

Novack et al., 1995a
Novack, S., Hummel, J., and Nicolau, A. (1995a). A simple mechanism for improving the accuracy and efficiency of instruction level disambiguation. Lecture Notes in Computer Science 1033, pages 289-303.

Novack et al., 1995b
Novack, S., Nicolau, A., and Dutt, N. (1995b). A Unified Code Generation Approach Using Mutation Scheduling. In Code Generation for Embedded Processors, Chapter 12. Kluwer Academic Publishers.

Palacharla et al., 1997
Palacharla, S., Jouppi, N. P., and Smith, J. E. (1997). Complexity-Effective Superscalar Processors. In [ISCA24, 1997].

Pan et al., 1992
Pan, S.-T. et al. (1992). Improving the accuracy of dynamic branch prediction using branch correlation. In Proceedings of ASPLOS-V, pages 76-84, Boston.

Park and Schlansker, 1991
Park, J. C. H. and Schlansker, M. (1991). On predicated execution. Technical Report HPL-91-58, HP Laboratories, Palo Alto.

Patterson et al., 1997
Patterson, D. et al. (1997). A Case for Intelligent RAM. IEEE Micro, pages 34-44.

Patterson, 1995
Patterson, J. (1995). Accurate Static Branch Prediction by Value Range Propagation. In Proceedings of the ACM SIGPLAN'95 Conference on Programming Language Design and Implementation.

Paulin and Knight, 1989
Paulin, P. G. and Knight, J. P. (1989). Force-Directed Scheduling for the Behavioral Synthesis of ASIC's. IEEE Transactions on Computer-Aided Design, 8(6):661-679.

Pearl, 1984
Pearl, J. (1984). Heuristics: intelligent search strategies for computer problem solving. Addison-Wesley.

PicoJava, 1996
PicoJava (1996). PicoJava I Microprocessor Core Architecture. Sun white paper on web site: http://www.sun.com/sparc/whitepapers/wpr-0014-01.

Pinter, 1993
Pinter, S. S. (1993). Register Allocation with Instruction Scheduling: a New Approach. In SIGPLAN '93 Conference on Programming Language Design and Implementation, pages 248-257.

Ramakrishnan, 1992
Ramakrishnan, S. (1992). Software Pipelining in PA-RISC Compilers. Hewlett-Packard Journal, pages 39-45.

Rathnam and Slavenburg, 1996
Rathnam, S. and Slavenburg, G. (1996). An Architectural Overview of the Programmable Multimedia Processor, TM-1. In CompCon '96 conference proceedings, Santa Clara.

Rau, 1993
Rau, B. R. (1993). Dynamically Scheduled VLIW Processors. In Proceedings of the 26th Annual International Symposium on Microarchitecture, pages 80-92, Austin, Texas.

Rau, 1994
Rau, B. R. (1994). Iterative Modulo Scheduling: An Algorithm For Software Pipelining Loops. In Proceedings of the 27th Annual International Workshop on Microprogramming, San Jose, California.

Rau, 1996
Rau, B. R. (1996). Iterative Modulo Scheduling. International Journal of Parallel Programming, 24(1):3-64.

Rau and Fisher, 1993
Rau, B. R. and Fisher, J. A. (1993). Instruction-Level Parallel Processing: History, Overview and Perspective. The Journal of Supercomputing, 7(1/2):9-50.

Rau and Glaeser, 1981
Rau, B. R. and Glaeser, C. D. (1981). Some Scheduling Techniques and an Easily Schedulable Horizontal Architecture for High Performance Scientific Computing. In Proceedings of the 14th Annual Workshop on Microprogramming, pages 183-198.

Rivers et al., 1997
Rivers, J. A., Tyson, G. S., Davidson, E. S., and Austin, T. M. (1997). On High-Bandwidth Data Cache Design for Multi-Issue Processors. In [MICRO30, 1997], pages 46-56.

Rompaey et al., 1992
Rompaey, K. V., Bolsens, I., and Man, H. D. (1992). Just in time scheduling. In ICCD-92, pages 295-300, Boston.

Rotenberg et al., 1996
Rotenberg, E., Bennett, S., and Smith, J. E. (1996). Trace Cache: a Low Latency Approach to High Bandwidth Instruction Fetching. In Proceedings of the 29th Annual International Symposium on Microarchitecture, pages 24-34, Paris, France.

Rotenberg et al., 1997
Rotenberg, E., Jacobson, Q., Sazeides, Y., and Smith, J. (1997). Trace Processors. In [MICRO30, 1997], pages 138-148.

Sánchez and González, 1997
Sánchez, F. J. and González, A. (1997). Cache Sensitive Modulo Scheduling. In [MICRO30, 1997], pages 338-348.

Saulsbury et al., 1996
Saulsbury, A., Pong, F., and Nowatzyk, A. (1996). Missing the Memory Wall: The Case for Processor/Memory Integration. In ISCA-23.

Sazeides and Smith, 1997
Sazeides, Y. and Smith, J. E. (1997). The Predictability of Data Values. In [MICRO30, 1997], pages 248-258.

Sazeides et al., 1996
Sazeides, Y., Vassiliadis, S., and Smith, J. E. (1996). The Performance Potential of Data Dependence Speculation & Collapsing. In Proceedings of the 29th Annual International Symposium on Microarchitecture, pages 238-247, Paris, France.

Schlansker and Kathail, 1995
Schlansker, M. and Kathail, V. (1995). Critical Path Reduction for Scalar Programs. In Proceedings of the 28th Annual International Symposium on Microarchitecture, pages 57-69, Michigan.

Schuette and Shen, 1993
Schuette, M. and Shen, J. (1993). Instruction-Level Experimental Evaluation of the Multiflow TRACE 14/300 VLIW Computer. The Journal of Supercomputing, 7(1/2):249.

Seznec et al., 1996
Seznec, A., Jourdan, S., Sainrat, P., and Michaud, P. (1996). Multiple-Block Ahead Branch Predictors. In ASPLOS VII, Cambridge, Massachusetts.

Silberman and Ebcioğlu, 1993
Silberman, G. M. and Ebcioğlu, K. (1993). An architecture framework for supporting heterogeneous instruction-set architectures. IEEE Computer, 26(6):39-56.

Simone et al., 1995
Simone, M. et al. (1995). Implementation Trade-offs in Using a Restricted Data Flow Architecture in a High Performance RISC Microprocessor. In The 22nd Annual International Symposium on Computer Architecture, pages 151-162.

Smith and Sohi, 1995
Smith, J. E. and Sohi, G. S. (1995). The Microarchitecture of Superscalar Processors. Technical report, University of Wisconsin-Madison.

Smith et al., 1992
Smith, M. D., Horowitz, M., and Lam, M. S. (1992). Efficient superscalar performance through boosting. In Proceedings of ASPLOS-V, pages 248-261, Boston.

Smotherman et al., 1991
Smotherman, M. et al. (1991). Efficient DAG Construction and Heuristic Calculation for Instruction Scheduling. In Proceedings of the 24th Annual International Symposium on Microarchitecture, pages 93-102, Albuquerque.

Sánchez et al., 1997
Sánchez, F. J., González, A., and Valero, M. (1997). Static Locality Analysis for Cache Management. In Proc. of Int. Conf. on Parallel Architectures and Compilation Techniques, San Francisco, USA.

Sodani and Sohi, 1997
Sodani, A. and Sohi, G. S. (1997). Dynamic Instruction Reuse. In [ISCA24, 1997].

Sohi et al., 1995
Sohi, G. S., Breach, S. E., and Vijaykumar, T. (1995). Multiscalar Processors. In ISCA'22 proceedings, pages 414-425, Santa Margherita Ligure, Italy.

Stark et al., 1997
Stark, J., Racunas, P., and Patt, Y. N. (1997). Reducing the Performance Impact of Instruction Cache Misses by Writing Instructions into the Reservation Stations Out-of-Order. In [MICRO30, 1997], pages 34-43.

Stok, 1994
Stok, L. (1994). Data path synthesis. INTEGRATION, the VLSI journal, 18(1):1-71.

Su and Wang, 1991
Su, B. and Wang, J. (1991). Loop-Carried Dependence and the General URPR Software Pipelining Approach. In Proceedings of HICSS-24, Vol. 2, pages 366-372.

Sweany and Beaty, 1990
Sweany, P. and Beaty, S. (1990). Post-Compaction Register Assignment in a Retargetable Compiler. In Proceedings of the 23rd Annual Workshop on Microprogramming and Microarchitectures, pages 107-116, Orlando.

Truong, 1997
Truong, L. (1997). The VelociTI Architecture of the TMS320C6x. Presentation, Hot Chips IX, Stanford.

Tsai and Yew, 1996
Tsai, J.-Y. and Yew, P.-C. (1996). The Superthreaded Architecture: Thread Pipelining with Run-time Data Dependence Checking and Control Speculation. Technical Report TR 96-037, Univ. of Minnesota, Department of Computer Science.

Tullsen et al., 1995
Tullsen, D. M., Eggers, S. J., and Levy, H. M. (1995). Simultaneous Multithreading: maximizing On-Chip Parallelism. In Proceedings of the 22nd Annual International Symposium on Computer Architecture, pages 392-403, Santa Margherita Ligure, Italy.

Tullsen et al., 1996
Tullsen, D. M. et al. (1996). Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor. In Proceedings of the 23rd Annual International Symposium on Computer Architecture, Philadelphia, PA.

Uht and Sindagi, 1995
Uht, A. K. and Sindagi, V. (1995). Disjoint Eager Execution: An Optimal Form of Speculative Execution. In Proc. of the 28th Annual International Symposium on Microarchitecture.

Vajapeyam and Mitra, 1997
Vajapeyam, S. and Mitra, T. (1997). Improving Superscalar Instruction Dispatch and Issue by Exploiting Dynamic Code Sequences. In ISCA-97.

Vassiliadis et al., 1993
Vassiliadis, S., Phillips, J., and Blaner, B. (1993). Interlock Collapsing ALUs. IEEE Transactions on Computers, 42(7):825-839.

Wall, 1991
Wall, D. W. (1991). Limits of Instruction-Level Parallelism. In Proceedings of ASPLOS-IV, pages 176-188, Santa Clara, California. ACM.

Wang and Eisenbeis, 1993
Wang, J. and Eisenbeis, C. (1993). Decomposed Software Pipelining: A New Approach to Exploit Instruction Level Parallelism for Loop Programs. In IFIP WG 10.3 Working Conference on Architectures and Compilation Techniques for Fine and Medium Grain Parallelism, Orlando, Florida.

Wang and Franklin, 1997
Wang, K. and Franklin, M. (1997). Highly Accurate Data Value Prediction using Hybrid Predictors. In [MICRO30, 1997], pages 281-290.

Warter et al., 1993
Warter, N. J. et al. (1993). Reverse If-Conversion. In Proceedings of the ACM SIGPLAN '93 Conference on Programming Language Design and Implementation.

Weiss and Smith, 1994
Weiss, S. and Smith, J. E. (1994). POWER and PowerPC. Morgan Kaufmann Publishers.

Wilhelm and Maurer, 1995
Wilhelm, R. and Maurer, D. (1995). Compiler Design. Addison-Wesley.

Wilson et al., 1995
Wilson, R., French, R., Wilson, C., Amarasinghe, S., Anderson, J., Tjiang, S., Liao, S.-W., Tseng, C.-W., Hall, M., Lam, M., and Hennessy, J. (1995). An Overview of the SUIF Compiler System. Web site: http://suif.stanford.edu/suif/suif.html.

Wolf, 1992
Wolf, M. E. (1992). Improving Locality and Parallelism in Nested Loops. PhD thesis, Stanford University, Computer Systems Laboratory, Stanford, CA 94305. Also available as Technical Report CSL-TR-92-538.

Wolfe and Chanin, 1992
Wolfe, A. and Chanin, A. (1992). Executing Compressed Programs on An Embedded RISC Architecture. In Proceedings of the 25th Annual International Symposium on Microarchitecture, pages 81-91, Portland, Oregon.

Wu and Larus, 1994
Wu, Y. and Larus, J. (1994). Static Branch Frequency and Program Profile Analysis. In Proceedings of the 27th Annual International Symposium on Microarchitecture, pages 1-11.

Yeager, 1996
Yeager, K. C. (1996). MIPS R10000. IEEE Micro.

Yeh and Patt, 1991
Yeh, T. and Patt, Y. (1991). Two-level adaptive training branch prediction. In Proceedings of the 24th Annual International Symposium on Microarchitecture, pages 51-61.

Yeh and Patt, 1992
Yeh, T. and Patt, Y. (1992). Alternative Implementations of Two-Level Adaptive Branch Prediction. In Proceedings of the 19th Annual International Symposium on Computer Architecture, pages 124-134.

Yeh and Patt, 1993
Yeh, T. and Patt, Y. (1993). A Comparison of Dynamic Branch Predictors that use Two Levels of Branch History. In Proceedings of the 20th Annual International Symposium on Computer Architecture, pages 257-266.

Young et al., 1995
Young, C., Gloy, N., and Smith, M. D. (1995). A Comparative Analysis of Schemes for Correlated Branch Prediction. In The 22nd Annual International Symposium on Computer Architecture, pages 276-286.

Zima and Chapman, 1991
Zima, H. and Chapman, B. (1991). Supercompilers for Parallel and Vector Computers. Addison-Wesley.



Henk Corporaal
Tue Mar 10 11:20:49 CET 1998