References
- Abnous and Bagherzadeh, 1994
-
Abnous, A. and Bagherzadeh, N. (1994).
Pipelining and Bypassing in a VLIW Processor.
IEEE Transactions on Parallel and Distributed Systems,
5(6):658-664.
- Abraham et al., 1996
-
Abraham, S., Kathail, V., and Deitrich, B. (1996).
Meld Scheduling: Relaxing Scheduling Constraints across Region
Boundaries.
In Proceedings of the 29th Annual International Symposium on
Microarchitecture, pages 308-321, Paris, France.
- Aho et al., 1985
-
Aho, A. V., Sethi, R., and Ullman, J. D. (1985).
Compilers: Principles, Techniques and Tools.
Addison-Wesley Series in Computer Science. Addison-Wesley Publishing
Company, Reading, Massachusetts.
- Aiken and Nicolau, 1988
-
Aiken, A. and Nicolau, A. (1988).
Optimal Loop Parallelization.
In Proceedings of the SIGPLAN'88 conference on Programming
Language Design and Implementation, pages 308-317, Atlanta, Georgia.
- Arnold and Corporaal, 1997
-
Arnold, M. and Corporaal, H. (1997).
Data Transport Reduction in Move Processors.
In Third Annual Conference of ASCI, The Netherlands.
- Austin et al., 1995
-
Austin, T. M., Pnevmatikatos, D. N., and Sohi, G. S. (1995).
Zero-Cycle Loads: Microarchitecture Support for Reducing Load
Latency.
In Proceedings of the 28th Annual International Symposium on
Microarchitecture, pages 82-92, Michigan.
- Ball and Larus, 1993
-
Ball, T. and Larus, J. (1993).
Branch Prediction for Free.
In Proceedings of the ACM SIGPLAN'93 Conference on Programming
Language Design and Implementation, pages 300-313.
- Ball and Larus, 1996
-
Ball, T. and Larus, J. (1996).
Efficient Path Profiling.
In Proceedings of the 29th Annual International Symposium on
Microarchitecture, pages 46-57, Paris, France.
- Beaty, 1991
-
Beaty, S. J. (1991).
Genetic Algorithms and Instruction Scheduling.
In Proceedings of the 24th Annual International Symposium on
Microarchitecture, pages 206-211, Albuquerque.
- Beckmann, 1994
-
Beckmann, C. J. (1994).
Hardware and Software for Functional and Fine Grain
Parallelism.
PhD thesis, University of Illinois at Urbana-Champaign, Center for
Supercomputing Research and Development.
- Bernstein and Rodeh, 1991
-
Bernstein, D. and Rodeh, M. (1991).
Global instruction scheduling for superscalar machines.
In Proceedings of the ACM SIGPLAN 1991 conference on Programming
Language Design and Implementation, pages 241-255.
- Briggs, 1992
-
Briggs, P. (1992).
Register Allocation via Graph Coloring.
PhD thesis, Rice University.
- Brownhill et al., 1997
-
Brownhill, C., Nicolau, A., Novack, S., and Polychronopoulos, C. (1997).
The PROMIS Compiler Prototype.
In Proceedings of the 1997 Conference on Parallel Architectures
and Compilation Techniques, San Francisco, USA.
- Burger et al., 1996a
-
Burger, D., Goodman, J. R., and Kägi, A. (1996a).
Memory Bandwidth Limitations of Future Microprocessors.
In Proceedings of the 23rd Annual International Symposium on
Computer Architecture, pages 78-89, Philadelphia, Pennsylvania. ACM SIGARCH
and IEEE Computer Society TCCA.
- Burger et al., 1996b
-
Burger, D., Kaxiras, S., and Goodman, J. R. (1996b).
DataScalar Architectures.
In Proceedings of the 23rd Annual International Symposium on
Computer Architecture, Philadelphia, Pennsylvania. ACM SIGARCH and IEEE
Computer Society TCCA.
- Burger et al., 1996c
-
Burger, D., Kaxiras, S., and Goodman, J. R. (1996c).
DataScalar Architectures and the SPSD Execution Model.
Technical Report TR 1317, University of Wisconsin-Madison Computer
Sciences Department.
- Calder et al., 1997
-
Calder, B., Feller, P., and Eustace, A. (1997).
Value Profiling.
In [MICRO30, 1997], pages 259-269.
- Calder and Grunwald, 1995
-
Calder, B. and Grunwald, D. (1995).
Next Cache Line and Set Prediction.
In Proceedings of the 22nd Annual International Symposium on
Computer Architecture, pages 287-296, Santa Margherita Ligure, Italy.
- Capitanio et al., 1992
-
Capitanio, A., Dutt, N., and Nicolau, A. (1992).
Partitioned Register Files for VLIWs: A Preliminary Analysis of
Tradeoffs.
In Proceedings of the 25th Annual International Symposium on
Microarchitecture, pages 292-300, Portland.
- Chang et al., 1995
-
Chang, P.-Y., Hao, E., and Patt, Y. (1995).
Alternative Implementations of Hybrid Branch Predictors.
In Proceedings of the 28th Annual International Symposium on
Microarchitecture, pages 252-257, Michigan.
- Chang et al., 1997
-
Chang, P.-Y., Hao, E., and Patt, Y. N. (1997).
Target Prediction for Indirect Jumps.
In [ISCA24, 1997].
- Chang et al., 1996
-
Chang, P.-Y., Hao, E., Patt, Y. N., and Chang, P. P. (1996).
Using Predicated Execution to Improve the Performance of a
Dynamically Scheduled Machine with Speculative Execution.
International Journal of Parallel Programming, 24(3).
- Chekuri et al., 1996
-
Chekuri, C. et al. (1996).
Profile-Driven Instruction Level Parallel Scheduling with Application
to Super Blocks.
In Proceedings of the 29th Annual International Symposium on
Microarchitecture, pages 58-67, Paris, France.
- Chen et al., 1996
-
Chen, I.-C. K., Coffey, J. T., and Mudge, T. N. (1996).
Analysis of Branch Prediction Via Data Compression.
In ASPLOS VII, pages 128-137, Cambridge, Massachusetts.
- Chen et al., 1994
-
Chen, W., Mahlke, S., Warter, N., Anik, S., and Hwu, W. (1994).
Profile assisted instruction scheduling.
Int. J. of Parallel Programming, 22(2):151-181.
- Clark, 1987
-
Clark, D. W. (1987).
Pipelining and Performance in the VAX 8800 Processor.
In Proceedings of ASPLOS-II, pages 173-177, Palo Alto,
California.
- Cogswell, 1995
-
Cogswell, B. H. (1995).
Timing insensitive binary-to-binary translation.
PhD thesis, Carnegie Mellon University.
- Colwell et al., 1987
-
Colwell, R. P., Nix, R. P., O'Donnell, J. J., Papworth, D. B., and Rodman,
P. K. (1987).
A VLIW Architecture for a Trace Scheduling Compiler.
In Proceedings of the Second International Conference on
Architectural Support for Programming Languages and Operating Systems,
pages 180-192. ACM.
SIGPLAN Notices Vol. 22, No. 10.
- Conte et al., 1996
-
Conte, T. M. et al. (1996).
Instruction Fetch Mechanisms for VLIW Architectures with Compressed
Encodings.
In Proceedings of the 29th Annual International Symposium on
Microarchitecture, pages 201-211, Paris, France.
- Conte and Sathaye, 1995
-
Conte, T. M. and Sathaye, S. W. (1995).
Dynamic Rescheduling: A Technique for Object Code Compatibility in
VLIW Architectures.
In Proceedings of the 28th Annual International Symposium on
Microarchitecture, pages 208-218, Ann Arbor, Michigan.
- Corporaal, 1997
-
Corporaal, H. (1997).
Microprocessor Architectures: From VLIW to TTA.
John Wiley.
ISBN 0-471-97157-X.
- Dehnert and Towle, 1993
-
Dehnert, J. C. and Towle, R. A. (1993).
Compiling for the Cydra 5.
The Journal of Supercomputing, 7(1/2):181-228.
- Diep et al., 1995
-
Diep, T. A., Nelson, C., and Shen, J. P. (1995).
Performance Evaluation of the PowerPC 620 Microarchitecture.
In The 22nd Annual International Symposium on Computer
Architecture, pages 163-174.
- Dubey, 1997
-
Dubey, P. K. (1997).
Architectural and Design Implications of Mediaprocessing.
Tutorial, Hot Chips IX, Stanford.
- Dubey et al., 1995
-
Dubey, P. K., O'Brien, K., O'Brien, K., and Barton, C. (1995).
Single-Program Speculative Multithreading (SPSM) Architecture:
Compiler-Assisted Fine-Grained Multithreading.
In International Conference on Parallel Architectures and
Compilation Techniques, pages 109-121.
- Dunn and Hsu, 1996
-
Dunn, D. A. and Hsu, W.-C. (1996).
Instruction Scheduling for the HP PA-8000.
In Proceedings of the 29th Annual International Symposium on
Microarchitecture, pages 298-307, Paris, France.
- Dwyer and Torng, 1992
-
Dwyer, H. and Torng, H. (1992).
An Out-of-Order Superscalar Processor with Speculative Execution and
Fast, Precise Interrupts.
In Proceedings of the 25th Annual International Workshop on
Microprogramming, pages 272-281, Portland, Oregon.
- Ebcioğlu, 1987
-
Ebcioğlu, K. (1987).
A Compilation Technique for Software Pipelining of Loops with
Conditional Jumps.
In Proceedings of the 20th Annual Workshop on
Microprogramming.
- Ebcioğlu and Altman, 1997
-
Ebcioğlu, K. and Altman, E. (1997).
DAISY: Dynamic Compilation for 100% Architectural Compatibility.
In Proceedings of the 24th Annual International Symposium on
Computer Architecture, Denver, Colorado.
- Ebcioğlu et al., 1997
-
Ebcioğlu, K., Altman, E., and Hokenek, E. (1997).
A JAVA ILP Machine Based on Fast Dynamic Compilation.
In IEEE MASCOTS International Workshop on Security and
Efficiency Aspects of Java, Eilat, Israel.
- Ebcioğlu et al., 1994
-
Ebcioğlu, K., Groves, R., Kim, K., Silberman, G., and Ziv, I. (1994).
VLIW Compilation Techniques in a Superscalar Environment.
ACM SIGPLAN Notices, (PLDI'94), 29(6):36-48.
- Ebcioğlu and Nakatani, 1989
-
Ebcioğlu, K. and Nakatani, T. (1989).
A New Compilation Technique for Parallelizing Loops with
Unpredictable Branches on a VLIW Architecture.
In Proceedings of the Second Workshop on Programming Languages
and Compilers for Parallel Computing, University of Illinois at
Urbana-Champaign.
- Eickemeyer and Vassiliadis, 1993
-
Eickemeyer, R. J. and Vassiliadis, S. (1993).
A load-instruction unit for pipelined processors.
IBM Journal of Research and Development, 37(4):547-564.
- Ellis, 1986
-
Ellis, J. R. (1986).
Bulldog: A Compiler for VLIW Architectures.
ACM Doctoral Dissertation Awards. MIT Press, Cambridge,
Massachusetts.
- Emer and Gloy, 1997
-
Emer, J. and Gloy, N. (1997).
A language for describing predictors and its application to automatic
synthesis.
In [ISCA24, 1997].
- Ertl and Krall, 1994
-
Ertl, M. A. and Krall, A. (1994).
Delayed Exceptions - Speculative Execution of Trapping Instructions.
In Lecture Notes in Computer Science 786, Compiler
Construction, pages 158-171. Springer-Verlag.
- Chaitin et al., 1981
-
Chaitin, G. J. et al. (1981).
Register Allocation via Coloring.
Computer Languages, 6:47-57.
- Farkas et al., 1997a
-
Farkas, K. I., Chow, P., Jouppi, N. P., and Vranesic, Z. (1997a).
The Multicluster Architecture: Reducing Cycle Time Through
Partitioning.
In [MICRO30, 1997], pages 149-159.
- Farkas et al., 1997b
-
Farkas, K. I., Jouppi, N. P., and Chow, P. (1997b).
Register File Design Considerations in Dynamically Scheduled
Processors.
Technical Report Research Report 95/10, Digital Western Research
Laboratory.
- Fisher et al., 1996
-
Fisher, J., Faraboschi, P., and Desoli, G. (1996).
Custom-Fit Processors: Letting Applications Define Architectures.
In Proceedings of the 29th Annual International Symposium on
Microarchitecture, pages 324-335, Paris, France.
- Fisher, 1981
-
Fisher, J. A. (1981).
Trace Scheduling: A Technique for Global Microcode Compaction.
IEEE Transactions on Computers, C-30(7):478-490.
- Fisher and Freudenberger, 1992
-
Fisher, J. A. and Freudenberger, S. M. (1992).
Predicting conditional branch directions from previous runs of a
program.
In Proceedings of ASPLOS-V, pages 85-97, Boston.
- Foley, 1996
-
Foley, P. (1996).
The Mpact Media Processor Redefines the Multimedia PC.
In CompCon '96 conference proceedings, Santa Clara.
- Gabbay and Mendelson, 1997
-
Gabbay, F. and Mendelson, A. (1997).
Can Program Profiling Support Value Prediction?
In [MICRO30, 1997], pages 270-280.
- Gibbons and Muchnick, 1986
-
Gibbons, P. B. and Muchnick, S. S. (1986).
Efficient Instruction Scheduling for a Pipelined Architecture.
In Proceedings of the SIGPLAN Symposium on Compiler
Construction, pages 11-16.
- Gieseke, 1997
-
Gieseke, B. (1997).
A 600MHz Superscalar RISC Microprocessor with Out-of-Order Execution.
In IEEE International Solid-State Circuits Conference.
- Gillies et al., 1996
-
Gillies, D. M. et al. (1996).
Global Predicate Analysis and its Application to Register Allocation.
In Proceedings of the 29th Annual International Symposium on
Microarchitecture, pages 114-125, Paris, France.
- Girkar and Polychronopoulos, 1994
-
Girkar, M. and Polychronopoulos, C. D. (1994).
The Hierarchical Task Graph as a Universal Intermediate
Representation.
International Journal of Parallel Programming, 22(5):519-551.
- Glossner and Vassiliadis, 1997
-
Glossner, C. J. and Vassiliadis, S. (1997).
The DELFT-JAVA Engine: An Introduction.
In Third int. Euro-Par Conference, pages 766-770, Passau,
Germany.
- Gloy et al., 1996
-
Gloy, N., Young, C., Chen, J. B., and Smith, M. D. (1996).
An Analysis of Dynamic Branch Prediction Schemes on System Workloads.
In ISCA-23.
- González et al., 1997
-
González, A., Valero, M., Topham, N., and Parcerisa, J. M. (1997).
Eliminating Cache Conflict Misses Through XOR-Based Placement
Functions.
In Proc. of the ACM Int. Conf. on Supercomputing, pages 76-83,
Vienna, Austria.
- Granlund and Kenner, 1992
-
Granlund, T. and Kenner, R. (1992).
Eliminating Branches using a Superoptimizer and the GNU C Compiler.
In ACM SIGPLAN, pages 341-352.
- Grunwald et al., 1995
-
Grunwald, D. et al. (1995).
Corpus-Based Static Branch Prediction.
In Proceedings of the ACM SIGPLAN'95 Conference on Programming
Language Design and Implementation.
- Hank et al., 1995
-
Hank, R. E., Hwu, W. W., and Rau, B. R. (1995).
Region-based compilation: an introduction and motivation.
In Proceedings of the 28th Annual International Symposium on
Microarchitecture, pages 158-168, Michigan.
- Hennessy and Patterson, 1996
-
Hennessy, J. L. and Patterson, D. A. (1996).
Computer Architecture, a Quantitative Approach, Second
Edition.
Morgan Kaufmann publishers.
- Holler, 1996
-
Holler, A. M. (1996).
Optimization for a Superscalar Out-of-Order Machine.
In Proceedings of the 29th Annual International Symposium on
Microarchitecture, pages 336-348, Paris, France.
- Hoogerbrugge, 1996
-
Hoogerbrugge, J. (1996).
Code generation for Transport Triggered Architectures.
PhD thesis, Delft Univ. of Technology.
ISBN 90-9009002-9.
- Hordijk and Corporaal, 1997a
-
Hordijk, J. and Corporaal, H. (1997a).
A Comparison of Different Multithreading Architectures.
Technical Report 1-68340-44(1997)11, Department of Electrical
Engineering, Delft University of Technology.
- Hordijk and Corporaal, 1997b
-
Hordijk, J. and Corporaal, H. (1997b).
The Potential of Exploiting Coarse-Grain Task Parallelism from
Sequential Programs.
In HPCN Europe '97, The International Conference and Exhibition
on High-Performance Computing and Networking, Vienna, Austria.
- Hsieh et al., 1996
-
Hsieh, C.-H. A., Gyllenhaal, J. C., and Hwu, W. W. (1996).
Java Bytecode to Native Code Translation: The Caffeine Prototype and
Preliminary Results.
In Proceedings of the 29th Annual International Symposium on
Microarchitecture, pages 90-97, Paris, France.
- Hsu and Davidson, 1986
-
Hsu, P. Y. T. and Davidson, E. S. (1986).
Highly Concurrent Scalar Processing.
In Proceedings of ISCA-13, pages 386-395.
- Huff, 1993
-
Huff, R. A. (1993).
Lifetime-Sensitive Modulo Scheduling.
In Proceedings of the SIGPLAN '93 Conference on Programming
Language Design and Implementation, pages 258-267.
- Hunt, 1995
-
Hunt, D. (1995).
Advanced Performance Features of the 64-bit PA-8000.
In COMPCON 1995 Digest of Papers, pages 123-128.
- Hwu et al., 1993
-
Hwu, W. W. et al. (1993).
The Superblock: An Effective Technique for VLIW and Superscalar
Compilation.
The Journal of Supercomputing, 7(1/2):229-248.
- Hwu and Patt, 1987
-
Hwu, W. W. and Patt, Y. N. (1987).
Checkpoint Repair for High-Performance Out-of-Order Execution
Machines.
IEEE Transactions on Computers, C-36(12).
- ISCA24, 1997
-
ISCA24 (1997).
Proceedings of the 24th Annual International Symposium on
Computer Architecture, Denver, Colorado. ACM SIGARCH and IEEE Computer
Society TCCA.
- Jacobson et al., 1997
-
Jacobson, Q., Rotenberg, E., and Smith, J. E. (1997).
Path-Based Next Trace Prediction.
In [MICRO30, 1997], pages 14-23.
- Janssen and Corporaal, 1995
-
Janssen, J. and Corporaal, H. (1995).
Partitioned Register Files for TTAs.
In Proceedings of the 28th Annual International Symposium on
Microarchitecture, pages 303-312, Michigan.
- Janssen and Corporaal, 1997
-
Janssen, J. and Corporaal, H. (1997).
Registers On Demand, an integrated region scheduler and register
allocator.
In Submitted paper.
- Lo et al., 1997
-
Lo, J. L., Eggers, S., Emer, J., Levy, H., Stamm, R., and Tullsen, D. (1997).
Converting Thread-Level Parallelism Into Instruction-Level
Parallelism via Simultaneous Multithreading.
ACM Transactions on Computer Systems.
- Johnson and Schlansker, 1996
-
Johnson, R. and Schlansker, M. (1996).
Analysis Techniques for Predicated Code.
In Proceedings of the 29th Annual International Symposium on
Microarchitecture, pages 100-113, Paris, France.
- Johnson, 1991
-
Johnson, W. M. (1991).
Superscalar Microprocessor Design.
Prentice Hall.
- Jouppi and Wall, 1989
-
Jouppi, N. P. and Wall, D. W. (1989).
Available Instruction-Level Parallelism for Superscalar and
Superpipelined Machines.
In Proceedings of the 3rd International Conference on
Architectural Support for Programming Languages and Operating Systems, pages
272-282.
- Karkowski and Corporaal, 1997
-
Karkowski, I. and Corporaal, H. (1997).
Overcoming the Limitations of the Traditional Loop Parallelization.
In HPCN Europe '97, The International Conference and Exhibition
on High-Performance Computing and Networking, Vienna, Austria.
- Kathail et al., 1994
-
Kathail, V., Schlansker, M., and Rau, B. (1994).
HPL PlayDoh Architecture Specification: Version 1.0.
Technical Report HPL-93-80, Hewlett Packard Computer Systems
Laboratory, Palo Alto, CA.
- Kunkel and Smith, 1986
-
Kunkel, S. R. and Smith, J. E. (1986).
Optimal Pipelining in Supercomputers.
In ISCA-13, pages 404-414, Tokyo, Japan.
- Lam, 1988
-
Lam, M. (1988).
Software Pipelining: An Effective Scheduling Technique for VLIW
Machines.
In Proceedings of the SIGPLAN '88 Conference on Programming
Language Design and Implementation, pages 318-328.
- Lam and Wilson, 1992
-
Lam, M. S. and Wilson, R. P. (1992).
Limits of control flow on parallelism.
In ISCA-19, pages 46-57, Australia.
- Larus, 1990
-
Larus, J. R. (1990).
Parallelism in Numeric and Symbolic Programs.
In Proceedings of International Workshop on Compilers for
Parallel Computers, pages 157-170.
- Lavery and Hwu, 1996
-
Lavery, D. M. and Hwu, W. W. (1996).
Modulo Scheduling of Loops in Control-Intensive Non-Numeric Programs.
In Proceedings of the 29th Annual International Symposium on
Microarchitecture, pages 126-137, Paris, France.
- Lee et al., 1997a
-
Lee, C.-C., Chen, I.-C. K., and Mudge, T. N. (1997a).
The Bi-Mode Branch Predictor.
In [MICRO30, 1997], pages 4-13.
- Lee et al., 1995
-
Lee, D., Baer, J.-L., Calder, B., and Grunwald, D. (1995).
Instruction Cache Fetch Policies for Speculative Execution.
In ISCA-22 proceedings, pages 357-367, Santa Margherita
Ligure, Italy.
- Lee and Smith, 1984
-
Lee, J. and Smith, A. (1984).
Branch Prediction Strategies and Branch Target Buffer Design.
In IEEE Computer, pages 6-22.
- Lee et al., 1997b
-
Lee, R. B. et al. (1997b).
MAX-2 Multimedia Extensions for PA-RISC 2.0 Processors.
Presentation, Hot Chips IX, Stanford.
- Liao, 1996
-
Liao, S. Y.-H. (1996).
Code Generation and Optimization for Embedded Digital Signal
Processors.
PhD thesis, MIT.
- Lilja and Bird, 1994
-
Lilja, D. J. and Bird, P. L. (1994).
The interaction of compilation technology and computer
architecture.
Kluwer Academic Publishers.
- Lipasti and Shen, 1996
-
Lipasti, M. H. and Shen, J. P. (1996).
Exceeding the Dataflow Limit via Value Prediction.
In Proceedings of the 29th Annual International Symposium on
Microarchitecture, pages 226-237, Paris, France.
- Lipasti et al., 1996
-
Lipasti, M. H., Wilkerson, C. B., and Shen, J. P. (1996).
Value Locality and Load Value Prediction.
In Proceedings of the Seventh ACM Conference on Architectural
Support for Programming Languages and Operating Systems, Cambridge,
Massachusetts.
- Lowney et al., 1993
-
Lowney, P. G. et al. (1993).
The Multiflow Trace Scheduling Compiler.
The Journal of Supercomputing, 7(1/2):51-142.
- Luk and Mowry, 1996
-
Luk, C.-K. and Mowry, T. C. (1996).
Compiler-Based Prefetching for Recursive Data Structures.
In Proceedings of the Seventh International Conference on
Architectural Support for Programming Languages and Operating Systems, pages
222-233, Cambridge, Massachusetts.
- Mahadevan and Ramakrishnan, 1994
-
Mahadevan, U. and Ramakrishnan, S. (1994).
Instruction Scheduling over Regions: A Framework for Scheduling
Across Basic Blocks.
In Proceedings of the International Conference on Compiler
Construction, pages 419-434, Edinburgh, Scotland.
- Mahlke et al., 1992a
-
Mahlke, S. A. et al. (1992a).
Sentinel scheduling for VLIW and Superscalar Processors.
In Proceedings of ASPLOS-V, pages 238-247, Boston.
- Mahlke et al., 1995
-
Mahlke, S. A. et al. (1995).
A Comparison of Full and Partial Predicated Execution Support for ILP
Processors.
In The 22nd Annual International Symposium on Computer
Architecture, pages 138-149.
- Mahlke et al., 1992b
-
Mahlke, S. A., Lin, D. C., Chen, W. Y., Hank, R. E., and Bringmann, R. A.
(1992b).
Effective compiler support for predicated execution using the
hyperblock.
In Proceedings of the 25th Annual International Symposium on
Microarchitecture.
- Mahlke and Natarajan, 1996
-
Mahlke, S. A. and Natarajan, B. (1996).
Compiler Synthesized Dynamic Branch Prediction.
In Proceedings of the 29th Annual International Symposium on
Microarchitecture, pages 153-164.
- Martin et al., 1997
-
Martin, M. M., Roth, A., and Fischer, C. N. (1997).
Exploiting Dead Value Information.
In [MICRO30, 1997], pages 125-135.
- Maydan et al., 1995
-
Maydan, D. E., Hennessy, J. L., and Lam, M. S. (1995).
Effectiveness of Data Dependence Analysis.
International Journal of Parallel Programming, 23(1).
- McFarling, 1993
-
McFarling, S. (1993).
Combining Branch Predictors.
WRL Technical Note TN-36.
- McFarling and Hennessy, 1986
-
McFarling, S. and Hennessy, J. (1986).
Reducing the Cost of Branches.
In Proc. 13th Ann. Int'l Symp. on Computer Architecture, pages
396-403.
- MICRO30, 1997
-
MICRO30 (1997).
Proceedings of the 30th Annual International Symposium on
Microarchitecture, Research Triangle Park, North Carolina. IEEE Computer
Society TC-MICRO and ACM SIGMICRO.
- Moon and Ebcioğlu, 1992
-
Moon, S. and Ebcioğlu, K. (1992).
An efficient resource-constrained global scheduling technique for
superscalar and VLIW processors.
In Proceedings of the 25th Annual International Symposium on
Microarchitecture, Portland.
- Moshovos et al., 1997
-
Moshovos, A., Breach, S. E., Vijaykumar, T., and Sohi, G. S. (1997).
Dynamic Speculation and Synchronization of Data Dependences.
In [ISCA24, 1997].
- Moshovos and Sohi, 1997
-
Moshovos, A. and Sohi, G. S. (1997).
Streamlining Inter-Operation Memory Communication via Data Dependence
Prediction.
In [MICRO30, 1997], pages 235-245.
- Moudgill, 1994
-
Moudgill, M. (1994).
Implementing and exploiting static speculation on multiple
instruction issue processors.
PhD thesis, Cornell University.
- Moudgill and Vassiliadis, 1996
-
Moudgill, M. and Vassiliadis, S. (1996).
On Precise Interrupts.
IEEE Micro, pages 58-67.
- Muchnick, 1997
-
Muchnick, S. (1997).
Advanced Compiler Design and Implementation.
Morgan Kaufmann Publishers.
ISBN 1-55860-320-4.
- Nair, 1995
-
Nair, R. (1995).
Dynamic Path-Based Branch Correlation.
In Proceedings of the 28th Annual International Symposium on
Microarchitecture, pages 15-23, Michigan.
- Nicolau, 1985
-
Nicolau, A. (1985).
Percolation Scheduling: A Parallel Compilation Technique.
Technical Report TR 85-678, Cornell University, Department of
Computer Science, Cornell University, Ithaca, NY 14853, USA.
- Nicolau, 1989
-
Nicolau, A. (1989).
Run-Time disambiguation: Coping with statically unpredictable
dependencies.
IEEE Transactions on Computers, 38(5).
- Nicolau and Fisher, 1984
-
Nicolau, A. and Fisher, J. (1984).
Measuring the Parallelism available for very long instruction word
architectures.
IEEE Transactions on Computers, C-33(11):968-976.
- Normyle and Csoppenszky, 1997
-
Normyle, K. and Csoppenszky, M. (1997).
UltraSPARC IIi - A highly integrated 300 MHz 64-bit SPARC V9
CPU.
Presentation, Hot Chips IX, Stanford.
- Novack et al., 1995a
-
Novack, S., Hummel, J., and Nicolau, A. (1995a).
A simple mechanism for improving the accuracy and efficiency of
instruction level disambiguation.
Lecture Notes in Computer Science 1033, pages 289-303.
- Novack et al., 1995b
-
Novack, S., Nicolau, A., and Dutt, N. (1995b).
A Unified Code Generation Approach Using Mutation Scheduling.
In Code Generation for Embedded Processors, chapter 12.
Kluwer Academic Publishers.
- Palacharla et al., 1997
-
Palacharla, S., Jouppi, N. P., and Smith, J. E. (1997).
Complexity-Effective Superscalar Processors.
In [ISCA24, 1997].
- Pan et al., 1992
-
Pan, S.-T. et al. (1992).
Improving the accuracy of dynamic branch prediction using branch
correlation.
In Proceedings of ASPLOS-V, pages 76-84, Boston.
- Park and Schlansker, 1991
-
Park, J. C. H. and Schlansker, M. (1991).
On predicated execution.
Technical Report HPL-91-58, HP Laboratories, Palo Alto.
- Patterson et al., 1997
-
Patterson, D. et al. (1997).
A case for intelligent RAM.
IEEE Micro, pages 34-44.
- Patterson, 1995
-
Patterson, J. (1995).
Accurate Static Branch Prediction by Value Range Propagation.
In Proceedings of the ACM SIGPLAN'95 Conference on Programming
Language Design and Implementation.
- Paulin and Knight, 1989
-
Paulin, P. G. and Knight, J. P. (1989).
Force-Directed Scheduling for the Behavioral Synthesis of ASIC's.
IEEE Transactions on Computer-Aided Design, 8(6):661-679.
- Pearl, 1984
-
Pearl, J. (1984).
Heuristics: intelligent search strategies for computer problem
solving.
Addison-Wesley.
- PicoJava, 1996
-
PicoJava (1996).
Picojava I Microprocessor Core Architecture.
SUN white paper on Web-site:
http://www.sun.com/sparc/whitepapers/wpr-0014-01.
- Pinter, 1993
-
Pinter, S. S. (1993).
Register Allocation with Instruction Scheduling: a New Approach.
In SIGPLAN '93 Conference on Programming Language Design and
Implementation, pages 248-257.
- Ramakrishnan, 1992
-
Ramakrishnan, S. (1992).
Software Pipelining in PA-RISC Compilers.
Hewlett-Packard Journal, pages 39-45.
- Rathnam and Slavenburg, 1996
-
Rathnam, S. and Slavenburg, G. (1996).
An Architectural Overview of the Programmable Multimedia Processor,
TM-1.
In CompCon '96 conference proceedings, Santa Clara.
- Rau, 1993
-
Rau, B. R. (1993).
Dynamically Scheduled VLIW Processors.
In Proceedings of the 26th Annual International Symposium on
Microarchitecture, pages 80-92, Austin, Texas.
- Rau, 1994
-
Rau, B. R. (1994).
Iterative Modulo Scheduling: An Algorithm For Software Pipelining
Loops.
In Proceedings of the 27th Annual International Workshop on
Microprogramming, San Jose, California.
- Rau, 1996
-
Rau, B. R. (1996).
Iterative Modulo Scheduling.
Int. J. of Parallel Programming, 24(1):3-64.
- Rau and Fisher, 1993
-
Rau, B. R. and Fisher, J. A. (1993).
Instruction-Level Parallel Processing: History, Overview and
Perspective.
The Journal of Supercomputing, 7(1/2):9-50.
- Rau and Glaeser, 1981
-
Rau, B. R. and Glaeser, C. D. (1981).
Some Scheduling Techniques and an Easily Schedulable Horizontal
Architecture for High Performance Scientific Computing.
In Proceedings of the 14th Annual Workshop on Microprogramming,
pages 183-198.
- Rivers et al., 1997
-
Rivers, J. A., Tyson, G. S., Davidson, E. S., and Austin, T. M. (1997).
On High-Bandwidth Data Cache Design for Multi-Issue Processors.
In [MICRO30, 1997], pages 46-56.
- Rompaey et al., 1992
-
Rompaey, K. V., Bolsens, I., and Man, H. D. (1992).
Just in time scheduling.
In ICCD-92, pages 295-300, Boston.
- Rotenberg et al., 1996
-
Rotenberg, E., Bennett, S., and Smith, J. E. (1996).
Trace Cache: a Low Latency Approach to High Bandwidth Instruction
Fetching.
In Proceedings of the 29th Annual International Symposium on
Microarchitecture, pages 24-34, Paris, France.
- Rotenberg et al., 1997
-
Rotenberg, E., Jacobson, Q., Sazeides, Y., and Smith, J. (1997).
Trace Processors.
In [MICRO30, 1997], pages 138-148.
- Sánchez and González, 1997
-
Sánchez, F. J. and González, A. (1997).
Cache Sensitive Modulo Scheduling.
In [MICRO30, 1997], pages 338-348.
- Saulsbury et al., 1996
-
Saulsbury, A., Pong, F., and Nowatzyk, A. (1996).
Missing the Memory Wall: The Case for Processor/Memory Integration.
In ISCA-23.
- Sazeides and Smith, 1997
-
Sazeides, Y. and Smith, J. E. (1997).
The Predictability of Data Values.
In [MICRO30, 1997], pages 248-258.
- Sazeides et al., 1996
-
Sazeides, Y., Vassiliadis, S., and Smith, J. E. (1996).
The Performance Potential of Data Dependence Speculation &
Collapsing.
In Proceedings of the 29th Annual International Symposium on
Microarchitecture, pages 238-247, Paris, France.
- Schlansker and Kathail, 1995
-
Schlansker, M. and Kathail, V. (1995).
Critical Path Reduction for Scalar Programs.
In Proceedings of the 28th Annual International Symposium on
Microarchitecture, pages 57-69, Michigan.
- Schuette and Shen, 1993
-
Schuette, M. and Shen, J. (1993).
Instruction-Level Experimental Evaluation of the Multiflow TRACE
14/300 VLIW Computer.
The Journal of Supercomputing, 7(1/2):249.
- Seznec et al., 1996
-
Seznec, A., Jourdan, S., Sainrat, P., and Michaud, P. (1996).
Multiple-Block Ahead Branch Predictors.
In ASPLOS VII, Cambridge, Massachusetts.
- Silberman and Ebcioğlu, 1993
-
Silberman, G. M. and Ebcioğlu, K. (1993).
An architecture framework for supporting heterogeneous
instruction-set architectures.
IEEE Computer, 26(6):39-56.
- Simone et al., 1995
-
Simone, M. et al. (1995).
Implementation Trade-offs in Using a Restricted Data Flow
Architecture in a High Performance RISC Microprocessor.
In The 22nd Annual International Symposium on Computer
Architecture, pages 151-162.
- Smith and Sohi, 1995
-
Smith, J. E. and Sohi, G. S. (1995).
The Microarchitecture of Superscalar Processors.
Technical report, University of Wisconsin-Madison.
- Smith et al., 1992
-
Smith, M. D., Horowitz, M., and Lam, M. S. (1992).
Efficient superscalar performance through boosting.
In Proceedings of ASPLOS-V, pages 248-261, Boston.
- Smotherman et al., 1991
-
Smotherman, M. et al. (1991).
Efficient DAG Construction and Heuristic Calculation for Instruction
Scheduling.
In Proceedings of the 24th Annual International Symposium on
Microarchitecture, pages 93-102, Albuquerque.
- Sánchez et al., 1997
-
Sánchez, F. J., González, A., and Valero, M. (1997).
Static Locality Analysis for Cache Management.
In Proc. of Int. Conf. on Parallel Architectures and Compilation
Techniques, San Francisco, USA.
- Sodani and Sohi, 1997
-
Sodani, A. and Sohi, G. S. (1997).
Dynamic Instruction Reuse.
In [ISCA24, 1997].
- Sohi et al., 1995
-
Sohi, G. S., Breach, S. E., and Vijaykumar, T. (1995).
Multiscalar Processors.
In ISCA'22 proceedings, pages 414-425, Santa Margherita
Ligure, Italy.
- Stark et al., 1997
-
Stark, J., Racunas, P., and Patt, Y. N. (1997).
Reducing the Performance Impact of Instruction Cache Misses by
Writing Instructions into the Reservation Stations Out-of-Order.
In [MICRO30, 1997], pages 34-43.
- Stok, 1994
-
Stok, L. (1994).
Data path synthesis.
INTEGRATION, the VLSI journal, 18(1):1-71.
- Su and Wang, 1991
-
Su, B. and Wang, J. (1991).
Loop-Carried Dependence and the General URPR Software Pipelining
Approach.
In Proceedings of HICSS-24, Vol. 2, pages 366-372.
- Sweany and Beaty, 1990
-
Sweany, P. and Beaty, S. (1990).
Post-Compaction Register Assignment in a Retargetable Compiler.
In Proceedings of the 23rd Annual Workshop on Microprogramming
and Microarchitectures, pages 107-116, Orlando.
- Truong, 1997
-
Truong, L. (1997).
The VelociTI Architecture of the TMS320C6x.
Presentation, Hot Chips IX, Stanford.
- Tsai and Yew, 1996
-
Tsai, J.-Y. and Yew, P.-C. (1996).
The Superthreaded Architecture: Thread Pipelining with Run-time Data
Dependence Checking and Control Speculation.
Technical Report TR 96-037, Univ. of Minnesota, Department of
Computer Science.
- Tullsen et al., 1995
-
Tullsen, D. M., Eggers, S. J., and Levy, H. M. (1995).
Simultaneous Multithreading: maximizing On-Chip Parallelism.
In Proceedings of the 22nd Annual International Symposium on
Computer Architecture, pages 392-403, Santa Margherita Ligure, Italy.
- Tullsen et al., 1996
-
Tullsen, D. M. et al. (1996).
Exploiting Choice: Instruction Fetch and Issue on an Implementable
Simultaneous Multithreading Processor.
In Proceedings of the 23rd Annual International Symposium on
Computer Architecture, Philadelphia, PA.
- Uht and Sindagi, 1995
-
Uht, A. K. and Sindagi, V. (1995).
Disjoint Eager Execution: An Optimal Form of Speculative Execution.
In Proc. of the 28th Annual International Symposium on
Microarchitecture.
- Vajapeyam and Mitra, 1997
-
Vajapeyam, S. and Mitra, T. (1997).
Improving Superscalar Instruction Dispatch and Issue by Exploiting
Dynamic Code Sequences.
In ISCA-97.
- Vassiliadis et al., 1993
-
Vassiliadis, S., Phillips, J., and Blaner, B. (1993).
Interlock Collapsing ALUs.
IEEE Transactions on Computers, 42(7):825-839.
- Wall, 1991
-
Wall, D. W. (1991).
Limits of Instruction-Level Parallelism.
In Proceedings of ASPLOS-IV, pages 176-188, Santa Clara,
California. ACM.
- Wang and Eisenbeis, 1993
-
Wang, J. and Eisenbeis, C. (1993).
Decomposed Software Pipelining: A New Approach to Exploit Instruction
Level Parallelism for Loop Programs.
In IFIP WG 10.3 Working Conference on Architectures and
Compilation Techniques for Fine and Medium Grain Parallelism, Orlando,
Florida.
- Wang and Franklin, 1997
-
Wang, K. and Franklin, M. (1997).
Highly Accurate Data Value Prediction using Hybrid Predictors.
In [MICRO30, 1997], pages 281-290.
- Warter et al., 1993
-
Warter, N. J. et al. (1993).
Reverse If-Conversion.
In Proceedings of the ACM SIGPLAN '93 Conference on Programming
Language Design and Implementation.
- Weiss and Smith, 1994
-
Weiss, S. and Smith, J. E. (1994).
POWER and PowerPC.
Morgan Kaufmann Publishers.
- Wilhelm and Maurer, 1995
-
Wilhelm, R. and Maurer, D. (1995).
Compiler Design.
Addison-Wesley.
- Wilson et al., 1995
-
Wilson, R., French, R., Wilson, C., Amarasinghe, S., Anderson, J., Tjiang, S.,
Liao, S.-W., Tseng, C.-W., Hall, M., Lam, M., and Hennessy, J. (1995).
An Overview of the SUIF Compiler System.
Web-site: http://suif.stanford.edu/suif/suif.html.
- Wolf, 1992
-
Wolf, M. E. (1992).
Improving Locality and Parallelism in Nested Loops.
PhD thesis, Stanford University, Computer Systems Laboratory,
Stanford, CA 94305.
also available as technical report CSL-TR-92-538.
- Wolfe and Chanin, 1992
-
Wolfe, A. and Chanin, A. (1992).
Executing Compressed Programs on An Embedded RISC Architecture.
In Proceedings of the 25th Annual International Symposium on
Microarchitecture, pages 81-91, Portland, Oregon.
- Wu and Larus, 1994
-
Wu, Y. and Larus, J. (1994).
Static Branch Frequency and Program Profile Analysis.
In Proceedings of the 27th Annual International Symposium on
Microarchitecture, pages 1-11.
- Yeager, 1996
-
Yeager, K. C. (1996).
MIPS R10000.
IEEE Micro.
- Yeh and Patt, 1991
-
Yeh, T. and Patt, Y. (1991).
Two-level adaptive training branch prediction.
In Proceedings of the 24th Annual International Symposium on
Microarchitecture, pages 51-61.
- Yeh and Patt, 1992
-
Yeh, T. and Patt, Y. (1992).
Alternative Implementations of Two-Level Adaptive Branch Prediction.
In Proceedings of the 19th Annual International Symposium on
Computer Architecture, pages 124-134.
- Yeh and Patt, 1993
-
Yeh, T. and Patt, Y. (1993).
A Comparison of Dynamic Branch Predictors that use Two Levels of
Branch History.
In Proceedings of the 20th Annual International Symposium on
Computer Architecture, pages 257-266.
- Young et al., 1995
-
Young, C., Gloy, N., and Smith, M. D. (1995).
A Comparative Analysis of Schemes for Correlated Branch Prediction.
In The 22nd Annual International Symposium on Computer
Architecture, pages 276-286.
- Zima and Chapman, 1991
-
Zima, H. and Chapman, B. (1991).
Supercompilers for Parallel and Vector Computers.
Addison-Wesley.
Henk Corporaal
Tue Mar 10 11:20:49 CET 1998