**List of Examples**

- Example 1-1. Parallel Code Using Directives for Simple Scheduling
- Example 1-2. Parallel Code Using Directives for Interleaved Scheduling
- Example 4-1. Experimenting with perfex
- Example 4-2. Output of perfex -a
- Example 4-3. Output of perfex -a -y
- Example 4-4. Performing an ssrun Experiment
- Example 4-5. Example Run of ssruno
- Example 4-6. Default prof Report from ssrun Experiment
- Example 4-7. Profile at the Source Line Level Using prof -heavy
- Example 4-8. Ideal Time Profile Run
- Example 4-9. Default Report of Ideal Time Profile
- Example 4-10. Ideal Time Report Truncated with -quit
- Example 4-11. Ideal Time Report by Lines
- Example 4-12. Ideal Time Profile Using -lines and -only Options
- Example 4-13. Ideal Time Architecture Information Report
- Example 4-14. Extract from a Butterfly Report
- Example 4-15. Usertime Call Hierarchy
- Example 4-16. Application of dprof
- Example 4-17. Example of Default dlook Output
- Example 5-1. Simple Summation Loop
- Example 5-2. Unrolled Summation Loop
- Example 5-3. Basic DAXPY Loop
- Example 5-4. Unrolled DAXPY Loop
- Example 5-5. Compiler-Generated DAXPY Schedule
- Example 5-6. Basic DAXPY Loop Code
- Example 5-7. Sample Software Pipeline Report Card
- Example 5-8. C Implementation of DAXPY Loop
- Example 5-9. SWP Report Card for C Loop with Default Alias Model
- Example 5-10. SWP Report Card for C Loop with Alias=Restrict
- Example 5-11. C Loop Nest on Multidimensional Array
- Example 5-12. SWP Report Card for Stencil Loop with Alias=Restrict
- Example 5-13. SWP Report Card for Stencil Loop with Alias=Disjoint
- Example 5-14. Indirect DAXPY Loop
- Example 5-15. SWP Report Card on Indirect DAXPY
- Example 5-16. Indirect DAXPY in Fortran with ivdep
- Example 5-17. Indirect DAXPY in C with ivdep
- Example 5-18. SWP Report Card for Indirect DAXPY with ivdep
- Example 5-19. Loop with Two Types of Dependency
- Example 5-20. C Loop with Obvious Loop-Carried Dependence
- Example 5-21. C Loop with Lexically-Forward Dependency
- Example 5-22. C Loop Test Using Dereferenced Pointer
- Example 5-23. C Loop Test Using Local Copy of Dereferenced Pointer
- Example 5-24. C Loop with Disguised Invariants
- Example 5-25. SWP Report Card for Loop with Disguised Invariance
- Example 5-26. C Loop with Invariants Exposed
- Example 5-27. SWP Report Card for Modified Loop
- Example 5-28. Conventional Code to Avoid an Exception; Speculative Equivalent Permitting an Exception
- Example 5-29. Code Suitable for Inlining
- Example 5-30. Subroutine Candidates for Inlining
- Example 5-31. Inlined Code from w2f File
- Example 6-1. Simple Loop Nest with Poor Cache Use
- Example 6-2. Reversing Loop Nest to Achieve Stride-One Access
- Example 6-3. Loop Using Three Vectors
- Example 6-4. Three Vectors Combined in an Array
- Example 6-5. Fortran Code Likely to Cause Thrashing
- Example 6-6. Perfex Data for adi2.f
- Example 6-7. Perfex Data for adi5.f
- Example 6-8. Perfex Data for adi53.f
- Example 6-9. Sequence of DAXPY and Dot-Product on a Single Vector
- Example 6-10. DAXPY and Dot-Product Loops Fused
- Example 6-11. Matrix Multiplication Loop
- Example 7-1. Matrix Multiplication Subroutine
- Example 7-2. SWP Report Card for Matrix Multiplication
- Example 7-3. Matrix Multiplication Unrolled on Outer Loop
- Example 7-4. Matrix Multiplication Unrolled on Middle Loop
- Example 7-5. Matrix Multiplication Unrolled on Outer and Middle Loops
- Example 7-6. Simple Loop Nest with Poor Cache Use
- Example 7-7. Simple Loop Nest Interchanged for Stride-1 Access
- Example 7-8. Loop Nest with Data Recursion
- Example 7-9. Recursive Loop Nest Interchanged and Unrolled
- Example 7-10. Matrix Multiplication in C
- Example 7-11. Cache-Blocked Matrix Multiplication
- Example 7-12. Fortran Nest with Explicit Cache Block Sizes for Middle and Inner Loops
- Example 7-13. Fortran Loop with Explicit Cache Block Sizes and Interchange
- Example 7-14. Transformed Fortran Loop
- Example 7-15. Adjacent Loops that Cannot Be Fused
- Example 7-16. Adjacent Loops Fused After Peeling
- Example 7-17. Sketch of a Loop with a Long Body
- Example 7-18. Sketch of a Loop After Fission
- Example 7-19. Loop Nest that Cannot Be Interchanged
- Example 7-20. Loop Nest After Fission and Interchange
- Example 7-21. Simple Reduction Loop Needing Prefetch
- Example 7-22. Simple Reduction Loop with Prefetch
- Example 7-23. Reduction with Conditional Prefetch
- Example 7-24. Reduction with Prefetch Unrolled Once
- Example 7-25. Reduction Loop Unrolled with Two-Ahead Prefetch
- Example 7-26. Reduction Loop Unrolled Four Times
- Example 7-27. Reduction Loop Reordered for Pseudo-Prefetching
- Example 7-28. Fortran Use of Manual Prefetch
- Example 7-29. Typical Fortran Declaration of Local Arrays
- Example 7-30. Common, Improper Fortran Practice
- Example 7-31. Fortran Loop to which Gather-Scatter Is Applicable
- Example 7-32. Fortran Loop with Gather-Scatter Applied
- Example 7-33. Fortran Loop That Processes a Vector
- Example 7-34. Fortran Loop Transformed to Vector Intrinsic Call
- Example 8-1. Typical C Loop
- Example 8-2. Amdahl's Law: Speedup(n) Given p
- Example 8-3. Amdahl's Law: p Given Speedup(2)
- Example 8-4. Amdahl's Law: p Given Speedup(n) and Speedup(m)
- Example 8-5. Fortran Loop with False Sharing
- Example 8-6. Fortran Loop with False Sharing Removed
- Example 8-7. Easily Parallelized Fortran Vector Routine
- Example 8-8. Fortran Vector Operation, Parallelized
- Example 8-9. Fortran Vector Operation with Distribution Directives
- Example 8-10. Parallel Loop with Affinity in Data
- Example 8-11. Parallel Loop with Affinity in Threads
- Example 8-12. Loop Parallelized with the NEST Clause
- Example 8-13. Loop Parallelized with NEST Clause with Data Affinity
- Example 8-14. Loop Parallelized with NEST, AFFINITY, and ONTO
- Example 8-15. Fortran Code for Explicit Page Placement
- Example 8-16. Declarations Using the Distribute_Reshape Directive
- Example 8-17. Valid and Invalid Use of Reshaped Array
- Example 8-18. Corrected Use of Reshaped Array
- Example 8-19. Gathering Reshaped Data with Copying
- Example 8-20. Gathering Reshaped Data with Cache-Friendly Copying
- Example 8-21. Reshaped Array as Actual Parameter—Valid
- Example 8-22. Reshaped Array as Actual Parameter—Invalid
- Example 8-23. Differently Reshaped Arrays as Actual Parameters
- Example 8-24. Typical Output of _DSM_VERBOSE
- Example 8-25. Test Placement Display from First-Touch Allocation
- Example 8-26. Test Placement Display from Round-Robin Placement
- Example 8-27. Scalable Placement File
- Example 8-28. Scalable Placement File for Two Threads Per Memory
- Example 8-29. Various Ways of Distributing Threads to Memories
- Example 8-30. Calling dplace Dynamically from Fortran
- Example 8-31. Using a Script to Capture Redirected Output from an MPI Job
- Example A-1. Naive Function to Find Nearest Point
- Example A-2. Nearest-Point Function with Short-Circuit Test
- Example C-1. Program adi2.f
- Example C-2. Program adi5.f
- Example C-3. Program adi53.f
- Example C-4. Basic Makefile
- Example C-5. Shell Script swplist
- Example C-6. SpeedShop Experiment Script ssruno
- Example C-7. Awk Script to Analyze Output of perfex -a
- Example C-8. Awk Script to Extrapolate Amdahl's Law from Measured Times
- Example C-9. Routine va2pa() Returns the Physical Page of a Virtual Address
- Example C-10. Routine cpuclock() Gets the Clock Speed from the Hardware Inventory