List of Examples

Example 1-1. Parallel Code Using Directives for Simple Scheduling
Example 1-2. Parallel Code Using Directives for Interleaved Scheduling
Example 4-1. Experimenting with perfex
Example 4-2. Output of perfex -a
Example 4-3. Output of perfex -a -y
Example 4-4. Performing an ssrun Experiment
Example 4-5. Example Run of ssruno
Example 4-6. Default prof Report from ssrun Experiment
Example 4-7. Profile at the Source Line Level Using prof -heavy
Example 4-8. Ideal Time Profile Run
Example 4-9. Default Report of Ideal Time Profile
Example 4-10. Ideal Time Report Truncated with -quit
Example 4-11. Ideal Time Report by Lines
Example 4-12. Ideal Time Profile Using -lines and -only Options
Example 4-13. Ideal Time Architecture Information Report
Example 4-14. Extract from a Butterfly Report
Example 4-15. Usertime Call Hierarchy
Example 4-16. Application of dprof
Example 4-17. Example of Default dlook Output
Example 5-1. Simple Summation Loop
Example 5-2. Unrolled Summation Loop
Example 5-3. Basic DAXPY Loop
Example 5-4. Unrolled DAXPY Loop
Example 5-5. Compiler-Generated DAXPY Schedule
Example 5-6. Basic DAXPY Loop Code
Example 5-7. Sample Software Pipeline Report Card
Example 5-8. C Implementation of DAXPY Loop
Example 5-9. SWP Report Card for C Loop with Default Alias Model
Example 5-10. SWP Report Card for C Loop with Alias=Restrict
Example 5-11. C Loop Nest on Multidimensional Array
Example 5-12. SWP Report Card for Stencil Loop with Alias=Restrict
Example 5-13. SWP Report Card for Stencil Loop with Alias=Disjoint
Example 5-14. Indirect DAXPY Loop
Example 5-15. SWP Report Card on Indirect DAXPY
Example 5-16. Indirect DAXPY in Fortran with ivdep
Example 5-17. Indirect DAXPY in C with ivdep
Example 5-18. SWP Report Card for Indirect DAXPY with ivdep
Example 5-19. Loop with Two Types of Dependency
Example 5-20. C Loop with Obvious Loop-Carried Dependence
Example 5-21. C Loop with Lexically-Forward Dependency
Example 5-22. C Loop Test Using Dereferenced Pointer
Example 5-23. C Loop Test Using Local Copy of Dereferenced Pointer
Example 5-24. C Loop with Disguised Invariants
Example 5-25. SWP Report Card for Loop with Disguised Invariance
Example 5-26. C Loop with Invariants Exposed
Example 5-27. SWP Report Card for Modified Loop
Example 5-28. Conventional Code to Avoid an Exception; Speculative Equivalent Permitting an Exception
Example 5-29. Code Suitable for Inlining
Example 5-30. Subroutine Candidates for Inlining
Example 5-31. Inlined Code from w2f File
Example 6-1. Simple Loop Nest with Poor Cache Use
Example 6-2. Reversing Loop Nest to Achieve Stride-One Access
Example 6-3. Loop Using Three Vectors
Example 6-4. Three Vectors Combined in an Array
Example 6-5. Fortran Code Likely to Cause Thrashing
Example 6-6. Perfex Data for adi2.f
Example 6-7. Perfex Data for adi5.f
Example 6-8. Perfex Data for adi53.f
Example 6-9. Sequence of DAXPY and Dot-Product on a Single Vector
Example 6-10. DAXPY and Dot-Product Loops Fused
Example 6-11. Matrix Multiplication Loop
Example 7-1. Matrix Multiplication Subroutine
Example 7-2. SWP Report Card for Matrix Multiplication
Example 7-3. Matrix Multiplication Unrolled on Outer Loop
Example 7-4. Matrix Multiplication Unrolled on Middle Loop
Example 7-5. Matrix Multiplication Unrolled on Outer and Middle Loops
Example 7-6. Simple Loop Nest with Poor Cache Use
Example 7-7. Simple Loop Nest Interchanged for Stride-1 Access
Example 7-8. Loop Nest with Data Recursion
Example 7-9. Recursive Loop Nest Interchanged and Unrolled
Example 7-10. Matrix Multiplication in C
Example 7-11. Cache-Blocked Matrix Multiplication
Example 7-12. Fortran Nest with Explicit Cache Block Sizes for Middle and Inner Loops
Example 7-13. Fortran Loop with Explicit Cache Block Sizes and Interchange
Example 7-14. Transformed Fortran Loop
Example 7-15. Adjacent Loops that Cannot be Fused
Example 7-16. Adjacent Loops Fused After Peeling
Example 7-17. Sketch of a Loop with a Long Body
Example 7-18. Sketch of a Loop After Fission
Example 7-19. Loop Nest that Cannot Be Interchanged
Example 7-20. Loop Nest After Fission and Interchange
Example 7-21. Simple Reduction Loop Needing Prefetch
Example 7-22. Simple Reduction Loop with Prefetch
Example 7-23. Reduction with Conditional Prefetch
Example 7-24. Reduction with Prefetch Unrolled Once
Example 7-25. Reduction Loop Unrolled with Two-Ahead Prefetch
Example 7-26. Reduction Loop Unrolled Four Times
Example 7-27. Reduction Loop Reordered for Pseudo-Prefetching
Example 7-28. Fortran Use of Manual Prefetch
Example 7-29. Typical Fortran Declaration of Local Arrays
Example 7-30. Common, Improper Fortran Practice
Example 7-31. Fortran Loop to which Gather-Scatter Is Applicable
Example 7-32. Fortran Loop with Gather-Scatter Applied
Example 7-33. Fortran Loop That Processes a Vector
Example 7-34. Fortran Loop Transformed to Vector Intrinsic Call
Example 8-1. Typical C Loop
Example 8-2. Amdahl's Law: Speedup(n) Given p
Example 8-3. Amdahl's Law: p Given Speedup(2)
Example 8-4. Amdahl's Law: p Given Speedup(n) and Speedup(m)
Example 8-5. Fortran Loop with False Sharing
Example 8-6. Fortran Loop with False Sharing Removed
Example 8-7. Easily Parallelized Fortran Vector Routine
Example 8-8. Fortran Vector Operation, Parallelized
Example 8-9. Fortran Vector Operation with Distribution Directives
Example 8-10. Parallel Loop with Affinity in Data
Example 8-11. Parallel Loop with Affinity in Threads
Example 8-12. Loop Parallelized with the NEST Clause
Example 8-13. Loop Parallelized with NEST Clause with Data Affinity
Example 8-14. Loop Parallelized with NEST, AFFINITY, and ONTO
Example 8-15. Fortran Code for Explicit Page Placement
Example 8-16. Declarations Using the Distribute_Reshape Directive
Example 8-17. Valid and Invalid Use of Reshaped Array
Example 8-18. Corrected Use of Reshaped Array
Example 8-19. Gathering Reshaped Data with Copying
Example 8-20. Gathering Reshaped Data with Cache-Friendly Copying
Example 8-21. Reshaped Array as Actual Parameter—Valid
Example 8-22. Reshaped Array as Actual Parameter—Invalid
Example 8-23. Differently Reshaped Arrays as Actual Parameters
Example 8-24. Typical Output of _DSM_VERBOSE
Example 8-25. Test Placement Display from First-Touch Allocation
Example 8-26. Test Placement Display from Round-Robin Placement
Example 8-27. Scalable Placement File
Example 8-28. Scalable Placement File for Two Threads Per Memory
Example 8-29. Various Ways of Distributing Threads to Memories
Example 8-30. Calling dplace Dynamically from Fortran
Example 8-31. Using a Script to Capture Redirected Output from an MPI Job
Example A-1. Naive Function to Find Nearest Point
Example A-2. Nearest-Point Function with Short-Circuit Test
Example C-1. Program adi2.f
Example C-2. Program adi5.f
Example C-3. Program adi53.f
Example C-4. Basic Makefile
Example C-5. Shell Script swplist
Example C-6. SpeedShop Experiment Script ssruno
Example C-7. Awk Script to Analyze Output of perfex -a
Example C-8. Awk Script to Extrapolate Amdahl's Law from Measured Times
Example C-9. Routine va2pa() Returns the Physical Page of a Virtual Address
Example C-10. Routine cpuclock() Gets the Clock Speed from the Hardware Inventory