Origin2000 and Onyx2 Performance Tuning and Optimization Guide

Document Number: 007-3430-002

Front Matter

Table of Contents

About This Guide
Who Can Benefit from This Guide
What the Guide Contains
Related Documents
Text Conventions

1. Understanding SN0 Architecture
Understanding Scalable Multiprocessor Memory
Understanding Scalable Shared Memory
Understanding MIPS R10000 Architecture

2. SN0 Memory Management
Dealing With Nonuniform Access Time
IRIX Memory Locality Management
Achieving Good Performance in a NUMA System

3. Tuning for a Single Process
Getting the Right Answers
Exploiting Existing Tuned Code

4. Profiling and Analyzing Program Behavior
Profiling Tools
Analyzing Performance with Perfex
Using SpeedShop
Using Address Space Profiling

5. Using Basic Compiler Optimizations
Understanding Compiler Options
Exploiting Software Pipelining
Informing the Compiler
Exploiting Interprocedural Analysis

6. Optimizing Cache Utilization
Understanding the Levels of the Memory Hierarchy
Identifying Cache Problems with Perfex and SpeedShop
Using Other Cache Techniques

7. Using Loop Nest Optimization
Understanding Loop Nest Optimizations
Using Outer Loop Unrolling
Using Loop Interchange
Controlling Cache Blocking
Using Loop Fusion and Fission
Using Prefetching
Using Array Padding
Using Gather-Scatter and Vector Intrinsics

8. Tuning for Parallel Processing
Understanding Parallel Speedup and Amdahl's Law
Compiling Serial Code for Parallel Execution
Explicit Models of Parallel Computation
Tuning Parallel Code for SN0
Scalability and Data Placement
Using Data Distribution Directives
Non-MP Library Programs and Dplace

A. Bentley's Rules Updated
Space-for-Time Rules
Time-for-Space Rules
Loop Rules
Logic Rules
Procedure Design Rules
Expression Rules

B. R10000 Counter Event Types
Counter Events In Detail

C. Useful Scripts and Code
Program adi2
Basic Makefile
Software Pipeline Script swplist
Shell Script ssruno
Awk Script for Perfex Output
Awk Script for Amdahl's Law Estimation
Page Address Routine va2pa()
CPU Clock Rate Routine cpuclock()