Index

Amdahl's law
Understanding Parallel Speedup and Amdahl's Law
execution time given n and p
Predicting Execution Time with n CPUs
parallel fraction p
Understanding Amdahl's Law
speedup(n) given p
Understanding Amdahl's Law
superlinear speedup
Understanding Superlinear Speedup

application placement and I/O resources
Application Placement and I/O Resources

application tuning process
About Performance Analysis and Debugging

automatic parallelization
limitations
Using Compiler Options

avoiding segmentation faults
Avoiding Segmentation Faults

cache bank conflicts
Tuning the Cache Performance

cache coherency
Cache Coherency

Cache coherent non-uniform memory access (ccNUMA) systems
MPI Job Problems and Application Design

cache performance
Tuning the Cache Performance

ccNUMA
MPI Job Problems and Application Design
See Also cache coherent non-uniform memory access

ccNUMA architecture
ccNUMA Architecture

cgroups
About cpusets and Control Groups (cgroups)

commands
dlook
dlook Command
dplace
dplace Command

common compiler options
Compiler Overview

compiler command line
Compiler Overview

compiler libraries
C/C++
C/C++ Libraries
dynamic libraries
Dynamic Libraries
message passing
SHMEM Message Passing Libraries
overview
Library Overview

compiler libraries
static libraries
Static Libraries

compiler options
tracing and porting
Getting the Correct Results

compiler options for tuning
Using Compiler Options to Optimize Performance

compiling environment
The SGI Compiling Environment
compiler overview
Compiler Overview
debugger overview
About Debugging
libraries
Library Overview
modules
Environment Modules

Configuring MPT
OFED
OFED Tuning Requirements for SHMEM

CPU-bound processes
Sources of Performance Problems

cpusets
About cpusets and Control Groups (cgroups)

data decomposition
Data Decomposition

data dependency
Identifying Opportunities for Loop Parallelism in Existing Code

data parallelism
Data Decomposition

data placement practices
About the Data and Process Placement Tools

data placement tools
Data and Process Placement Tools
cpusets
About the Data and Process Placement Tools
dplace
About the Data and Process Placement Tools
overview
About Nonuniform Memory Access (NUMA) Computers
taskset
About the Data and Process Placement Tools

debugger overview
About Debugging

debuggers
gdb
About Debugging
idb
About Debugging
TotalView
About Debugging

denormalized arithmetic
Compiler Overview

determining parallel code amount
Measuring Parallelization and Parallelizing Your Code

determining tuning needs
tools used
Determining Tuning Needs

distributed shared memory (DSM)
Distributed Shared Memory (DSM)

dlook command
dlook Command

dplace command
dplace Command

Environment variables
Environment Variables for Performance Tuning

explicit data decomposition
Data Decomposition

False sharing
Fixing False Sharing

file limit resources
resetting
Resetting the File Limit Resource Default

Flexible File I/O (FFIO)
Multithreading Considerations
environment variables to set
Environment Variables
operation
About FFIO
overview
About FFIO
simple examples
Simple Examples

floating-point programs
Floating-point Program Performance

Floating-Point Software Assist
Floating-point Program Performance

FPSWA
See Floating-Point Software Assist

functional parallelism
Data Decomposition

Global reference unit (GRU)
MPI Application Communication on SGI Hardware

GNU debugger
About Debugging

Gustafson's law
Gustafson's Law

implicit data decomposition
Data Decomposition

I/O tuning
application placement
About I/O Tuning
layout of filesystems
Layout of Filesystems and XVM for Multiple RAIDs

I/O-bound processes
Sources of Performance Problems

iostat command
Using the iostat(1) command

Java environment variables
setting
Setting Java Environment Variables

layout of filesystems
Layout of Filesystems and XVM for Multiple RAIDs

limits
system
Resetting System Limits

Linux shared memory accounting
Linux Shared Memory Accounting

memory
cache coherency
Cache Coherency
ccNUMA architecture
ccNUMA Architecture
distributed shared memory (DSM)
Distributed Shared Memory (DSM)
non-uniform memory access (NUMA)
Non-uniform Memory Access (NUMA)

memory accounting
Linux Shared Memory Accounting

memory management
About the Compiling Environment
Memory Use Strategies

memory page
About the Compiling Environment

memory strides
Tuning the Cache Performance

memory-bound processes
Sources of Performance Problems

Message Passing Toolkit
for parallelization
Using SGI MPI

modules
Environment Modules
command examples
Environment Modules

MPI on SGI UV systems
general considerations
About MPI Application Tuning
job performance types
MPI Job Problems and Application Design
other ccNUMA performance issues
MPI Job Problems and Application Design

MPI on UV systems
MPI Application Communication on SGI Hardware

MPI profiling
MPI Performance Tools

MPInside profiling tool
MPI Performance Tools

non-uniform memory access (NUMA)
Non-uniform Memory Access (NUMA)

NUMA Tools
command
dlook
dlook Command
dplace
dplace Command

OFED configuration for MPT
OFED Tuning Requirements for SHMEM

OpenMP
Using OpenMP
environment variables
Environment Variables for Performance Tuning

parallel execution
Amdahl's law
Understanding Parallel Speedup and Amdahl's Law
parallel fraction p
Understanding Amdahl's Law

parallel speedup
Understanding Parallel Speedup

parallelization
automatic
Using Compiler Options
using MPI
Using SGI MPI
using OpenMP
Using OpenMP

perf tool
Profiling with perf

performance
VTune
Other Performance Analysis Tools

performance analysis
About Performance Analysis and Debugging

performance gains
types of
About Performance Analysis and Debugging

performance problems
sources
Sources of Performance Problems

PerfSuite script
Profiling with PerfSuite

process placement
determining
Determining Process Placement
set-up
Determining Process Placement
using OpenMP
Example Using OpenMP
using pthreads
Example Using pthreads

profiling
MPI
MPI Performance Tools
perf
Profiling with perf
PerfSuite
Profiling with PerfSuite

ps command
Using the ps(1) Command

resetting default system stack size
Resetting the Default Stack Size

resetting file limit resources
Resetting the File Limit Resource Default

resetting system limit resources
Resetting System Limits

resetting virtual memory size
Resetting Virtual Memory Size

resident set size
About the Compiling Environment

sar command
Using the sar(1) command

segmentation faults
Avoiding Segmentation Faults

setting Java environment variables
Setting Java Environment Variables

SGI PerfBoost
MPI Performance Tools

SGI PerfCatcher
MPI Performance Tools

SHMEM
SHMEM Message Passing Libraries

shortening execution time
Adding CPUs to Shorten Execution Time

stack size
resetting
Resetting the Default Stack Size

suggested shortcuts and workarounds
Suggested Shortcuts and Workarounds

superlinear speedup
Understanding Superlinear Speedup

swap space
About the Compiling Environment

system
overview
About the Compiling Environment

system configuration
Determining System Configuration

system limit resources
resetting
Resetting System Limits

system limits
address space limit
Resetting System Limits
core file size
Resetting System Limits
CPU time
Resetting System Limits
data size
Resetting System Limits
file locks
Resetting System Limits
file size
Resetting System Limits
locked-in-memory address space
Resetting System Limits
number of logins
Resetting System Limits
number of open files
Resetting System Limits
number of processes
Resetting System Limits
priority of user process
Resetting System Limits
resetting
Resetting System Limits
resident set size
Resetting System Limits
stack size
Resetting System Limits

system monitoring tools
About the Operating System Monitoring Commands

system usage commands
Operating System Monitoring Commands
iostat
Using the iostat(1) command
ps
Using the ps(1) Command
sar
Using the sar(1) command
vmstat
Using the vmstat(8) Command
w
Using the w(1) command

taskset command
taskset Command

tools
perf
Profiling with perf
PerfSuite
Profiling with PerfSuite
VTune
Other Performance Analysis Tools

tuning
cache performance
Tuning the Cache Performance
environment variables
Environment Variables for Performance Tuning
false sharing
Fixing False Sharing
heap corruption
Managing Heap Corruption Problems
managing memory
Memory Use Strategies
multiprocessor code
Tuning Multiprocessor Codes
parallelization
Measuring Parallelization and Parallelizing Your Code
profiling
perf
Profiling with perf
PerfSuite script
Profiling with PerfSuite
VTune analyzer
Other Performance Analysis Tools
single processor code
Single Processor Code Tuning
using compiler options
Using Compiler Options to Optimize Performance
using math functions
Using Tuned Code
verifying correct results
Getting the Correct Results

uname command
Determining System Configuration

underflow arithmetic
effects of
Compiler Overview

UV Hub
MPI Application Communication on SGI Hardware

virtual addressing
About the Compiling Environment

virtual memory
About the Compiling Environment

vmstat command
Using the vmstat(8) Command

VTune performance analyzer
Other Performance Analysis Tools

w command
Using the w(1) command