Index

64-bit address space
Selecting an ABI and ISA

adi2 example program
Program adi2

aliasing models
Understanding Aliasing Models

Amdahl's law
Understanding Parallel Speedup and Amdahl's Law
awk script for
Awk Script for Amdahl's Law Estimation
execution time given n and p
Predicting Execution Time with n CPUs
parallel fraction p
Understanding Amdahl's Law
parallel fraction p given speedup(n)
Calculating the Parallel Fraction of a Program
speedup(n) given p
Understanding Amdahl's Law
superlinear speedup
Understanding Superlinear Speedup

application binary interface (ABI)
Selecting an ABI and ISA
64-bit
64-Bit ABI
new 32-bit
New 32-Bit ABI
old 32-bit
Old 32-Bit ABI

arithmetic error
Understanding Arithmetic Standards

array padding
Using Array Padding
Diagnosing and Eliminating Cache Thrashing
Using Array Padding to Prevent Thrashing

auto-parallelizing
Compiling Serial Code for Parallel Execution

Bentley, Jon
Bentley's Rules Updated

cache
and hardware event counter
Primary Cache Use
blocking
Controlling Cache Blocking
Understanding Cache Blocking
cache miss
Understanding Level-One and Level-Two Cache Use
coherent
Understanding Cache Coherency
Cache Coherency Events
compiler's model of
Adjusting the Optimizer's Cache Model
contention in
Diagnosing Cache Problems
correcting
Correcting Cache Contention in General
event 31 reveals
Diagnosing Cache Problems
Identifying False Sharing
diagnosing problems in
Identifying Cache Problems with Perfex and SpeedShop
Diagnosing Cache Problems
directory-based
Memory Overhead Bits
Understanding Directory-Based Coherency
false sharing of
Identifying False Sharing
L1
Primary Cache Use
Understanding Level-One and Level-Two Cache Use
Level-1 Cache
L2
Understanding Level-One and Level-Two Cache Use
Level-Two Cache
Secondary Cache Use
line size
Understanding Level-One and Level-Two Cache Use
data structure blocking for
Data Structure Augmentation
on-chip
Cache Architecture
operation of
Understanding Level-One and Level-Two Cache Use
Understanding Cache Coherency
Understanding Directory-Based Coherency
principles of use
Principles of Good Cache Use
proper use of
Using Other Cache Techniques
Principles of Good Cache Use
array padding
Using Array Padding
blocking data for
Controlling Cache Blocking
Understanding Cache Blocking
grouping related data for
Grouping Data Used at the Same Time
loop fusion for
Understanding Loop Fusion
parallel execution issues
Diagnosing Cache Problems
stride-one access for
Using Stride-One Access
transposition for
Understanding Transpositions
set-associative
Understanding Level-One and Level-Two Cache Use
thrashing in
Understanding Cache Thrashing
snoopy
Coherency Methods
thrashing
Diagnosing and Eliminating Cache Thrashing
Understanding Cache Thrashing

cache coherence
and hardware event counter
Cache Coherency Events

cache coherency
Understanding Cache Coherency

cache line
Understanding Level-One and Level-Two Cache Use

call hierarchy profile
Profiling the Call Hierarchy

compiler directive
See directive

compiler feedback file
Creating a Compiler Feedback File

compiler flag
See compiler option

compiler option
-32
Old 32-Bit ABI
-64
64-Bit ABI
-apo
Compiling a Parallel Version of a Program
-check_bounds
Using Array Padding
Computational Differences
-clist
Reading the Transformation File
-fb
Passing a Feedback File
Creating a Compiler Feedback File
-flist
Reading the Transformation File
-INLINE
Using Automatic Inlining
Using Manual Inlining
-IPA
Requesting IPA
forcedepth
Using Automatic Inlining
inline
Using Automatic Inlining
space
Using Automatic Inlining
-LNO
Using Loop Nest Optimization
blocking
Adjusting Cache Blocking Block Sizes
fission
Controlling Fission and Fusion
fusion
Controlling Fission and Fusion
gather_scatter
Understanding Gather-Scatter
ignore_pragmas
Requesting LNO
interchange=off
Using Loop Interchange
outer_unroll
Controlling Loop Unrolling
prefetch
Controlling Prefetching
vintr
Vector Intrinsics
-mips3
New 32-Bit ABI
-mips4
New 32-Bit ABI
Recommended Starting Options
-n32
New 32-Bit ABI
Recommended Starting Options
-O2
Recommended Starting Options
-O3
for SWP
Enabling Software Pipelining with -O3
-Ofast
versus -O3
Compile -O3 or -Ofast for Critical Modules
-Olimit
Using Automatic Inlining
-On
Setting Optimization Level with -On
-OPT
alias
Understanding Aliasing Models
cray_ivdep
Breaking Other Dependencies
IEEE_arithmetic
Recommended Starting Options
IEEE Conformance
IEEE_NaN_inf
IEEE Conformance
liberal_ivdep
Breaking Other Dependencies
reorg_common
Using Array Padding
roundoff
Roundoff Control
-r10000
Setting Target System with -TARG
Standard Math Library
-r5000
Setting Target System with -TARG
Standard Math Library
-r8000
Setting Target System with -TARG
Standard Math Library
-S
Reading Software Pipelining Messages
-static
Uninitialized Variables
-TARG
Setting Target System with -TARG
-TENV
Profiling Exception Frequency
X
Controlling the Level of Speculation
-trapuv
Uninitialized Variables
default
Understanding Compiler Options
for cache model
Adjusting the Optimizer's Cache Model
IEEE_arithmetic
Exploit Algebraic Identities
roundoff
Exploit Algebraic Identities

copying
to reduce TLB thrashing
Using Copying to Circumvent TLB Thrashing

correctness
Getting the Right Answers

CPU
See MIPS CPU

CrayLink
Hub and CrayLink

data distribution
Using Data Distribution Directives
and dplace
Using _DSM_VERBOSE
directives for
Understanding Directive Syntax
Distribute directive
Using Distribute for Loop Parallelization
mapping types
Understanding Distribution Mapping Options
ONTO clause
Understanding the ONTO Clause
page placement
Using the Page_Place Directive for Custom Mappings
redistribution
Understanding the Redistribution Directives
reshaped
Using Reshaped Distribution Directives
restrictions
Restrictions of Reshaped Distribution

data placement
Scalability and Data Placement
for libmp programs
Tuning Data Placement for MP Library Programs
modifying code for
Modifying the Code to Tune Data Placement

DAXPY
Understanding Software Pipelining
and alias model
Use Alias=Restrict When Possible
loop fusion of
Understanding Loop Fusion
with indirection
Breaking Other Dependencies

debugging
possible with -O2
Start with -O2 for All Modules
use -O0 for
Use -O0 for Debugging

dependency
Breaking Other Dependencies

directive
blocking size
Adjusting Cache Blocking Block Sizes
for data distribution
Fortran Source with Directives
Using Data Distribution Directives
Distribute
Using Distribute for Loop Parallelization
page place
Using the Page_Place Directive for Custom Mappings
syntax
Understanding Directive Syntax
for loop fission
Controlling Fission and Fusion
for loop fusion
Controlling Fission and Fusion
for loop interchange
Using Loop Interchange
for loop nest optimizer
Requesting LNO
for loop unrolling
Controlling Loop Unrolling
for parallel execution
Fortran Source with Directives
affinity clause
Understanding the AFFINITY Clause for Data
Using Parallel Do with Distributed Data
Understanding the AFFINITY Clause for Threads
nest clause
Understanding the NEST Clause
for prefetching
Controlling Prefetching
ivdep
Breaking Other Dependencies
OpenMP
Fortran Source with Directives

dlook
Applying dlook

dplace
Non-MP Library Programs and Dplace
disables data distribution directives
Using _DSM_VERBOSE
enable migration with
Enabling Page Migration
library interface to
Using the dplace Library for Dynamic Placement
not for use with libmp
Non-MP Library Programs and Dplace
placement file
Placement File Syntax
distribute statement
Assigning Threads to Memories
memories statement
Using the memories Statement
threads statement
Using the threads Statement
set page size with
Using Larger Page Sizes to Reduce TLB Misses
Changing the Page Size
specify topology with
Specifying the Topology
with MPI
Using dplace with MPI 3.1

dprof
Applying dprof

dynamic page migration
Enabling Page Migration
Dynamic Page Migration
Trying Dynamic Page Migration
administration
Trying Dynamic Page Migration
enabling
Trying Dynamic Page Migration

environment variable
_DSM_MIGRATION
Trying Dynamic Page Migration
Experimenting with Migration Levels
_DSM_PPM
Advanced Options
_DSM_ROUND_ROBIN
Trying Round-Robin Placement
_DSM_VERBOSE
Using _DSM_VERBOSE
for SpeedShop
Identifying False Sharing
in dplace placement file
Using Environment Variables in Placement Files
MP_SET_NUMTHREADS
Controlling a Parallelized Program at Run Time
MPI_DSM_OFF
Using dplace with MPI 3.1
PAGESIZE_*
Using Larger Page Sizes to Reduce TLB Misses
SGI_ABI
Specifying the ABI
SpeedShop use of
Sampling Through Other Hardware Counters
TRAP_FPE
Understanding Treatment of Underflow Exceptions

event counter
See hardware event counter

exception
event counter overflow
R10000 Counter Event Types
from speculative execution
Permitting Speculative Execution
handling
Using Exception Profiling
profiling occurrence of
Using Exception Profiling
TLB miss
Understanding TLB and Virtual Memory Use
underflow
Understanding Treatment of Underflow Exceptions

exception profile
Using Exception Profiling

false sharing
Identifying False Sharing
Memory Contention

fast Fourier transform (FFT)
Understanding Transpositions
data placement for
First-Touch Placement with Multiple Data Distributions

feedback file
Creating a Compiler Feedback File
use of
Passing a Feedback File

FFT
See fast Fourier transform (FFT)

first-touch placement
Programming For First-Touch Placement
Using First-Touch Placement

floating-point exception
See exception

floating-point status register (FSR)
Understanding Treatment of Underflow Exceptions

graduated instruction
Graduated Instructions

hardware event counter
R10000 Counter Event Types
branch instructions
Branching Instructions
cache coherency
Cache Coherency Events
cache use
Primary Cache Use
clock cycles
Clock Cycles
event 21
Finding and Removing Memory Access Problems
Displaying Operation Counts
event 31
Identifying False Sharing
Sampling Through Other Hardware Counters
Finding and Removing Memory Access Problems
Diagnosing Cache Problems
event 4
Finding and Removing Memory Access Problems
instruction counts
Instructions Issued and Done
lock instructions
Lock-Handling Instructions
profiling from
Sampling through Hardware Event Counters
Sampling Through Other Hardware Counters
TLB miss
Virtual Memory Use

hardware graph
Indicating Resource Affinity

hardware trap
See exception, page fault, TLB

hub
Hub and CrayLink
SN0 Organization
cache coherency support
Understanding Directory-Based Coherency

hypercube
SN0 Memory Distribution
SN0 Organization

ideal time profile
Using Ideal Time Profiling

IEEE 754
Understanding Arithmetic Standards
versus optimization
IEEE Conformance

IEEE arithmetic
Understanding Arithmetic Standards

inlining
Understanding Inlining
automatic versus manual
Understanding Inlining
manual with -INLINE
Using Manual Inlining

instruction scheduling
Understanding Software Pipelining
Setting Target System with -TARG

instruction set architecture (ISA)
MIPS I
Old 32-Bit ABI
MIPS II
Old 32-Bit ABI
MIPS III
Old 32-Bit ABI
New 32-Bit ABI
MIPS IV
New 32-Bit ABI
MIPS IV Instruction Set Architecture

interprocedural analysis (IPA)
Exploiting Interprocedural Analysis
applied during link step
Compiling and Linking with IPA
features of
Exploiting Interprocedural Analysis
requesting
Requesting IPA

-IPA
See compiler option, -IPA

IRIX
memory management in
SN0 Memory Management
porting to
Dealing with Porting Issues

lazy evaluation
Lazy Evaluation

ld
performs IPA
Compiling and Linking with IPA

library
BLAS
SCSL Library
CHALLENGEcomplib Library
CHALLENGEcomplib
CHALLENGEcomplib Library
Exploiting Existing Tuned Code
EISPACK
CHALLENGEcomplib Library
LAPACK
SCSL Library
CHALLENGEcomplib Library
libc
Standard Math Library
libfastm
libfastm Library
Exploiting Existing Tuned Code
Recommended Starting Options
libfpe
Using Exception Profiling
Understanding Treatment of Underflow Exceptions
libmp
Controlling a Parallelized Program at Run Time
conflicts with dplace
Non-MP Library Programs and Dplace
data placement with
Tuning Data Placement for MP Library Programs
page migration with
Experimenting with Migration Levels
Trying Dynamic Page Migration
page size control
Using Larger Page Sizes to Reduce TLB Misses
round-robin placement with
Trying Round-Robin Placement
LINPACK
CHALLENGEcomplib Library
SCSL
Exploiting Existing Tuned Code
SCSL Library

library routine
bzero
Initializing to Zero
calloc
Initializing to Zero
dplace_file
Using the dplace Library for Dynamic Placement
dplace_line
Using the dplace Library for Dynamic Placement
dsm_home_threadnum
Using Dynamic Placement Information
handle_sigfpes
Using Exception Profiling
sasum
Using Reshaped Distribution Directives
sscal
Using Reshaped Distribution Directives

-LNO
See loop nest optimizer (LNO) and compiler option, -LNO

loop fission
Using Loop Fission

loop fusion
by LNO
Using Loop Fusion
manual
Understanding Loop Fusion

loop interchange
Using Loop Interchange
disabling
Using Loop Interchange

loop nest optimizer (LNO)
Using Loop Nest Optimization
cache blocking by
Controlling Cache Blocking
controlling
Adjusting Cache Blocking Block Sizes
disable loop transformation
Requesting LNO
gather-scatter by
Understanding Gather-Scatter
loop fission by
Using Loop Fission
loop fusion by
Using Loop Fusion
loop interchange
Using Loop Interchange
loop unrolling
Using Outer Loop Unrolling
prefetching by
Prefetch Overhead and Unrolling
requesting
Requesting LNO
transformed source file
Reading the Transformation File
vector intrinsic transformation
Vector Intrinsics

loop peeling
Using Loop Fusion

loop unrolling
and roundoff
Roundoff Control
and SWP
Using Outer Loop Unrolling
by loop nest optimizer (LNO)
Using Outer Loop Unrolling
with loop interchange
Combining Loop Interchange and Loop Unrolling

makefile
example
Basic Makefile
use of
Using a Makefile

math libraries
Exploiting Existing Tuned Code
vector intrinsics
Standard Math Library

matrix multiply
cache blocking of
Controlling Cache Blocking
loop unrolling of
Using Outer Loop Unrolling
memory use in
Understanding Cache Blocking
performance of
Understanding Cache Blocking

memory
64-bit addressing
Selecting an ABI and ISA
administrator setup
Trying Dynamic Page Migration
Using Larger Page Sizes to Reduce TLB Misses
bus-based
Memory for Multiprocessors
Scalability in Multiprocessors
cache directory bits
Memory Overhead Bits
contention for
Memory Contention
distributed versus shared
Shared Memory Multiprocessing
error correction bits
Memory Overhead Bits
hierarchy
Understanding the Levels of the Memory Hierarchy
latency of
SN0 Latencies and Bandwidths
Degrees of Latency
locality management
Memory Locality Management
management by IRIX
SN0 Memory Management
page fault
Understanding TLB and Virtual Memory Use
paged virtual
Understanding TLB and Virtual Memory Use
parallel execution tuning
Finding and Removing Memory Access Problems
physical address display
Page Address Routine va2pa()
placement
first-touch
Using First-Touch Placement
Programming For First-Touch Placement
round-robin
Trying Round-Robin Placement
Using Round-Robin Placement
prefetching
Understanding Prefetching
Using Prefetching
stride
Using Stride-One Access
virtual
Understanding Level-One and Level-Two Cache Use

memory locality domain (MLD)
Memory Locality Domain Use
Memory Locality Management

memory locality domain set (MLDS)
Memory Locality Domain Use

Message-Passing Interface (MPI)
Message-Passing Models MPI and PVM
dplace with
Using dplace with MPI 3.1
perfex with
Using perfex with MPI

MIPS CPU
architecture of
Understanding MIPS R10000 Architecture
Understanding Prefetching
event counters in
R10000 Counter Event Types
issued versus graduated instruction
Graduated Instructions
off-chip cache
Level-Two Cache
on-chip cache
Cache Architecture
out-of-order execution
Executing Out of Order
R10000
speculative execution
Hardware Speculative Execution
underflow control
Understanding Treatment of Underflow Exceptions
R4000
Specifying the ABI
R8000
Dealing with Software Pipelining Failures
Specifying the ABI
Software Speculative Execution
underflow ignored on
Understanding Treatment of Underflow Exceptions
specify to compiler
Standard Math Library
speculative execution
Speculative Execution
superscalar features
Superscalar CPU Features

MIPS IV ISA
MIPS IV Instruction Set Architecture
and IEEE 754
IEEE Conformance
prefetch in
Understanding Prefetching

MP library
See library, libmp

MPI
See Message-Passing Interface (MPI)

mpirun
with perfex
Using perfex with MPI

node
SN0 Organization
SN0 Node Board
CPU in
CPUs and Memory

nonuniform memory access (NUMA)
SN0 Memory Distribution
Dealing With Nonuniform Access Time
and parallel program
Parallel Programs under NUMA
and single-threaded program
Single-Threaded Programs under NUMA

numeric error
Understanding Arithmetic Standards

OpenMP directives
Fortran Source with Directives
C pragmas for
C and C++ Source with Pragmas

-OPT
See compiler option, -OPT

optimization level
Setting Optimization Level with -On

out-of-order execution
Executing Out of Order

packing
Packing

page
Understanding TLB and Virtual Memory Use
migration of
Enabling Page Migration
Dynamic Page Migration
Trying Dynamic Page Migration
replication of
Replication of Read-Only Pages
size of
Policy Modules
Dynamic Page Migration
Single-Threaded Programs under NUMA
Using Larger Page Sizes to Reduce TLB Misses
set with dplace
Changing the Page Size
valid sizes
Using Larger Page Sizes to Reduce TLB Misses

page fault
Understanding TLB and Virtual Memory Use

parallel execution
affinity clause
Using Parallel Do with Distributed Data
Understanding the AFFINITY Clause for Data
Understanding the AFFINITY Clause for Threads
Amdahl's law
Understanding Parallel Speedup and Amdahl's Law
auto-parallelizing
Compiling Serial Code for Parallel Execution
data placement for
Scalability and Data Placement
memory access tuning for
Finding and Removing Memory Access Problems
nest clause
Understanding the NEST Clause
parallel fraction p
Understanding Amdahl's Law
Ensuring That the Program Is Properly Parallelized
programming models for
Explicit Models of Parallel Computation
scalability of
Scalability and Data Placement
Scalability in Multiprocessors
topology
Specifying the Topology
tuning SN0 for
Tuning Parallel Code for SN0

perfex
Analyzing Performance with Perfex
absolute event counts
Taking Absolute Counts of One or Two Events
analytic output
Getting Analytic Output with the -y Option
awk script to parse
Awk Script for Perfex Output
cache use analysis
Identifying Cache Problems with Perfex and SpeedShop
library interface
Collecting Data over Part of a Run
statistical counts
Taking Statistical Counts of All Events

performance
aphorisms about
Bentley's Rules Updated
of matrix multiply
Understanding Cache Blocking
of parallel program
Parallel Programs under NUMA
of single-threaded program
Single-Threaded Programs under NUMA

performance techniques
algebraic identities
Exploit Algebraic Identities
Exploit Algebraic Identities
array padding
Using Array Padding to Prevent Thrashing
Using Array Padding
avoiding tests
Combining Tests
cache blocking
Understanding Cache Blocking
Controlling Cache Blocking
caching
Principles of Good Cache Use
code motion
Code Motion Out of Loops
combining related functions
Combine Paired Computation
common block padding
Exploiting Interprocedural Analysis
common subexpressions
Eliminate Common Subexpressions
constant propagation
Exploiting Interprocedural Analysis
copying
Using Copying to Circumvent TLB Thrashing
coroutines
Use Coroutines
data structure augmentation
Data Structure Augmentation
dead function elimination
Exploiting Interprocedural Analysis
dead variable elimination
Exploiting Interprocedural Analysis
gather-scatter
Understanding Gather-Scatter
inlining
Exploiting Interprocedural Analysis
Collapse Procedure Hierarchies
interpreters
Interpreters
lazy evaluation
Lazy Evaluation
loop fission
Using Loop Fission
loop fusion
Loop Fusion
Using Loop Fusion
Understanding Loop Fusion
loop interchange
Using Loop Interchange
loop unrolling
Loop Unrolling
Using Outer Loop Unrolling
packing
Packing
precomputation
Store Precomputed Results
Precompute Logical Functions
prefetching
Using Prefetching
recursion elimination
Transform Recursive Procedures
short-circuiting
Short-Circuit Monotone Functions
software pipelining
Understanding Software Pipelining
speculative execution
Permitting Speculative Execution
transposition
Understanding Transpositions

policy module (PM)
Policy Modules
Memory Locality Management

Parallel Virtual Machine (PVM)
Message-Passing Models MPI and PVM

POSIX threads
C Source Using POSIX Threads

pragma
See directive

precomputation
Store Precomputed Results

prefetching
Understanding Prefetching
Using Prefetching
controlling
Controlling Prefetching
manual
Using Manual Prefetching
overhead of
Prefetch Overhead and Unrolling
pseudo
Using Pseudo-Prefetching

prof
default report
Displaying Profile Reports from Sampling
feedback file
Creating a Compiler Feedback File
ideal time report
Default Ideal Time Profile
line numbers off with opt
Including Line-Level Detail
option -archinfo
Displaying Operation Counts
option -butterfly
Displaying Ideal Time Call Hierarchy
option -feedback
Passing a Feedback File
Creating a Compiler Feedback File
option -heavy
Displaying Profile Reports from Sampling
Including Line-Level Detail
option -lines
Including Line-Level Detail
simplifying report
Removing Clutter from the Report

profiling
address space usage
Using Address Space Profiling
cache usage
Identifying Cache Problems with Perfex and SpeedShop
call hierarchy
Profiling the Call Hierarchy
ideal time for
Identifying Cache Problems with Perfex and SpeedShop
Using Ideal Time Profiling
opcode counts
Displaying Operation Counts
sampling for
Understanding Sample Time Bases
Identifying Cache Problems with Perfex and SpeedShop
tools for
Profiling Tools

program correctness
Getting the Right Answers

R4000
See MIPS CPU

R8000
See MIPS CPU

R10000
See MIPS CPU

round-robin placement
Using Round-Robin Placement
Trying Round-Robin Placement

roundoff
Roundoff Control

scalability
Scalability in Multiprocessors
and bus architecture
Scalability in Multiprocessors
and data placement
Scalability and Data Placement
and shared memory
Scalability and Shared, Distributed Memory

smake
Using a Makefile

SN0
CrayLink
Hub and CrayLink
hub
SN0 Organization
Hub and CrayLink
Input/Output
SN0 Input/Output
latencies
SN0 Latencies and Bandwidths
node
SN0 Organization
SN0 Node Board
router
SN0 Organization
XIO
XIO Connection
SN0 Organization

SN0 architecture
Understanding SN0 Architecture
building blocks of
SN0 Organization
hypercube
SN0 Organization
SN0 Memory Distribution
nonuniform memory access (NUMA)
SN0 Memory Distribution

snoopy cache
Coherency Methods

software pipelining (SWP)
Exploiting Software Pipelining
compiler report in .s
Reading Software Pipelining Messages
Using Outer Loop Unrolling
script to extract
Software Pipeline Script swplist
dereferenced pointer defeats
Improving C Loops
effect of alias model
Use Alias=Restrict When Possible
enable with -O3
Enabling Software Pipelining with -O3
failure cause
Dealing with Software Pipelining Failures
global variables defeat
Improving C Loops
loop unrolling with
Using Outer Loop Unrolling
of DAXPY loop
Pipelining the DAXPY Loop

speculative execution
Speculative Execution
Permitting Speculative Execution
hardware driven
Hardware Speculative Execution
software-driven
Software Speculative Execution

SpeedShop
Using SpeedShop
sample time bases
Understanding Sample Time Bases

ssrun
exception trace
Profiling Exception Frequency
experiment types
Understanding Sample Time Bases
ideal time trace
Capturing an Ideal Time Trace
Passing a Feedback File
output filename format
Performing ssrun Experiments
shell script to run
Shell Script ssruno
usertime experiment
Displaying Usertime Call Hierarchy
using
Performing ssrun Experiments

stride
Using Stride-One Access

superlinear speedup
Understanding Superlinear Speedup

superscalar
Superscalar CPU Features

-SWP
See compiler option, -SWP

swplist shell script
Reading Software Pipelining Messages

system routine
mmap
Initializing to Zero
C and C++ Source Using UNIX Processes
sproc
C and C++ Source Using UNIX Processes
sysmp
Advanced Options
syssgi
Using Dynamic Placement Information

thread
C Source Using POSIX Threads

TLB
See translation lookaside buffer (TLB)

translation lookaside buffer (TLB)
Understanding TLB and Virtual Memory Use
miss
Understanding TLB and Virtual Memory Use
hardware counter
Virtual Memory Use
thrashing elimination
Diagnosing and Eliminating TLB Thrashing
copying
Using Copying to Circumvent TLB Thrashing
larger page size
Using Larger Page Sizes to Reduce TLB Misses

transposition
Understanding Transpositions

trap
See exception

uninitialized variable, avoiding
Uninitialized Variables

vector intrinsic function
Standard Math Library
and LNO
Vector Intrinsics

virtual memory
Understanding TLB and Virtual Memory Use

XIO
SN0 Organization
XIO Connection

zero-fill
Initializing to Zero