This document is a guide to the SGI compilers, compiling tools, and the documentation for those products. It provides overview information about the compilers and the performance tools used with the compilers. It also provides a brief description of the documentation available for all of these SGI products.
The SGI compilers include the FORTRAN 77 compiler, the Fortran 90 compiler, the C compiler, the C++ compiler, and the Assembler.
Compiling tools include the WorkShop suite of tools (Debugger, Performance Analyzer, Static Analyzer, ProMP, and Tester) as well as dbx and SpeedShop. In addition to the compilers mentioned previously, these tools also support the Ada compiler.
This book discusses the following topics:
Chapter 2, “Compilers and Compiler Documentation”, describes the different SGI compilers and the documentation that accompanies those compilers.
Chapter 3, “Debuggers and Debugging Documentation”, describes the debugging tools that are available.
Chapter 4, “Optimization, Porting and Tuning Tools and Documentation”, describes optimization tools and the tuning and porting guides available for the o32, n32 and 64-bit compiler systems.
Chapter 5, “Performance Analysis Tools and Documentation”, describes the performance tools available with the WorkShop product set and also describes the SpeedShop analysis tool.
There are three “versions” of Fortran and C/C++ compilers in use at SGI:
The older 32-bit compiler (known as the o32 compiling system).
The newer 32-bit compiler (known as the n32 compiling system).
The 64-bit compiler.
Chapter 2, “Compilers and Compiler Documentation”, discusses these different compiling systems and the documentation that supports those systems.
To tune a program's performance, you must first determine where machine resources are being used. At any point in a process, there is one limiting resource controlling the speed of execution. Processes can be slowed down by:
CPU speed and availability: a CPU-bound process spends its time executing in the CPU and is limited by CPU speed and availability. To improve the performance of CPU-bound processes, you may need to streamline your code. This can entail modifying or replacing algorithms, reordering code to avoid interlocks, removing nonessential steps, or blocking to keep data in cache and registers.
I/O processing: an I/O-bound process has to wait for input/output (I/O) to complete. I/O may be limited by disk access speeds or memory caching. To improve the performance of I/O-bound processes, you can try one or more of the following techniques:
Improve overlap of I/O with computation
Optimize data usage to minimize disk access
Use data compression
Memory size and availability: a program that continuously needs to swap out pages of memory is called memory-bound. Page thrashing is often caused by accessing virtual memory in a haphazard rather than a strategic pattern, which also leads to cache misses. Insufficient memory bandwidth could also be the problem.
Bugs: you may find that a bug is causing the performance problem. For example, you may find that you are reading in the same file twice in different parts of the program, that floating-point exceptions are slowing down your program, that old code has not been completely removed, or that you are leaking memory (making malloc calls without the corresponding calls to free).
Performance phases: because programs exhibit different behavior during different phases of operation, you need to identify the limiting resource during each phase. A program can be I/O-bound while it reads in data, CPU-bound while it performs computation, and I/O-bound again in its final stage while it writes out data. Once you have identified the limiting resource in a phase, you can perform an in-depth analysis to find the problem. After you have solved that problem, you can check for other problems within the phase. Performance analysis is an iterative process.
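As a rough illustration of identifying the limiting resource per phase, the following C sketch compares the CPU time a phase consumes with the wall-clock time it takes. It is not part of the SGI tools: the names phase_begin, phase_end, and phase_cpu_fraction are hypothetical, and the sketch assumes a POSIX system that provides clock_gettime with CLOCK_MONOTONIC.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* assumption: POSIX system with clock_gettime */
#endif
#include <assert.h>
#include <time.h>

/* Hypothetical types for this sketch; none of these names come from the SGI tools. */
typedef struct {
    clock_t cpu;            /* processor time consumed so far */
    struct timespec wall;   /* monotonic wall-clock reading */
} phase_start;

typedef struct {
    double cpu_s;   /* CPU seconds used during the phase */
    double wall_s;  /* elapsed seconds during the phase */
} phase_time;

/* Record the starting point of a phase. */
phase_start phase_begin(void)
{
    phase_start s;
    int rc = clock_gettime(CLOCK_MONOTONIC, &s.wall);
    assert(rc == 0);
    (void)rc;
    s.cpu = clock();
    return s;
}

/* Return the CPU and elapsed time since the matching phase_begin call. */
phase_time phase_end(phase_start s)
{
    struct timespec now;
    clock_t cpu_now = clock();
    int rc = clock_gettime(CLOCK_MONOTONIC, &now);
    assert(rc == 0);
    (void)rc;

    phase_time t;
    t.cpu_s = (double)(cpu_now - s.cpu) / CLOCKS_PER_SEC;
    t.wall_s = (double)(now.tv_sec - s.wall.tv_sec)
             + (double)(now.tv_nsec - s.wall.tv_nsec) / 1e9;
    return t;
}

/* Fraction of elapsed time spent on the CPU; 0.0 if no time elapsed. */
double phase_cpu_fraction(phase_time t)
{
    return t.wall_s > 0.0 ? t.cpu_s / t.wall_s : 0.0;
}
```

Wrapping the read, compute, and write phases of a program in phase_begin and phase_end calls gives one fraction per phase. A value near 1 suggests the phase is CPU-bound, while a value well below 1 suggests the process is mostly waiting, for example on I/O or paging. On many systems clock() sums CPU time across all threads, so a multithreaded phase can report a fraction above 1. This is only a first-pass check; the performance tools described in this guide provide far more detail.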
The documentation available for the compilers and the performance tools can help you pinpoint where these problems are occurring, and can help you determine how to make the necessary changes to improve program performance.
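To make the blocking technique mentioned above for CPU-bound processes more concrete, the following C sketch contrasts a straightforward matrix transpose with a blocked (tiled) version. The function names and the TILE value of 32 are illustrative assumptions, not values taken from the SGI documentation; a suitable tile size depends on the cache of the target machine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile size; a real value depends on the cache being targeted. */
#define TILE 32

/* Straightforward transpose of an n x n row-major matrix. dst is written
   with stride n, so for large n each write may touch a different cache line. */
void transpose_naive(size_t n, const double *src, double *dst)
{
    assert(src != dst);  /* out-of-place only */
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            dst[j * n + i] = src[i * n + j];
}

/* Blocked transpose: work on TILE x TILE sub-blocks so that the cache lines
   touched in src and dst are reused before they are likely to be evicted. */
void transpose_blocked(size_t n, const double *src, double *dst)
{
    assert(src != dst);  /* out-of-place only */
    for (size_t ii = 0; ii < n; ii += TILE) {
        for (size_t jj = 0; jj < n; jj += TILE) {
            /* Clamp the tile at the matrix edge when n is not a multiple of TILE. */
            size_t i_end = ii + TILE < n ? ii + TILE : n;
            size_t j_end = jj + TILE < n ? jj + TILE : n;
            for (size_t i = ii; i < i_end; i++)
                for (size_t j = jj; j < j_end; j++)
                    dst[j * n + i] = src[i * n + j];
        }
    }
}
```

Both functions produce the same result; only the order of memory accesses differs. Whether the blocked version is actually faster on a given machine is something to measure rather than assume.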