This chapter contains the following sections:
“Overview” discusses the PFA output files and provides examples of them.
“Formatting the Listing File” explains how to change the format of the standard listing file.
“Interpreting Default Listing Information” explains the contents of the listing file.
“Sample Listing Files” provides sample listing files along with an interpretation of each.
PFA generates two files, a listing file (.l) and an intermediate file (.m). Invoking PFA as part of a Fortran compilation produces a line-numbered listing file when you use the -pfa list option. If you specify the -keep option, PFA produces both the numbered listing file and the intermediate file. PFA automatically produces both files when you invoke it directly. (For details about invoking PFA, refer to Chapter 2, “How to Use PFA.”)
For example, consider the following program, sample.f:
          subroutine sample (a,b,c)
          dimension a(1000),b(1000),c(1000)
          do 10 i = 1, 1000
       10 a(i) = b(i) + c(i)
          end
Compiling sample.f as follows
    % f77 -pfa keep sample.f
generates the following listing file, sample.l:
    Actions   Do Loops   Line
    DIR                     1   # 1 "sample.f"
                            2         subroutine sample(a,b,c)
                            3         dimension a(1000),b(1000),c(1000)
    C         +--------     4         do 10 i = 1,1000
              *_______      5      10 a(i) = b(i) + c(i)
                            6         end

    Abbreviations Used
      DIR    directive
      C      concurrentized

    Loop Summary
              From   To     Loop    Loop
      Loop#   line   line   label   index   Status
      1       4      5      DO 10   I       concurrentized
and the intermediate file, sample.m:
    # 1 "sample.f"
    # 1 "sample.f"
          subroutine sample(a,b,c)
          DIMENSION A(1000), B(1000), C(1000)
    # 3 "sample.f"
    C$DOACROSS SHARE(A,B,C),LOCAL(I)
    # 3 "sample.f"
          DO 2 I=1,1000
    # 4 "sample.f"
            A(I) = B(I) + C(I)
    # 4 "sample.f"
        2 CONTINUE
          end
PFA placed a C in the Actions column next to the first statement of the DO loop in the listing file, sample.l. The Abbreviations Used table shows that C stands for “concurrentized,” which means that PFA determined that it can safely run the loop in parallel. The Loop Summary table at the bottom of sample.l shows that the status of the loop is concurrentized.
PFA inserted the statement starting with C$DOACROSS before the DO statement in the intermediate file, sample.m. The Fortran 77 compiler directive C$DOACROSS tells f77 that the next DO loop can run in parallel. The phrase SHARE(A,B,C) informs the Fortran 77 compiler that all processes that execute the DO loop share the arrays A, B, and C. The phrase LOCAL(I) indicates that every process executing the DO loop keeps its own local copy of the variable I. The lines of the form # 4 "sample.f" are called line number directives. They relate the transformed source back to the original source.
Note: The first line number directive appears in the listing because it was actually added by cpp before PFA ran.
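The SHARE and LOCAL clauses described above can also be illustrated with a hand-written loop. The following is a minimal sketch, not PFA output: the program name doacr and the scalar t are made up for illustration. It assumes that a scalar assigned before it is used in every iteration belongs in the LOCAL list, as the loop index does. A compiler that does not recognize the directive treats it as an ordinary comment, so the program also runs serially.

```fortran
      program doacr
      integer n
      parameter (n = 8)
      real a(n), b(n), t
      integer i
      do 5 i = 1, n
         b(i) = real(i)
    5 continue
C     T is assigned before it is used in every iteration, so each
C     process can keep a private copy: list it as LOCAL along with I.
C     A and B are accessed one element per iteration, so they are
C     SHARE.
C$DOACROSS SHARE(A,B),LOCAL(I,T)
      do 10 i = 1, n
         t = 2.0 * b(i)
         a(i) = t + 1.0
   10 continue
      print *, a(1), a(n)
      end
```

If t were listed as SHARE instead, every process would write the same location and one iteration could pick up the value stored by another.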
You customize a PFA listing file by
paginating the listing
selecting the information to be printed
disabling specific message classes
The -LINES=n option (or -LN=n) paginates the listing for printing. Use this to change the number of lines per page. Specifying -LINES=0 paginates at subroutine boundaries.
If you do not specify the -LINES option, PFA prints 55 lines per page.
The -LISTOPTIONS=list option (or -LO=list) specifies the information to include in the listing file (.l), where list is any combination of the options in Table 3-1.
Table 3-1. Listing File Include Options
Value | Produces |
---|---|
C | Calling tree at the end of the program listing. |
I | Transformed program file annotated with line numbers in the source program. Error messages and debugging information can refer to the original source rather than the transformed source. Running PFA as part of an f77 compile automatically adds this option. |
K | Print out of the PFA options used at the end of each program unit. |
L | Loop-by-loop optimization table. |
N | Program unit names, as processed, to the standard error file. This option is added automatically as part of an f77 -v compilation. |
O | Annotated listing of the original program. |
P | Processing performance statistics. |
S | Summary of optimization performed. |
T | Annotated listing of the transformed program. |
Use the -SUPPRESS=list option (or -SU=list) to disable individual classes of PFA messages that are normally included in the listing (.l) file. These messages range from syntax warnings and error messages to messages about the optimizations performed. Here, list is any combination of the options in Table 3-2.
Table 3-2. Listing File Message Disabling Options
Value | Message Class Disabled |
---|---|
D | Data dependence |
E | Syntax error |
I | Information |
N | Unable to run loop in parallel |
Q | Questions |
S | Standard messages |
W | Warning of syntax error (PFA adds the -SUPPRESS=W option automatically if you use the -w option to f77) |
If you do not specify this option, PFA prints messages of all classes.
Knowing when and where to modify your code requires understanding the information in the PFA listing. This understanding allows you to recognize where small changes to the source code will make a big difference in how much code is run in parallel. The PFA-generated listing file lists the optimizations PFA made to the code. For example, a message could say that, although three loops could have run in parallel, PFA converted only the one it determined most profitable.
This section explains how to view the listing file online and then lists and describes the various fields.
The listing file is in 132-column format. To view the file, open a window with 132 columns and 40 rows by entering
    % wsh -s132,40
This section explains the contents of the .l file when you use the default values for the -LISTOPTIONS command line option (that is, O and L).
A default PFA file listing includes
line numbers
DO loop markings
footnotes
syntax errors/warning messages
action summary
A statement in the PFA listing labeled with a line number, such as 21, is the same as line 21 from the original program or has been derived from that line. These line numbers are useful when inspecting the PFA-transformed program listing and when debugging. PFA sometimes generates several lines of code from a single line of the original program; in this case, each new line of code is labeled with the same number as the line of the original program from which it was generated. Consequently, many lines of the PFA-transformed program listing carry the same number because they are related to one line of the original program listing.
The listing file displays DO loops graphically in a column headed DO Loops. PFA surrounds each DO loop (up to nest level 10) with a loop delimiter character. Each character listed in Table 3-3 has a specific meaning.
Table 3-3. Listing File DO Loop Delimiters
Character | Denotes |
---|---|
\| | Generic DO loop |
* | PFA can run loop in parallel |
! | Syntax error |
A statement contained within n DO loops has n of these loop delimiters on that line.
For example,
    DO Loops   Line
    +-------    173         DO 100 M=2,MAX(MFLD,2)
    |           174           IADR = ISECT(M)
    |           175           IADR1= ISECT(M-1)
    |           176           PNM(IADR)=(ANM(IADR) *PNM(IADR1))
    |_______    177     100   PPNM(IADR)= -(ANM(IADR) *PNM(IADR1))
PFA uses footnotes to give important details about its actions. PFA numbers the footnotes and prints them at the bottom of each program unit under the Footnote List heading. References to the footnotes appear in the listing under the Footnotes column. For example, this line in the listing

    13   DD   1790         IF (B(I) .LE. 6) IB(J*I) = I+J

refers to this entry under Footnote List at the end of the program unit

    13: data dependence
        Data dependence involving this line due to variable IB.

In this example, 13 is the footnote number, DD (data dependence) is the abbreviation for PFA's action, and 1790 is the line number of the IF statement in the original source.
When a program has syntax errors, the listing file describes the error next to the lines that start with the symbol ### in the Footnotes column. These messages are also printed to stderr, which will usually be your terminal.
For example,
    Footnotes   Actions   DO Loops   Line
                                        1         SUBROUTINE Z(A,B,N)
                                        2         REAL A(N), B(N)
                          +-------      3         DO 20 I=1,N
                          !             4           X=A(I)
                          !             5           Y=B(I)
                          !______       6      20   C(I)=X+Y
    ### line (6)
    ### error   Array not declared or statement function declared after executable statements.
    ### error   A do loop ends on a non-executable statement.
                                        7         PRINT *,X
                                        8         END
When PFA translates or modifies a statement, it uses abbreviations in the Actions column of the listing file to identify the statements. PFA lists an abbreviated explanation of its actions at the bottom of the listing. For the DIR and V classes, the class itself serves as the message and no detailed messages follow. All other classes have associated messages.
Table 3-4 lists and explains the values that can appear in the Actions column.
Table 3-4. PFA Action Abbreviations
Value | Meaning |
---|---|
DD | (Data Dependence) Indicates that data dependence prevented PFA from running this statement in parallel. |
DIR | (Directive) Used in conjunction with the footnotes and concerns compiler directives. If you code a compiler directive and that line does not have the DIR abbreviation in the listing, PFA will not recognize the directive. Check the setting of the -DIRECTIVES command line option and the syntax of the directive. |
E | (Error) Indicates syntax errors. These messages can refer to missing or extra characters, illegal keywords, or text placed in the wrong column. PFA cannot do anything with such code. The intermediate (.m) file contains a copy of this program unit that PFA has not modified. |
EX | (Extension) Shows where a construct in the original program is not allowed in the language PFA produces. In some cases, an operation or type is allowed in the input language but not in the output language. |
INF | (Information) Provides noncritical information. |
I | (Insertion) Indicates that PFA added a statement. |
LR | (Loop Reordering) Indicates that PFA has modified a Fortran 77 statement in the process of interchanging loops. If during optimization PFA ascertains that an outer loop would be more efficient as an inner loop, and it can legally reorder the loops, PFA places the outer loop inside. In the process of this reordering, PFA might have to change loop bounds (for triangular loops), distribute loops, or float IF assignments. Only the statements modified for the exchange are marked. |
MIS | (Miscellaneous) Indicates that some PFA information has been lost. This message does not always mean that something is wrong with the program. |
NX | (Nonconcurrent Statement) Indicates that PFA did not try or was unable to run the statement in parallel. For example, when a subroutine call is involved in a loop, PFA generates this message. |
NO | (Program Too Large—Not Optimized) Indicates that the program unit being processed is too large for PFA to optimize, because of PFA's data structure size limitations. When PFA optimizes programs, it adds statements that might also overflow the fixed-size tables. In either case, PFA stops optimization and passes the original program to the intermediate (.m) file, informing you of this action. For PFA to process the unit, you must split the program into smaller sections. |
OE | (Option Error) Indicates a syntax error in a PFA option. This error does not stop processing of a program unit. |
OTF | (Output Translation Failure) Marks statements that have constructs that exist in the input language but that cannot be represented in the output language. |
Q | (Question) Indicates that PFA tried to optimize a loop nest but discovered a data dependence it could not break at compile time without further information. You can usually answer this question with an appropriate assertion. |
SO | (Scalar Optimization) Marks places in the transformed listing where PFA has optimized a scalar loop. |
STD | (Standardized) Marks where PFA changed a program to improve the chance of finding code that it can optimize. This is often a conversion from an IF/GOTO into a block IF, loop rerolling, and conversion of an IF loop to a DO loop. |
TE | (Translator Error) Indicates an internal PFA error. PFA writes the notification to the standard error file and writes a trace back to the output file. Notify SGI if you see this sort of bug (so it can be corrected) and, if possible, send SGI the code that caused the trace back as well as the trace back itself. If you can reproduce the error in a small program unit, send that small program unit as well. |
W | (Warning) Contains syntax warnings. |
This section contains a few simple examples of Fortran code and the corresponding PFA output. An actual source program would be much larger, and a single loop could contain several of the cases illustrated here. However, even in a large loop, you can deal with each problem individually.
PFA cannot determine if it can run a loop in parallel when the code uses indirect indexing. A loop is indirectly indexed when it uses the value from some auxiliary array as the index value rather than the DO loop variable.
The Fortran 77 code
          subroutine foo2(w,b,index,n)
          real w(n), b(n)
          integer index(n)
          do i = 1, n
            w(index(i)) = w(index(i)) + b(i)
          enddo
          end
when submitted to PFA, results in the listing file
                                10
                                11
                                12         subroutine foo2(w,b,index,n)
                                13         real w(n), b(n)
                                14         integer index(n)
                                15
    1   Q        +-------       16         do i = 1, n
    2   DD       !              17           w(index(i)) = w(index(i)) + b(i)
                 !_______       18         enddo
                                19         end

    Abbreviations Used
      DD    data dependence
      Q     question

    Footnote List
      1: question
         Is INDEX a permutation vector?
      2: data dependence
         Data dependence involving this line due to variable W.

    DO Loop Summary
      loop#   from   to   DO label   index   workload   status
      1       16     18   DO         I                  dependencies prevent parallelism
DD in the Actions column on line 17 of the listing warns that the variable w might carry a dependency. A dependency exists when one iteration of the loop writes to a location that is used by a different iteration of the loop. In this example, if the values of index(i) are ever the same for different values of i, then different iterations might use the same location in w. Therefore, this code contains a possible data dependence.
If you can guarantee that the values of index(i) are always different for each value of i, then there is no dependence (each iteration uses a different location in w). Footnote 1 in the Footnote List asks whether this is the case. A permutation vector is a list of numbers in which each number is different from all the others, that is, each value appears exactly once. If you know that index is a permutation vector, then the loop is data-independent.
Explicitly state that index is a permutation vector by adding an assertion in the source
          subroutine foo2(a,b,index,n)
          real a(n), b(n)
          integer index(n)
    c*$*assert permutation (index)
          do i = 1, n
            a(index(i)) = a(index(i)) + b(i)
          enddo
          end
Now the listing file shows that PFA finds the loop safe to run in parallel (indicated by the * DO loop delimiter)
    Actions   DO Loops   Line
    DIR                     1   # 1 "foo2.f"
                            2         subroutine foo2(a,b,index,n)
                            3         real a(n), b(n)
                            4         integer index(n)
                            5
    DIR                     6   c*$*assert permutation (index)
    C         +------       7         do i = 1, n
              *             8           a(index(i)) = a(index(i)) + b(i)
              *______       9         enddo
                           10         end

    Abbreviations Used
      DIR    directive
      C      concurrentized

    Loop Summary
              From   To     Loop    Loop
      Loop#   line   line   label   index   Status
      1       7      9      Do      I       concurrentized
Note: As with all assertions, PFA does not verify the truth of this assertion. When you make an assertion, be certain that the assertion is always true for all possible input data.
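Because PFA does not check the assertion, it can be useful to test the index array yourself while debugging. The following is a minimal sketch of such a check, not part of PFA: the function isperm, the scratch array seen, and the test program chkprm are hypothetical names. As an added assumption, it also requires every value to lie in the range 1 to n so that each value is a valid subscript for an array of length n, which is stricter than the definition of a permutation vector given above.

```fortran
      logical function isperm(index, n, seen)
C     Returns .true. when INDEX(1..N) holds distinct values that all
C     lie in the range 1..N. SEEN is caller-supplied scratch space
C     of length N.
      integer n, index(n), i
      logical seen(n)
      isperm = .false.
      do 10 i = 1, n
         seen(i) = .false.
   10 continue
      do 20 i = 1, n
C        Out-of-range or repeated value: not a permutation vector.
         if (index(i) .lt. 1 .or. index(i) .gt. n) return
         if (seen(index(i))) return
         seen(index(i)) = .true.
   20 continue
      isperm = .true.
      end

      program chkprm
      integer n
      parameter (n = 5)
      integer good(n), bad(n)
      logical seen(n), isperm
      data good /3, 1, 5, 2, 4/
      data bad  /3, 1, 3, 2, 4/
      print *, 'good is a permutation vector: ', isperm(good,n,seen)
      print *, 'bad is a permutation vector:  ', isperm(bad,n,seen)
      end
```

The array good passes the check. The array bad repeats the value 3, so two iterations of the foo2 loop would update the same element and the assertion would be false.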
This example shows what happens when a loop contains a call to an external routine. The Fortran 77 code
          subroutine foo3 (a,b,c,n)
          real a(n), b(n), c(n)
          external force
          do i = 1, n
            a(i) = force (b(i), c(i))
          enddo
          end
generates the listing
    Actions   DO Loops   Line
    DIR                     1   # 1 "foo3.f"
                            2         subroutine foo3(a,b,c,n)
                            3         real a(n), b(n), c(n)
                            4         external force
                            5
    NCS       +------       6         do i = 1, n
    NO NCS    !             7           a(i) = force(b(i), c(i))
              !______       8         enddo
                            9         end

    Abbreviations Used
      NO     not optimized
      DIR    directive
      NCS    non-concurrent-stmt

    Footnote List
      1: not optimized
         No optimizable statements found.
      2: not optimized
         Unoptimizable call to "FORCE" found.

    Loop Summary
              From   To     Loop    Loop
      Loop#   line   line   label   index   Status
      1       6      8      Do      I       unoptimizable call (FORCE)
Calling the function force prevents PFA from automatically running the loop in parallel. PFA identifies the function call as a non-concurrent-stmt. By its nature, a nonconcurrent statement prevents PFA from assuming the loop is safe to run in parallel because PFA cannot see into the routine to look for data dependencies.
If you know that force generates no data dependencies, then explicitly state this fact for the nonconcurrent statement
          subroutine foo3(a,b,c,n)
          real a(n), b(n), c(n)
          external force
    c*$*assert concurrent call
          do i = 1, n
            a(i) = force(b(i), c(i))
          enddo
          end
Now that PFA knows that the nonconcurrent statement involves no data dependency, PFA will find the loop safe to run in parallel.
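To decide whether the assertion is true for a given routine, it helps to ask whether the result of a call depends on calls made by other iterations. The following sketch contrasts two hypothetical functions written for this illustration: force here is a stand-in that depends only on its arguments, and tally keeps a saved running total. Running the loop backward imitates a different execution order on a single processor.

```fortran
      real function force(x, y)
C     Hypothetical stand-in: the result depends only on the
C     arguments, so calls from different iterations cannot interfere.
      real x, y
      force = x * y
      end

      real function tally(x, y)
C     Hypothetical counter-example: TOTAL is saved between calls, so
C     each call depends on every call made before it.
      real x, y, total
      save total
      data total /0.0/
      tally = total + x * y
      total = tally
      end

      program calls
      integer n, i
      parameter (n = 4)
      real a(n), b(n), c(n), force, tally
      do 10 i = 1, n
         b(i) = real(i)
         c(i) = 2.0
   10 continue
C     FORCE gives the same A in any iteration order; TALLY does not.
      do 20 i = n, 1, -1
         a(i) = force(b(i), c(i))
   20 continue
      print *, 'force, reversed order: ', a
      do 30 i = n, 1, -1
         a(i) = tally(b(i), c(i))
   30 continue
      print *, 'tally, reversed order: ', a
      end
```

With force, the array holds 2, 4, 6, 8 whichever way the loop runs. With tally, the reversed loop stores 20, 18, 14, 8, whereas a forward loop would store 2, 6, 12, 20. A routine like tally therefore must not be covered by a concurrent call assertion.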
There is one subtlety in using the concurrent call assertion. When you use this assertion, PFA makes no attempt to examine the called routine; it simply assumes that it is safe. However, PFA is still left with the problem of correctly declaring the variables in the loop to be either SHARE or LOCAL. (PFA does the best it can, but it can sometimes be fooled.) For example,
          subroutine tricky (a,b,c,n,m)
          real a(*), b(*)
          external my_function
    c*$*assert concurrent call
          do i = 1, n
            a(i) = my_function (b(i), m)
            b(i) = a(i) + m
          enddo
          m = 0
          end
The question is whether the variable m should be SHARE or LOCAL. If the routine my_function only reads the old value of m, then it should be SHARE. If my_function writes a new value of m, then it should be LOCAL. In the absence of any more clues, PFA must go by what it can see; and what it can see is that within the loop, there are no visible assignments to m, and so PFA will declare it to be SHARE. If in fact my_function is writing the value of m, then this is incorrect. In this case, to give PFA the hint it needs, add a visible assignment to m at the top of the loop.
For example, consider the following code:
          do i = 1, n
            m = 0
            a(i) = my_function(b(i), m)
            b(i) = a(i) + m
          enddo
Here, PFA can see an assignment to m and so will declare it to be LOCAL. Note that if my_function both reads the old value and writes a new value of m, then it is not legal to run the loop in parallel.
This example shows how PFA produces a single value from a set of values. Because the entire set of values is reduced to a single value, these operations are called reductions.
Consider the Fortran 77 code
          subroutine foo4(a,b,n,sum)
          real a(n), b(n), sum

          sum = 0.0
          do i = 1, n
            sum = sum + a(i)*b(i)
          enddo
          end
Using the previous code as input, PFA produces the listing file
        DIR                  1   # 1 "foo4.f"
                             2         subroutine foo4(a,b,n,sum)
                             3         real a(n), b(n), sum
                             4
                             5         sum = 0.0
               +-----        6         do i = 1, n
    1   DD     !             7           sum = sum + a(i)*b(i)
               !_____        8         enddo
                             9         end

    Abbreviations Used
      DD     data dependence
      DIR    directive

    Footnote List
      1: data dependence
         Data dependence involving this line due to variable "SUM".

    Loop Summary
              From   To     Loop    Loop
      Loop#   line   line   label   index   Status
      1       6      8      Do      I       scalar mode preferable
Because different iterations of the loop read and write the same location (the variable sum), there is a dependence. However, this is a special case. Because sum just accumulates a total, you can accumulate subtotals in parallel and then combine the subtotals at the end.
Because the parallel version of the code adds the elements together in a different order than the single-process version, the round-off errors accumulate differently for the two versions of the code. Thus, the answer can differ slightly as you vary the number of processes used to run the code. In fact, if you use the dynamic scheduling option for the code, the answer might vary slightly from one run of the program to the next, even if you use the same number of processes on the same machine.
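The effect of summation order can be reproduced on a single processor. The following sketch is not the code PFA generates: the program name rndoff, the block count nproc, and the data values are assumptions chosen so that the effect is visible in single precision. It sums the same array once with one running total and once as per-block subtotals that are combined at the end.

```fortran
      program rndoff
      integer n, nproc, chunk, i, p
      parameter (n = 1000, nproc = 4)
      parameter (chunk = n / nproc)
      real a(n), serial, part(nproc), total
C     One large value followed by many small ones.
      a(1) = 1.0e8
      do 10 i = 2, n
         a(i) = 1.0
   10 continue
C     Serial order: one running total over all elements.
      serial = 0.0
      do 20 i = 1, n
         serial = serial + a(i)
   20 continue
C     Subtotal order: each of NPROC (assumed) processes would sum its
C     own block; here the blocks are simply summed one after another.
      do 40 p = 1, nproc
         part(p) = 0.0
         do 30 i = (p-1)*chunk + 1, p*chunk
            part(p) = part(p) + a(i)
   30    continue
   40 continue
      total = 0.0
      do 50 p = 1, nproc
         total = total + part(p)
   50 continue
      print *, 'serial order:   ', serial
      print *, 'subtotal order: ', total
      end
```

On hardware that performs these additions in IEEE single precision, the two printed values can be expected to differ: the running total is too large to absorb the small terms one at a time, while the subtotals of the later blocks are large enough to register when they are combined. Neither result is exact.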
Most applications can safely ignore this variation in round-off error. If you do not care about this round-off error, you can tell PFA to use parallel subtotals. To tell PFA not to worry about round-off error, you can use either the C*$*ROUNDOFF=2 directive or the f77/pfa command line option -WK,-roundoff=2.
The resulting listing file is
    DIR                  1   # 1 "foo4.f"
                         2         subroutine foo4(a,b,n,sum)
                         3         real a(n), b(n), sum
                         4
                         5         sum = 0.0
    C      +------       6         do i = 1, n
           *             7           sum = sum + a(i)*b(i)
           *______       8         enddo
                         9         end

    Abbreviations Used
      DIR    directive
      C      concurrentized

    Loop Summary
              From   To     Loop    Loop
      Loop#   line   line   label   index   Status
      1       6      8      Do      I       concurrentized
Be aware that the round-off error produced by the parallel reduction operation is not necessarily any worse than the round-off error already present in the original serial version. It will simply be different. If your application did not worry about the round-off error in the original, there is no reason to suppose that it should worry about it in the parallel version. If, on the other hand, your application takes special steps to reduce round-off error (for example, adding the numbers together in order from smallest absolute value to largest), then you should not use parallel reductions.
The previous example is called a sum reduction because the reduction operator is +. Table 3-5 shows the types of reductions PFA supports.
Type | Operator | Example |
---|---|---|
Sum | + | sum = sum + expression |
Product | * | p = p * expression |
Min | min( ) | a = min(a, expression) |
Max | max( ) | x = max(x, expression) |
All these reductions are under the control of the -ROUNDOFF command line option, even though technically the min and max reductions do not involve round-off problems.
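The remark about min and max can be checked with a small serial experiment. The following sketch is an illustration only, not PFA-generated code: the program name maxred, the block count nproc, and the data are assumptions. It computes a maximum once in a single pass and once as per-block maxima that are combined afterward.

```fortran
      program maxred
      integer n, nproc, chunk, i, p
      parameter (n = 12, nproc = 3)
      parameter (chunk = n / nproc)
      real a(n), serial, part(nproc), total
      do 10 i = 1, n
         a(i) = real(mod(7*i, 13))
   10 continue
C     Single pass over the whole array.
      serial = a(1)
      do 20 i = 2, n
         serial = max(serial, a(i))
   20 continue
C     One partial maximum per block, as NPROC (assumed) processes
C     would compute them, followed by a combining step.
      do 40 p = 1, nproc
         part(p) = a((p-1)*chunk + 1)
         do 30 i = (p-1)*chunk + 2, p*chunk
            part(p) = max(part(p), a(i))
   30    continue
   40 continue
      total = part(1)
      do 50 p = 2, nproc
         total = max(total, part(p))
   50 continue
      print *, 'serial max: ', serial, ' combined max: ', total
      end
```

Both values are the same, because max only selects one of its operands and never rounds. The grouping of the comparisons therefore cannot change the result, unlike the sum reduction.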