Chapter 3. Utilizing PFA Output

This chapter contains the following sections:

  • Formatting the Listing File

  • Interpreting Default Listing Information

  • Sample Listing Files

PFA generates two files: a listing file (.l) and an intermediate file (.m). When you invoke PFA as part of a Fortran compilation, the -pfa list option produces a line-numbered listing file, and the -pfa keep option produces both the numbered listing file and the intermediate file. PFA automatically produces both files when you invoke it directly. (For details about invoking PFA, refer to Chapter 2, “How to Use PFA.”)

For example, consider the following program, sample.f:

      subroutine sample (a,b,c)
      dimension a(1000),b(1000),c(1000)
      do 10 i = 1, 1000
   10 a(i) = b(i) + c(i)
      end

Compiling sample.f as follows

% f77 -pfa keep sample.f

generates the following listing file, sample.l:

Actions   DO Loops Line
DIR                  1 # 1  "sample.f"
                     2       subroutine sample(a,b,c)
                     3       dimension a(1000),b(1000),c(1000)
C         +--------  4       do 10 i = 1,1000
          *_______   5  10   a (i) = b(i) + c(i)
                     6       end
Abbreviations Used
   DIR    directive
   C      concurrentized
Loop Summary

       From  To    Loop     Loop
Loop#  line  line  label    index   Status
1      4     5     DO 10    I       concurrentized

and the intermediate file, sample.m:

#  1  "sample.f"
#  1  "sample.f"
      subroutine sample(a,b,c)
      DIMENSION A(1000), B(1000), C(1000)
#  3  "sample.f"
C$DOACROSS SHARE(A,B,C), LOCAL(I)
#  3  "sample.f"
      DO 2 I=1,1000
#  4  "sample.f"
      A(I) = B(I) + C(I)
#  4  "sample.f"
    2 CONTINUE

PFA placed a C before the first statement of the DO loop in the listing file, sample.l. The Abbreviations Used table shows that C stands for “concurrentized,” which means that PFA determined that it can safely run the loop in parallel. The Loop Summary table at the bottom of sample.l also shows the status of the loop as concurrentized.

PFA inserted the statement starting with C$DOACROSS before the DO statement in the intermediate file, sample.m. The Fortran 77 compiler directive C$DOACROSS tells f77 that the next DO loop can run in parallel. The phrase SHARE(A,B,C) tells the Fortran 77 compiler that all processes executing the DO loop share the arrays A, B, and C. The phrase LOCAL(I) indicates that each process executing the DO loop has its own private copy of the variable I. The lines of the form # 4  "sample.f" are called line number directives; they relate the transformed source back to the original source.
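To make the SHARE and LOCAL phrases concrete, here is a small Python model (not PFA or f77 output) of what C$DOACROSS asks for in sample.f. It assumes simple block scheduling, in which each worker runs one contiguous range of iterations; the schedule f77 actually uses may differ. The arrays are shared by every worker, while each worker's loop index is its own local variable. Python threads stand in for processes, so the sketch shows the data sharing, not a real speedup. The function name doacross_add is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def doacross_add(b, c, nprocs=4):
    """Model of C$DOACROSS SHARE(A,B,C), LOCAL(I) applied to A(I) = B(I) + C(I).

    Assumes simple block scheduling: worker p runs one contiguous block of
    iterations. Python threads stand in for the processes f77 would use.
    """
    n = len(b)
    a = [0.0] * n                       # SHARE(A): one array seen by every worker
    chunk = (n + nprocs - 1) // nprocs  # iterations per worker, rounded up

    def worker(p):
        lo, hi = p * chunk, min(n, (p + 1) * chunk)
        for i in range(lo, hi):         # LOCAL(I): each worker has its own index
            a[i] = b[i] + c[i]          # workers write disjoint elements of A

    with ThreadPoolExecutor(max_workers=nprocs) as pool:
        list(pool.map(worker, range(nprocs)))  # wait for all workers, re-raise errors
    return a
```

For example, doacross_add([1.0, 2.0, 3.0], [10.0, 20.0, 30.0], nprocs=2) returns [11.0, 22.0, 33.0]. Because each iteration writes a different element of A, the result is the same however the iterations are divided among workers, which is the property that let PFA mark this loop concurrentized.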

Note: The first line number directive appears in the listing because it was actually added by cpp before PFA ran.
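The following Python sketch shows how line number directives can relate each line of an intermediate (.m) file back to the original source. It assumes the cpp convention that a directive of the form # N "file" gives the line number of the line that follows it, with later lines counting up from there until the next directive. The function name map_to_source is hypothetical.

```python
import re

# A line number directive: '#', a line number, then a quoted file name.
DIRECTIVE = re.compile(r'^#\s*(\d+)\s+"([^"]*)"')


def map_to_source(m_lines):
    """Pair each non-directive line with the (file, line) it was derived from.

    Assumes the cpp convention: '# N "file"' means the next line is line N of
    that file, and each following line counts up by one until the next directive.
    """
    mapped = []
    src_file, src_line = None, None
    for text in m_lines:
        match = DIRECTIVE.match(text)
        if match:
            src_line, src_file = int(match.group(1)), match.group(2)
            continue
        mapped.append((src_file, src_line, text))
        if src_line is not None:
            src_line += 1
    return mapped
```

Because PFA emits a directive before the statements it generates, several transformed lines can map to the same original line, which is how error messages and debuggers can point back to the source you wrote.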

Formatting the Listing File

You can customize a PFA listing file by

  • paginating the listing

  • selecting the information to be printed

  • disabling specific message classes

Paginating the Listing

The -LINES=n option (or -LN=n) paginates the listing for printing, with n lines per page. Specifying -LINES=0 paginates at subroutine boundaries.

If you do not specify the -LINES option, PFA prints 55 lines per page.
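As a rough model of the two pagination modes, the Python sketch below splits a list of listing lines into pages. It is not PFA's code. In particular, this chapter does not describe how PFA detects subroutine boundaries, so the sketch takes the index of each program unit's first line as an input (unit_starts, a hypothetical parameter).

```python
def paginate(lines, lines_per_page=55, unit_starts=()):
    """Split listing lines into pages.

    lines_per_page > 0: fixed-size pages (55 mirrors PFA's documented default).
    lines_per_page == 0: start a new page at each program unit, where
    unit_starts (hypothetical input) holds the index of each unit's first line.
    """
    if lines_per_page < 0:
        raise ValueError("lines_per_page must be >= 0")
    if lines_per_page > 0:
        return [lines[k:k + lines_per_page]
                for k in range(0, len(lines), lines_per_page)]
    starts = sorted(set(unit_starts) | {0})
    bounds = [s for s in starts if s < len(lines)] + [len(lines)]
    return [lines[s:e] for s, e in zip(bounds, bounds[1:])]
```

With 120 listing lines and the default setting, the model yields pages of 55, 55, and 10 lines; with lines_per_page=0 and program units starting at lines 0, 40, and 90, it yields pages of 40, 50, and 30 lines.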

Specifying Information to Include

The -LISTOPTIONS=list option (or -LO=list) specifies the information to include in the listing (.l) file, where list is any combination of the options in Table 3-1.

Table 3-1. Listing File Include Options

Information included:

  • Calling tree at the end of the program listing.

  • Transformed program file annotated with line numbers in the source program, so that error messages and debugging information can refer to the original source rather than the transformed source. Running PFA as part of an f77 compile automatically adds this option.

  • Printout of the PFA options used, at the end of each program unit.

  • Loop-by-loop optimization table.

  • Names of program units, written to the standard error file as they are processed. This option is added automatically as part of an f77 -v compilation.

  • Annotated listing of the original program.

  • Processing performance statistics.

  • Summary of optimization performed.

  • Annotated listing of the transformed program.

Disabling Message Classes

Use the -SUPPRESS=list option (or -su=list) to disable individual classes of PFA messages that are normally included in the listing (.l) file. These messages range from syntax warnings and error messages to messages about the optimizations performed. list is any combination of the options in Table 3-2.

Table 3-2. Listing File Message Disabling Options

Message class disabled:

  • Data dependence

  • Syntax error

  • Unable to run loop in parallel

  • Standard messages

  • Warning of syntax error (PFA adds the -SUPPRESS=W option automatically if you use the -w option to f77)

If you do not specify this option, PFA prints messages of all classes.

Interpreting Default Listing Information

Knowing when and where to modify your code requires understanding the information in the PFA listing. That understanding lets you recognize where small changes to the source code make a big difference in how much of the code runs in parallel. The PFA-generated listing file lists the optimizations PFA made to the code. For example, a message might say that, although three loops could have run in parallel, PFA converted only the one it determined to be most profitable.

This section explains how to view the listing file online and then lists and describes the various fields.

Viewing the Listing File

The listing file is in 132-column format. To view the file, open a window with 132 columns and 40 rows by entering

% wsh -s132,40

Field Descriptions

This section explains the contents of the .l file when you use the default values for the -LISTOPTIONS command line option (that is, O and L).

A default PFA file listing includes

  • line numbers

  • DO loop markings

  • footnotes

  • syntax errors/warning messages

  • action summary

Line Numbers

A statement in the PFA listing labeled with a line number, such as 21, is the same as line 21 from the original program or has been derived from that line. These line numbers are useful when inspecting the PFA-transformed program listing and when debugging. PFA sometimes generates several lines of code from a single line of the original program; in this case, each new line of code is labeled with the same number as the line of the original program from which it was generated. Consequently, many lines of the PFA-transformed program listing carry the same number because they are related to one line of the original program listing.

DO Loop Marking

The listing file displays DO loops graphically in a column headed DO Loops. PFA surrounds each DO loop (up to nest level 10) with loop delimiter characters. Each character listed in Table 3-3 has a specific meaning.

Table 3-3. Listing File DO Loop Delimiters

Meaning of the delimiter:

  • Generic DO loop

  • PFA can run loop in parallel (the * delimiter)

  • Syntax error

A statement contained within n DO loops has n of these loop delimiters on that line.

For example,

DO Loops  Line
+-------  173      DO 100 M=2,MAX(MFLD,2)
|         174      IADR = ISECT(M)
|         175      IADR1= ISECT(M-1)
|         176      PNM(IADR)=(ANM(IADR) *PNM(IADR1))
|_______  177 100  PPNM(IADR)= -(ANM(IADR) *PNM(IADR1))


Footnotes

PFA uses footnotes to give important details about its actions. PFA numbers the footnotes and prints them at the bottom of each program unit under the Footnote List heading. References to the footnotes appear in the Footnotes column of the listing. For example, the listing line

13 DD   1790  IF (B(I) .LE. 6) IB(J*I) = I+J

carries a reference to footnote 13, which appears under Footnote List at the end of the program unit:

13: data dependence    Data dependence involving this line due
                       to variable IB.

In this example, 13 is the footnote number, DD (data dependence) is the explanation for PFA's action, and 1790 is the line number of the IF statement in the original source.

Syntax Errors/Warning Messages

When a program has syntax errors, the listing file describes each error next to lines that start with the symbol ### in the Footnotes column. These messages are also printed to stderr, which is usually your terminal.

For example,

Footnotes Actions   DO Loops   Line
                                 1      SUBROUTINE Z(A,B,N)
                                 2      REAL A(N), B(N)
                    +-------     3      DO 20 I=1,N
                    !            4      X=A(I)
                    !            5      Y=B(I)
                    ! ______     6   20 C(I)=X+Y
### line (6) 
### error    Array not declared or statement function declared 
             after executable statements.
### error    A do loop ends on a non-executable statement.
                                 7      PRINT *,X
                                 8      END

Action Summary

When PFA translates or modifies a statement, it uses abbreviations in the Actions column of the listing file to identify the statements. PFA lists an abbreviated explanation of its actions at the bottom of the listing. For the DIR and V classes, the class itself serves as the message and no detailed messages follow. All other classes have associated messages.

Table 3-4 lists and explains the values that can appear in the Actions column.

Table 3-4. PFA Action Abbreviations

(Data Dependence) Indicates that data dependence prevented PFA from running this statement in parallel.

(Directive) Appears with footnotes about compiler directives. If you code a compiler directive and that line does not have the DIR abbreviation in the listing, PFA did not recognize the directive. Check the setting of the -DIRECTIVES command line option and the syntax of the directive.

(Error) Indicates syntax errors. These messages can refer to missing or extra characters, illegal keywords, or text placed in the wrong column. PFA cannot process such code; the intermediate (.m) file contains an unmodified copy of the program unit.

(Extension) Shows where a construct in the original program is not allowed in the language PFA produces. In some cases, an operation or type is allowed in the input language but not in the output language.

(Information) Provides noncritical information.

(Insertion) Indicates that PFA added a statement.

(Loop Reordering) Indicates that PFA modified a Fortran 77 statement while interchanging loops. If PFA determines during optimization that an outer loop would be more efficient as an inner loop, and it can legally reorder the loops, it moves the outer loop inside. In the process, PFA might have to change loop bounds (for triangular loops), distribute loops, or float IF assignments. Only the statements modified for the interchange are marked.

(Miscellaneous) Indicates that some PFA information has been lost. This message does not always mean that something is wrong with the program.

(Nonconcurrent Statement) Indicates that PFA did not try, or was unable, to run the statement in parallel. For example, PFA generates this message when a loop contains a subroutine call.

(Program Too Large: Not Optimized) Indicates that the program unit being processed is too large for PFA to optimize because of the size limits of PFA's data structures. The statements PFA adds while optimizing can also overflow these fixed-size tables. In either case, PFA stops optimizing, passes the original program unit to the intermediate (.m) file, and informs you of this action. For PFA to process the unit, you must split it into smaller sections.

(Option Error) Indicates a syntax error in a PFA option. This error does not stop processing of a program unit.

(Output Translation Failure) Marks statements containing constructs that exist in the input language but cannot be represented in the output language.

(Question) Indicates that PFA tried to optimize a loop nest but found a data dependence that it could not break at compile time without further information. You can usually answer the question with an appropriate assertion.

(Scalar Optimization) Marks places in the transformed listing where PFA has optimized a scalar loop.

(Standardized) Marks where PFA changed a program to improve the chance of finding code that it can optimize. Typical changes include converting an IF/GOTO construct into a block IF, rerolling loops, and converting an IF loop into a DO loop.

(Translator Error) Indicates an internal PFA error. PFA writes the notification to the standard error file and writes a traceback to the output file. If you see this kind of error, notify SGI so that it can be corrected and, if possible, send SGI the code that caused the traceback along with the traceback itself. If you can reproduce the error in a small program unit, send that program unit as well.

(Warning) Contains syntax warnings.

Sample Listing Files

This section contains a few simple examples of Fortran code and the corresponding PFA output. An actual source program would be much larger, and a single loop could contain several of the cases illustrated here. However, even in a large loop, you can deal with each problem individually.

Indirect Indexing

PFA cannot determine if it can run a loop in parallel when the code uses indirect indexing. A loop is indirectly indexed when it uses the value from some auxiliary array as the index value rather than the DO loop variable.

The Fortran 77 code

      subroutine foo2(w,b,index,n)
      real w(n), b(n)
      integer index(n)
      do i = 1, n
         w(index(i)) = w(index(i)) + b(i)
      enddo
      end

when submitted to PFA, results in the listing file

                      12    subroutine foo2(w,b,index,n)
                      13    real w(n), b(n)
                      14    integer index(n)
1 Q       +-------    16    do i = 1, n
2 DD      !           17       w(index(i)) = w(index(i)) + b(i)
          !_______    18    enddo
                      19    end
Abbreviations Used
DD   data dependence
Q   question
Footnote List
1: question           Is INDEX a permutation vector?
2: data dependence    Data dependence involving this line due
                      to variable W.
DO Loop Summary
loop# from   to   DO label index   workload status
1     16     18   DO       I       dependencies prevent

DD in the Actions column on line 17 of the listing warns that the variable w might carry a dependency. A dependency exists when one iteration of the loop writes to a location that is used by a different iteration of the loop. In this example, if the values of index(i) are ever the same for different values of i, then different iterations might use the same location in w. Therefore, this code contains a possible data dependence.

If you can guarantee that the values of index(i) are always different for each value of i, then there is no dependence (each iteration uses a different location in w). Question one on the Footnote List asks if index(i) is different for every value of i. A permutation vector is a list of numbers, each of which is different from the others. If you know that index is a permutation vector, then the loop is data-independent. An example of a permutation vector is a list of objects in which each object appears exactly once.
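Because PFA takes an assertion on trust, it can be worth checking the property on representative data, for example in a separate test program. Below is a Python sketch of that check using the definition above (no value appears twice). The function names are illustrative. A strict permutation of 1 to n would also require every value to lie in that range, but for the dependence question only distinctness matters.

```python
def first_repeated_position(index):
    """Return the 1-based position of the first value in index that already
    appeared earlier, or None if every value is different."""
    seen = set()
    for pos, value in enumerate(index, start=1):
        if value in seen:
            return pos
        seen.add(value)
    return None


def is_permutation_vector(index):
    """True if no two iterations of w(index(i)) = w(index(i)) + b(i)
    would update the same element of w."""
    return first_repeated_position(index) is None
```

For instance, [4, 2, 3, 1] passes, while [1, 3, 3, 2] fails at position 3, where iterations 2 and 3 would both update w(3).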

Explicitly state that index is a permutation vector by adding an assertion in the source

      subroutine foo2(a,b,index,n)
      real a(n), b(n)
      integer index(n)
c*$*assert permutation (index)
      do i = 1, n
         a(index(i)) = a(index(i)) + b(i)
      enddo
      end

Now the listing file shows that PFA finds the loop safe to run in parallel (indicated by the * DO loop delimiter)

Actions  DO Loops   Line
DIR                  1  #  1 “foo2.f”
                     2     subroutine foo2(a,b,index,n)
                     3     real a(n), b(n)
                     4     integer index(n)
DIR                  6 c*$*assert permutation (index)
C        +------     7     do i= 1, n
         *           8        a(index(i)) =  a(index(i)) + b(i)
         *______     9     enddo
                     10    end

Abbreviations Used

DIR   directive
C     concurrentized

Loop Summary

        From   To     Loop    Loop
Loop#   line   line   label   index   Status
1       7      9      Do      I       concurrentized

Note: As with all assertions, PFA does not verify the truth of this assertion. When you make an assertion, be certain that the assertion is always true for all possible input data.

Function Call

This example shows what happens when a loop contains a call to an external routine. The Fortran 77 code

      subroutine foo3 (a,b,c,n)
      real a(n), b(n), c(n)
      external force
      do i = 1, n
         a(i) = force (b(i), c(i))
      enddo
      end

generates the listing

Actions DO Loops     Line
DIR                  1 #  1 “foo3.f”
                     2    subroutine foo3(a,b,c,n)
                     3    real a(n), b(n), c(n)
                     4    external force
NCS     +------      6    do i = 1, n
NO NCS  !            7       a(i) = force(b(i), c(i))
        !______      8    enddo
                     9    end
Abbreviations Used
NO   not optimized
DIR   directive
NCS   non-concurrent-stmt
Footnote List
1: not optimized       No optimizable statements found.
2: not optimized       Unoptimizable call to “FORCE” found.
Loop Summary
        From   To     Loop    Loop
Loop#   line   line   label   index   Status
1       6      8      Do      I       unoptimizable
                                      call (FORCE)

Calling the function force prevents PFA from automatically running the loop in parallel. PFA identifies the function call as a non-concurrent-stmt. By its nature, a nonconcurrent statement prevents PFA from assuming the loop is safe to run in parallel because PFA cannot see into the routine to look for data dependencies.

If you know that force generates no data dependencies, then explicitly state this fact for the nonconcurrent statement

      subroutine foo3(a,b,c,n)
      real a(n), b(n), c(n)
      external force
c*$*assert concurrent call
      do i = 1, n
         a(i) = force(b(i), c(i))
      enddo
      end

Now that PFA knows that the nonconcurrent statement involves no data dependency, PFA will find the loop safe to run in parallel.

There is one subtlety in using the concurrent call assertion. When you use this assertion, PFA makes no attempt to examine the called routine; it simply assumes that it is safe. However, PFA is still left with the problem of correctly declaring the variables in the loop to be either SHARE or LOCAL. (PFA does the best it can, but it can sometimes be fooled.) For example,

      subroutine tricky (a,b,c,n,m)
      real a(*), b(*)
      external my_function
c*$*assert concurrent call
      do i = 1, n
         a(i) = my_function (b(i), m)
         b(i) = a(i) + m
      enddo
      end

The question is whether the variable m should be SHARE or LOCAL. If the routine my_function only reads the old value of m, then it should be SHARE. If my_function writes a new value of m, then it should be LOCAL. In the absence of any more clues, PFA must go by what it can see; and what it can see is that within the loop, there are no visible assignments to m, and so PFA will declare it to be SHARE. If in fact my_function is writing the value of m, then this is incorrect. In this case, to give PFA the hint it needs, add a visible assignment to m at the top of the loop.

For example, consider the following code:

      do i = 1, n
         m = 0
         a(i) = my_function(b(i), m)
         b(i) = a(i) + m
      enddo

Here, PFA can see an assignment to m and so will declare it to be LOCAL. Note that if my_function is both reading the old value and writing a new value of m, then it was not legal to parallelize the loop.
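The following Python model shows why the distinction matters. Running a loop's iterations in a different order is a quick way to see whether they depend on each other: if the result changes with the order, the loop is not safe to run in parallel. (The converse does not hold. An unchanged result in this serial model does not by itself rule out races on a shared variable.) my_function is replaced by two hypothetical versions, and m is a one-element list so that the function can change it, much as a Fortran routine can modify an argument passed by reference.

```python
def reads_and_writes(x, m):
    """Hypothetical my_function that reads the old m and then overwrites it."""
    old = m[0]
    m[0] = old + 1
    return x + old


def writes_only(x, m):
    """Hypothetical my_function that overwrites m without reading it first."""
    m[0] = 2 * x
    return x + 1


def run_loop(b, order, my_function, assign_m_at_top):
    """Serial model of the loop from the text, run in the given iteration order.

    m is a one-element list so that my_function can change it, much as a
    Fortran routine can modify an argument passed by reference.
    """
    a = [0] * len(b)
    b = list(b)                    # work on a copy; the loop updates b(i)
    m = [5]                        # arbitrary value of m on entry to the loop
    for i in order:
        if assign_m_at_top:
            m[0] = 0               # the visible "m = 0" at the top of the loop
        a[i] = my_function(b[i], m)
        b[i] = a[i] + m[0]
    return a, b


forward, backward = range(3), range(2, -1, -1)
# Shared m that the routine both reads and writes: the order changes the answer.
print(run_loop([1, 2, 3], forward, reads_and_writes, False))   # ([6, 8, 10], [12, 15, 18])
print(run_loop([1, 2, 3], backward, reads_and_writes, False))  # ([8, 8, 8], [16, 15, 14])
# m assigned at the top, routine only writes it: the order does not matter.
same = (run_loop([1, 2, 3], forward, writes_only, True)
        == run_loop([1, 2, 3], backward, writes_only, True))
print(same)                                                    # True
```

The first pair of results differs because each iteration picks up the m left behind by the previous one, the read-and-write case in which parallelizing the loop is not legal.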


Reductions

This example shows how PFA handles a loop that produces a single value from a set of values. Because the entire set of values is reduced to a single value, these operations are called reductions.

Consider the Fortran 77 code

      subroutine foo4(a,b,n,sum)
      real a(n), b(n), sum
      sum = 0.0
      do i = 1, n
         sum = sum + a(i)*b(i)
      enddo
      end

Using the previous code as input, PFA produces the listing file

DIR             1 # 1 “foo4.f”
                2      subroutine foo4(a,b,n,sum)
                3      real a (n), b(n), sum
                5      sum = 0.0
      +-----    6      do i = 1, n
1 DD  !         7         sum = sum + a(i)*b(i)
      !_____    8      enddo
                9      end
Abbreviations Used
DD        data dependence
DIR       directive
Footnote List
1: data dependence       Data dependence involving this
                         line due to variable “SUM”.
Loop Summary
       From  To    Loop   Loop
Loop#  line  line  label  index   Status
1      6     8     Do     I       scalar mode preferable

Because different iterations of the loop read and write the same location (the variable sum), there is a dependence. However, this is a special case. Because sum just accumulates a total, you can accumulate subtotals in parallel and then combine the subtotals at the end.
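The subtotal idea can be sketched in Python for the foo4 loop (sum = sum + a(i)*b(i)). This is a model of the technique, not the code PFA generates. It assumes each worker takes one contiguous block of iterations, accumulates a private subtotal, and the subtotals are added at the end. An explicit loop is used for the additions so their order is plain to see.

```python
def serial_sum(values):
    """Running total in the order given, like the serial DO loop."""
    total = 0.0
    for v in values:
        total += v
    return total


def dot_by_subtotals(a, b, nprocs=4):
    """Model of a parallel sum reduction for sum = sum + a(i)*b(i).

    Each of nprocs workers accumulates a private subtotal over a contiguous
    block of iterations; the subtotals are then combined.
    """
    n = len(a)
    chunk = (n + nprocs - 1) // nprocs
    subtotals = [
        serial_sum(a[k] * b[k] for k in range(p * chunk, min(n, (p + 1) * chunk)))
        for p in range(nprocs)
    ]
    return serial_sum(subtotals)
```

With a = 1.0, 2.0, ..., 100.0 and every b(i) = 1.0, any number of workers gives 5050.0, because every partial result here is an exactly representable integer. With general floating-point data, the grouping can change the last bits of the answer, as the next paragraph explains.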

Because the parallel version of the code adds the elements together in a different order than the single-process version, the round-off errors accumulate differently for the two versions of the code. Thus, the answer can differ slightly as you vary the number of processes used to run the code. In fact, if you use the dynamic scheduling option for the code, the answer might vary slightly from one run of the program to the next, even if you use the same number of processes on the same machine.

Most applications can safely ignore this variation in round-off error. If yours can, you can tell PFA to use parallel subtotals with either the C*$*ROUNDOFF=2 directive or the f77/pfa command line option -WK,-roundoff=2.

The resulting listing file is

DIR              1 # 1 “foo4.f”
                 2      subroutine foo4(a,b,n,sum)
                 3      real a(n), b(n), sum
                 5      sum = 0.0
C      +------   6      do i = 1, n
       *         7         sum = sum + a(i)*b(i)
       *______   8      enddo
                 9      end
Abbreviations Used
DIR       directive
C         concurrentized
Loop Summary
       From  To    Loop   Loop
Loop#  line  line  label  index  Status
1      6     8     Do     I      concurrentized

Be aware that the round-off error produced by the parallel reduction operation is not necessarily any worse than the round-off error already present in the original serial version; it is simply different. If your application did not worry about the round-off error in the original, there is no reason to suppose that it should worry about it in the parallel version. If, on the other hand, your application takes special steps to reduce round-off error (for example, adding the numbers together in order from smallest absolute value to largest), then you should not use parallel reductions.
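The Python sketch below illustrates both points with four values contrived to make the effect large. It sums them three ways: in loop order, as two separately summed halves (the way a two-process reduction might), and smallest magnitude first. The three orders give three different answers. The exact sum is 2.0, and only the smallest-first order reaches it with this data. The function names are illustrative.

```python
import math


def left_to_right(values):
    """Plain running total in loop order, like the serial DO loop."""
    total = 0.0
    for v in values:
        total += v
    return total


def two_subtotals(values):
    """Two workers each sum half of the data; their subtotals are then added."""
    half = len(values) // 2
    return left_to_right(values[:half]) + left_to_right(values[half:])


def smallest_first(values):
    """Sum in order of increasing absolute value (sorted() is stable)."""
    return left_to_right(sorted(values, key=abs))


values = [1.0, 1e16, -1e16, 1.0]   # exact sum is 2.0
print(left_to_right(values))       # 1.0
print(two_subtotals(values))       # 0.0
print(smallest_first(values))      # 2.0
print(math.fsum(values))           # 2.0 (correctly rounded reference)
```

The smallest-first order also shows why a hand-ordered sum should not become a parallel reduction: splitting the sorted list among processes and combining their subtotals would undo the ordering the code depends on.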

The previous example is called a sum reduction because the reduction operator is +. Table 3-5 shows the types of reductions PFA supports.

Table 3-5. Reduction Types

Reduction   Operator   Form
Sum         +          sum = sum + expression
Product     *          p = p * expression
Minimum     min( )     a = min(a, expression)
Maximum     max( )     x = max(x, expression)

All these reductions are under the control of the -ROUNDOFF command line option, even though technically the min and max reductions do not involve round-off problems.