Chapter 2. How to Use PFA

This chapter contains the following sections: Overview, Compiling Programs With PFA, and Using PFA Directly.

Overview

Simply running a program through PFA might buy you some improved performance, but you can get far more if you understand the PFA listing. From the listing, you can often identify small problems that prevent a loop from running safely in parallel. With a relatively small amount of work, you can remove these data dependencies and dramatically improve the program's performance.

When trying to find loops to run in parallel, focus your efforts on the areas of the code that use the bulk of the run time. Parallelizing a routine that uses only 1 percent of the program's run time cannot significantly improve the program's overall performance.
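The 1 percent figure can be made concrete with Amdahl's law. This worked example is not part of the original text and is only a rough sketch: it assumes p is the fraction of run time spent in the code being run in parallel and n is the number of processors (both symbols are introduced here for illustration), and it ignores multiprocessing overhead.

```latex
S(n) = \frac{1}{(1 - p) + \dfrac{p}{n}}
\qquad\Longrightarrow\qquad
\lim_{n \to \infty} S(n) = \frac{1}{1 - p}

% A routine that uses 1 percent of the run time (p = 0.01),
% even with unlimited processors:
\lim_{n \to \infty} S(n) = \frac{1}{0.99} \approx 1.01

% Loops that use 90 percent of the run time (p = 0.9) on n = 4 processors:
S(4) = \frac{1}{0.1 + 0.9/4} = \frac{1}{0.325} \approx 3.08
```

Under these assumptions, the 1 percent routine can never yield more than about a 1 percent gain, while the loops that dominate the profile can roughly triple the speed on four processors.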

To determine where your code spends its time, take an execution profile of the program. Use either pc-sample profiling (through the -p option to f77(1)) or basic block profiling (through pixie(1)). Refer to Chapter 2, “Improving Program Performance,” of the IRIS-4D Compiler Guide for details about profiling.

There are two schools of thought about profiling: conservative and optimistic. The conservative approach takes a profile of the original (nonparallel) job. You then run in parallel only the loops that account for most of the run time. The more optimistic approach runs the entire program through PFA and then profiles the resulting multiprocessed job. The conservative approach reduces the chances that something might go wrong because it makes fewer changes to the code. It also focuses on the smallest number of lines of code that have the greatest effect.

Use the optimistic approach when you think that PFA will do a good job with the existing program. You will save time by letting PFA do what it can. You can then focus on those routines where PFA had a problem. One situation in which PFA frequently does a good job is when you convert programs that already run well on traditional vector architectures. Many such programs run in parallel without additional effort.

Whichever approach you choose, use the profile to focus your efforts on the most time-consuming routines. Once you find a time-consuming routine, submit that routine alone to PFA. If the routine is in the middle of a large file, consider using fsplit(1) to isolate the individual routine. Compile the routine with the –pfa keep option, and examine the listing file. The PFA listing identifies the loops that PFA can and cannot run in parallel. For loops that cannot run in parallel, the listing also tells you why PFA could not convert them for parallel execution.
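The steps above can be sketched as a short dry run. The sketch only prints the commands instead of executing them, so it runs in any POSIX shell. The names big.f and solve are hypothetical, the -c (compile only) flag is an assumption about how you would build a single routine, and the assumption that fsplit(1) writes each routine to a file named after it should be checked against the manual page.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the commands for isolating one
# routine from a large file and compiling it alone with PFA.
pfa_isolate_cmds() {
    src=$1       # large source file (hypothetical name)
    routine=$2   # time-consuming routine found by profiling (hypothetical name)
    echo "fsplit $src"                  # assumed to write one file per routine
    echo "f77 -pfa keep -c $routine.f"  # keep produces the .l listing and the .m file
}

pfa_isolate_cmds big.f solve
```

On a system with PFA installed, you would run the two printed commands yourself and then read the resulting listing file.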

Compiling Programs With PFA

The following is the command line syntax for compiling a Fortran 77 program with PFA. You can pass command line options to PFA by adding the –WK option to the f77 command line. The f77 command invokes the various processing phases that compile, optimize, assemble, and link edit the program. For more information about the –WK option, see the f77(1) manual page.

Syntax

f77 -pfa [{list|keep}] [-WK,-option[=value][,-option[=value]]...] [-pfaprepass,-option[=value][,-option[=value]]...] filename.f

where

pfa 

Invokes the POWER Fortran Accelerator, pfa. Enables any multiprocessing directives.

list 

Runs pfa and generates an annotated listing of the parts of the program that can (and cannot) run in parallel on multiple processors. The listing file has the suffix .l.

keep 

Runs pfa, generates the listing file (.l), and saves the intermediate transformed Fortran 77 program. The intermediate file has the suffix .m.

–WK 

Passes the specified command line options to PFA. Do not enter spaces between -WK and any of the hyphens, options, equal signs, and values that follow it.

option 

Specifies a PFA command line option listed in Table 2-1, for example, -IGNOREOPTIONS.

value 

Specifies a value for a command line option, for example, 10.

–pfaprepass 

Passes the code through PFA an extra time. The first time through (the prepass), PFA uses the options specified in the –pfaprepass option but does not insert C$ DOACROSS directives. The output of this operation is then passed back through PFA, using the options specified in the –WK option. Only rarely should you need to use this option, and there is good reason to avoid it. Normally, PFA does all it can in a single run-through. In rare circumstances an extra pass can be beneficial. However, the PFA algorithms do not necessarily converge, and multiple passes over the code can change it for the worse. The syntax of this option is the same as that of the -WK option.

filename.f 

Specifies the Fortran 77 source program. The filename must always use the .f suffix.
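The no-spaces rule for –WK is easy to get wrong when several options are combined. The following sketch builds the argument mechanically. The helper wk_arg is hypothetical (it is not part of PFA or f77), and the script only prints the resulting command line instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: join PFA options into a single -WK argument,
# with commas and hyphens but no spaces, as the syntax requires.
wk_arg() {
    arg="-WK"
    for opt in "$@"; do
        arg="$arg,-$opt"   # each option gets its own leading hyphen
    done
    echo "$arg"
}

# Print (do not execute) a compile command that uses two options from Table 2-1.
echo "f77 -pfa keep $(wk_arg UNROLL=8 ROUNDOFF=2) prog.f"
# prints: f77 -pfa keep -WK,-UNROLL=8,-ROUNDOFF=2 prog.f
```

Because the shell splits arguments on spaces, a stray space inside the list would cause f77 to treat the rest as a separate argument, which is presumably why the manual forbids it.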

Table 2-1 lists the PFA command line options. Although the table lists the options in uppercase, you can specify them in lowercase as well.


Note: You can replace many of the PFA command line options listed in Table 2-1 with in-code directives. For information on these directives, see Chapter 5, “Fine-Tuning PFA,” and Appendix B, “PFA Directives.”


Table 2-1. PFA Command Line Options

Reference                              Long Name                   Short Name  Default Value

Parallelization                        [NO]CONCURRENTIZE           [N]CONC     CONCURRENTIZE
                                       MINCONCURRENT=n             MC=n        MINCONCURRENT=500

Optimization                           ARCLIMIT=n                  ARCLM=n     ARCLIMIT=5000
                                       LIMIT=n                     LM=n        LIMIT=20000
                                       OPTIMIZE=n                  O=n         OPTIMIZE=5
                                       ROUNDOFF=n                  R=n         ROUNDOFF=0
                                       SCALAROPT=n                 SO=n        SCALAROPT=3
                                       UNROLL=n                    UR=n        UNROLL=4
                                       UNROLL2=n                   UR2=n       UNROLL2=100

Fortran 77 Language Control            ASSUME=list                 AS=list     ASSUME=EL
                                       [NO]DLINES                  [N]DL       NODLINES
                                       [NO]ONETRIP                 [N]1        NOONETRIP
                                       SAVE=c                      SV=c        SAVE=A
                                       SCAN=n                      SCAN=n      SCAN=72
                                       SYNTAX=c                    SY=c        (option off)

Inlining and Interprocedural Analysis  INLINE[=list]               IN          (option off)
                                       IPA[=names]                 IPA         (option off)
                                       INLINE_CREATE=name          INCR=name   (option off)
                                       IPA_CREATE=name             IPACR=name  (option off)
                                       INLINE_FROM_FILES=list      INFF=list   (option off)
                                       IPA_FROM_FILES=list         IPAFF=list  (option off)
                                       INLINE_FROM_LIBRARIES=list  INFL=list   (option off)
                                       IPA_FROM_LIBRARIES=list     IPAFL=list  (option off)
                                       INLINE_LOOP_LEVEL=n         INLL=n      INLL=10
                                       IPA_LOOP_LEVEL=n            IPALL=n     IPALL=10
                                       INLINE_MAN                  INM         (option off)
                                       IPA_MAN                     IPAM        (option off)
                                       INLINE_DEPTH                IND         IND=10

Directives                             [NO]DIRECTIVES=list         [N]DR=list  DIRECTIVES=AKSV

I/O                                    INPUT=file.f                file.f      file.f
                                       [NO]FORTRAN=file            [N]F=file   F=file.m
                                       [NO]LIST=file               [N]L=file   L=file.l

Listing                                LINES=n                     LN=n        LINES=55
                                       LISTOPTIONS=list            LO=list     LISTOPTIONS=OL
                                       SUPPRESS=list               SU=list     (option off)

Obsolete                               CREATE                      CR          (option off)
                                       LIBRARY=file                LIB=file    (option off)
                                       [NO]EXPAND=list             EX=list     (option off)
                                       LIMIT2=n                    LM2=n       LM2=5000


Example

To compile the Fortran 77 program prog.f with PFA and the -UNROLL=8 option, enter

% f77 -pfa -WK,-UNROLL=8 prog.f

Figure 2-1 shows what happens when you compile a Fortran 77 program with PFA. The first pass invokes the macro preprocessor cpp to handle cpp directives. (For more information, see the cpp(1) manual page.) PFA then takes the cpp output and inserts code that runs data-independent loops in parallel. PFA can also generate a listing file (with the .l suffix) and an intermediate file (with the .m suffix). For details, refer to Chapter 3, “Utilizing PFA Output.”

Finally, the Fortran 77 compiler, f77, compiles the transformed PFA-generated file to produce an object file.
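The files produced along the way can be summarized with a small sketch. The helper pfa_outputs is hypothetical, and naming the outputs after the base name of the source file is an assumption drawn from the F=file.m and L=file.l defaults in Table 2-1. The script prints file names only and does not invoke the compiler.

```shell
#!/bin/sh
# Hypothetical helper: list the PFA output files expected for a source file.
pfa_outputs() {
    case $1 in
        *.f) ;;   # PFA requires the .f suffix
        *)   echo "error: $1 does not have the .f suffix" >&2
             return 1 ;;
    esac
    base=${1%.f}     # strip the .f suffix
    echo "$base.l"   # annotated listing (-pfa list or -pfa keep)
    echo "$base.m"   # transformed Fortran 77 source (-pfa keep only)
}

pfa_outputs prog.f
```

For prog.f this prints prog.l and prog.m, the two files you would examine after compiling with –pfa keep.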

Figure 2-1. Compiling With PFA


Using PFA Directly

Although you normally run PFA as part of an f77 compile, the two instances when you should run PFA directly are

Running the pfa(1) command directly, using the following syntax, produces both the .m and the .l files.

Syntax

/usr/lib/pfa [-option [-option]...] filename.f

where

-option 

Specifies a PFA command line option listed in Table 2-1, for example, -INLINE.

filename.f 

Specifies the Fortran 77 source program. The filename must have the .f suffix.

Example

The following command runs PFA directly, using the short names of the -UNROLL and -ROUNDOFF options:

% /usr/lib/pfa -ur=4 -r=2 sample.f