This chapter contains the following sections:
“Overview” describes how to prepare for using PFA.
“Compiling Programs With PFA” explains how to run PFA as part of a Fortran compile.
“Using PFA Directly” explains how to run PFA independent of the Fortran driver.
Simply running a program through PFA might buy you some improved performance, but you can get far more if you understand the PFA listing. From the listing, you can often identify small problems that prevent a loop from running safely in parallel. With a relatively small amount of work, you can remove these data dependencies and dramatically improve the program's performance.
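As an informal illustration of what a data dependence is, the following Python sketch runs the iterations of two loop bodies in forward and in reverse order. A loop whose result changes with iteration order carries a dependence between iterations and cannot safely run in parallel as written. This is not how PFA analyzes Fortran loops; the function names and the sample data are hypothetical and chosen only for the example.

```python
def run(body, n, order):
    """Execute body(a, b, i) for each i in the given order; return a."""
    a = [1.0] * n                      # illustrative data only
    b = [float(i) for i in range(n)]
    for i in order:
        body(a, b, i)
    return a

def independent(a, b, i):
    # Each iteration touches only element i: no dependence between iterations.
    a[i] = 2.0 * b[i]

def carried(a, b, i):
    # Iteration i reads the value written by iteration i-1:
    # a loop-carried data dependence.
    if i > 0:
        a[i] = a[i - 1] + b[i]

def order_sensitive(body, n=6):
    """True if running the iterations in reverse order changes the result."""
    forward = run(body, n, range(n))
    backward = run(body, n, reversed(range(n)))
    return forward != backward

print("independent loop is order sensitive:", order_sensitive(independent))
print("carried loop is order sensitive:", order_sensitive(carried))
```

Order insensitivity on one data set does not prove that a loop is safe; the sketch only shows why a loop such as `carried` is rejected for parallel execution.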
When trying to find loops to run in parallel, focus your efforts on the areas of the code that use the bulk of the run time. Parallelizing a routine that uses only 1 percent of the program's run time cannot significantly improve overall performance.
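The limit on what a single routine can contribute is commonly estimated with Amdahl's law. The following Python sketch computes that upper bound, assuming the rest of the program stays serial and ignoring all parallel overhead. The fractions and the processor count are hypothetical values chosen for illustration, not measurements.

```python
def max_speedup(parallel_fraction, processors):
    """Upper bound on whole-program speedup (Amdahl's law).

    parallel_fraction: share of the run time spent in the code that is
    run in parallel (0.0 to 1.0). Overhead is ignored, so real results
    are lower.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)

# Hypothetical profile shares, for illustration only.
for fraction in (0.01, 0.50, 0.90):
    print(fraction, round(max_speedup(fraction, 8), 3))
```

Under these assumptions, a routine that accounts for 1 percent of the run time yields a speedup below 1.01 no matter how many processors are used, which is why the profile should drive the choice of routines.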
To determine where your code spends its time, take an execution profile of the program. Use either pc-sample profiling (through the -p option to f77(1)) or basic block profiling (through pixie(1)). Refer to Chapter 2, “Improving Program Performance,” of the IRIS-4D Compiler Guide for details about profiling.
There are two schools of thought about profiling: conservative and optimistic. The conservative approach takes a profile of the original (nonparallel) job. You then run in parallel only the loops that account for most of the run time. The more optimistic approach runs the entire program through PFA and then profiles the resulting multiprocessed job. The conservative approach reduces the chances that something might go wrong because it makes fewer changes to the code. It also focuses on the smallest number of lines of code that have the greatest effect.
Use the optimistic approach when you think that PFA will do a good job with the existing program. You will save time by letting PFA do what it can. You can then focus on those routines where PFA had a problem. One situation in which PFA frequently does a good job is when you convert programs that already run well on traditional vector architectures. Many such programs run in parallel without additional effort.
Whichever approach you choose, use the profile to focus your efforts on the most time-consuming routines. Once you find a time-consuming routine, submit that routine alone to PFA. If the routine is in the middle of a large file, consider using fsplit(1) to isolate the individual routine. Compile the routine with the -pfa keep option, and examine the listing file. The PFA listing identifies the loops that PFA can and cannot run in parallel. For loops that cannot run in parallel, the listing also explains why PFA could not convert the loop for parallel execution.
The following is the command line syntax for compiling a Fortran 77 program with PFA and PFA command line options. The f77 command invokes the various processing phases that compile, optimize, assemble, and link edit the program. To pass options to PFA, add the -WK option to the f77 command line. For more information about the -WK option, see the f77(1) manual page.
f77 -pfa [{list|keep}] [-WK,-option[=value][,-option[=value]]...] [-pfaprepass,-option[=value][,-option[=value]]...] filename.f
where
-pfa | Invokes the POWER Fortran Accelerator, pfa. Enables any multiprocessing directives.
list | Runs pfa and generates an annotated listing of the parts of the program that can (and cannot) run in parallel on multiple processors. The listing file has the suffix .l.
keep | Runs pfa, generates the listing file (.l), and saves the intermediate transformed Fortran 77 program. The intermediate file has the suffix .m.
-WK | Passes the specified command line options to PFA. Do not enter spaces between -WK and any of the hyphens, options, equal signs, and values that follow it.
-option | Specifies a PFA command line option listed in Table 2-1, for example, -IGNOREOPTIONS.
value | Specifies a value for a command line option, for example, 10.
-pfaprepass | Passes the code through PFA an extra time. The first time through (the prepass), PFA uses the options specified in the -pfaprepass option but does not insert C$ DOACROSS directives. The output of this operation is then passed back through PFA, using the options specified in the -WK option. Only rarely should you need to use this option, and there is good reason to avoid it. Normally, PFA does all it can in a single run-through. In rare circumstances an extra pass can be beneficial. However, the PFA algorithms do not necessarily converge, and multiple passes over the code can change it for the worse. The syntax of this option is the same as the -WK option.
filename.f | Specifies the Fortran 77 source program. The filename must always use the .f suffix.
Table 2-1 lists the PFA command line options. Although the table lists the options in uppercase, you can specify them in lowercase as well.
Note: You can replace many of the PFA command line options listed in Table 2-1 with in-code directives. For information on these directives, see Chapter 5, “Fine-Tuning PFA,” and Appendix B, “PFA Directives.”
Table 2-1. PFA Command Line Options
Reference | Long Name | Short Name | Default Value |
---|---|---|---|
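Parallelization | [NO]CONCURRENTIZE | [N]CONC | CONCURRENTIZE |
Parallelization | MINCONCURRENT=n | MC=n | MINCONCURRENT=500 |
Optimization | ARCLIMIT=n | ARCLM=n | ARCLIMIT=5000 |
Optimization | LIMIT=n | LM=n | LIMIT=20000 |
Optimization | OPTIMIZE=n | O=n | OPTIMIZE=5 |
Optimization | ROUNDOFF=n | R=n | ROUNDOFF=0 |
Optimization | SCALAROPT=n | SO=n | SCALAROPT=3 |
Optimization | UNROLL=n | UR=n | UNROLL=4 |
Optimization | UNROLL2=n | UR2=n | UNROLL2=100 |
Fortran 77 Language Control | ASSUME=list | AS=list | ASSUME=EL |
Fortran 77 Language Control | [NO]DLINES | [N]DL | NODLINES |
Fortran 77 Language Control | [NO]ONETRIP | [N]1 | NOONETRIP |
Fortran 77 Language Control | SAVE=c | SV=c | SAVE=A |
Fortran 77 Language Control | SCAN=n | SCAN=n | SCAN=72 |
Fortran 77 Language Control | SYNTAX=c | SY=c | (option off) |
Inlining and Interprocedural Analysis | INLINE[=list] | IN | (option off) |
Inlining and Interprocedural Analysis | IPA[=names] | IPA | (option off) |
Inlining and Interprocedural Analysis | INLINE_CREATE=name | INCR=name | (option off) |
Inlining and Interprocedural Analysis | IPA_CREATE=name | IPACR=name | (option off) |
Inlining and Interprocedural Analysis | INLINE_FROM_FILES=list | INFF=list | (option off) |
Inlining and Interprocedural Analysis | IPA_FROM_FILES=list | IPAFF=list | (option off) |
Inlining and Interprocedural Analysis | INLINE_FROM_LIBRARIES=list | INFL=list | (option off) |
Inlining and Interprocedural Analysis | IPA_FROM_LIBRARIES=list | IPAFL=list | (option off) |
Inlining and Interprocedural Analysis | INLINE_LOOP_LEVEL=n | INLL=n | INLL=10 |
Inlining and Interprocedural Analysis | IPA_LOOP_LEVEL=n | IPALL=n | IPALL=10 |
Inlining and Interprocedural Analysis | INLINE_MAN | INM | (option off) |
Inlining and Interprocedural Analysis | IPA_MAN | IPAM | (option off) |
Inlining and Interprocedural Analysis | INLINE_DEPTH=n | IND=n | IND=10 |
Directives | [NO]DIRECTIVES=list | [N]DR=list | DIRECTIVES=AKSV |
I/O | INPUT=file.f | file.f | file.f |
I/O | [NO]FORTRAN=file | [N]F=file | F=file.m |
I/O | [NO]LIST=file | [N]L=file | L=file.l |
Listing | LINES=n | LN=n | LINES=55 |
Listing | LISTOPTIONS=list | LO=list | LISTOPTIONS=OL |
Listing | SUPPRESS=list | SU=list | (option off) |
Obsolete | CREATE | CR | (option off) |
Obsolete | LIBRARY=file | LIB=file | (option off) |
Obsolete | [NO]EXPAND=list | EX=list | (option off) |
Obsolete | LIMIT2=n | LM2=n | LM2=5000 |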
To compile the Fortran 77 program prog.f with PFA and the -UNROLL=8 option, enter
% f77 -pfa -WK,-UNROLL=8 prog.f
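When a build script assembles this command line, the main risk is a stray space inside the -WK argument. The following Python sketch shows one way to build the argument list so that the options are joined with commas only. The helper names `wk_option` and `f77_pfa_command` are hypothetical; only the f77, -pfa, and -WK spellings and the option names come from this chapter.

```python
def wk_option(options):
    """Build one -WK argument from (name, value) pairs.

    A value of None means the option takes no value. The parts are
    joined with commas and no spaces, as the -WK syntax requires.
    """
    parts = ["-WK"]
    for name, value in options:
        if value is None:
            parts.append("-" + name)
        else:
            parts.append("-{}={}".format(name, value))
    return ",".join(parts)

def f77_pfa_command(source, options):
    """Return an argument list for an f77 compile that runs PFA."""
    if not source.endswith(".f"):
        raise ValueError("source file must use the .f suffix")
    command = ["f77", "-pfa"]
    if options:
        command.append(wk_option(options))
    command.append(source)
    return command

# Reproduces the example above; nothing is executed here.
print(" ".join(f77_pfa_command("prog.f", [("UNROLL", 8)])))
```

Returning a list rather than a single string keeps each argument intact if the list is later handed to a process launcher.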
Figure 2-1 shows what happens when you compile a Fortran 77 program with PFA. The first pass invokes the macro preprocessor cpp to handle cpp directives. (For more information, see the cpp(1) manual page.) PFA then takes the cpp output and inserts code that runs data-independent loops in parallel. PFA can also generate a listing file (with the .l suffix) and an intermediate file (with the .m suffix). For details, refer to Chapter 3, “Utilizing PFA Output.”
Finally, the Fortran 77 compiler, f77, compiles the transformed PFA-generated file to produce an object file.
Although you normally run PFA as part of an f77 compile, run PFA directly in the following two cases:
When creating an inlining or IPA library (refer to Chapter 4, “Customizing PFA Execution”)
When you want to capture the output of PFA and review it to determine further optimizations
Running the pfa(1) command directly, using the following syntax, produces both the .m and the .l files.
/usr/lib/pfa [-option [-option]...] filename.f
where
-option | Specifies a PFA command line option listed in Table 2-1, for example, -INLINE.
filename.f | Specifies the Fortran 77 source program. The filename must have the .f suffix.