This appendix lists and describes the three types of PFA directives:
Standard
Cray
VAST
Chapter 1, “Overview of PFA,” describes the purpose of directives. For details about how to use directives, refer to Chapter 5, “Fine-Tuning PFA.”
This section lists and describes the following standard PFA directives alphabetically:
C*$*ARCLIMIT
C*$*CONCURRENTIZE
C*$*INLINE
C*$*IPA
C*$*LIMIT
C*$*MINCONCURRENT
C*$*NOCONCURRENTIZE
C*$*NOINLINE
C*$*NOIPA
C*$*OPTIMIZE
C*$*ROUNDOFF
C*$*SCALAR OPTIMIZE
C*$*UNROLL
C$ DOACROSS
C$&
The C*$*ARCLIMIT(n) directive controls the size of the internal table used to store data dependence information (arcs). n is an integer. This directive, when specified globally, has the same effect as the -ARCLIMIT command line option.
The C*$*CONCURRENTIZE directive converts eligible loops to run in parallel. This directive, when specified globally, has the same effect as the -CONCURRENTIZE command line option. See also C*$*NOCONCURRENTIZE.
The C*$*INLINE directive behaves much like the -INLINE command line option but specifies which occurrences of a routine are actually inlined. The format for this directive is
C*$* INLINE [(name [,name ... ])] {HERE | ROUTINE | GLOBAL}
where
name | Specifies the routines to be inlined. If you do not specify a name, all routines are affected.
HERE | Inlines occurrences of the named routines only on the next line.
ROUTINE | Inlines the named routines everywhere they appear in the current routine.
GLOBAL | Inlines the named routines throughout the source file.
See also C*$*NOINLINE.
For details about inlining, refer to Chapter 4, “Customizing PFA Execution.” For details about using the C*$*INLINE directive, refer to Chapter 5, “Fine-Tuning PFA.”
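As a minimal sketch of the HERE form, the program below asks PFA to inline a routine only at the call on the line that follows the directive. The function name TWICE and the program itself are hypothetical and exist only for illustration. Because the directive begins with C in column 1, a compiler that does not recognize it treats it as an ordinary comment.

```fortran
      PROGRAM INLDEM
      REAL A(100), TWICE
      INTEGER I
      DO 10 I = 1, 100
C     Ask PFA to inline TWICE only at the call on the next line.
C*$* INLINE (TWICE) HERE
         A(I) = TWICE(REAL(I))
   10 CONTINUE
      PRINT *, A(100)
      END

C     Hypothetical routine used only for illustration.
      REAL FUNCTION TWICE(X)
      REAL X
      TWICE = 2.0 * X
      END
```

Replacing HERE with ROUTINE or GLOBAL would widen the request to the whole routine or the whole source file, as described in the table above.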
The C*$*IPA directive behaves much like the -IPA command line option but specifies on which occurrences of a routine to use interprocedural analysis (IPA). The format for this directive is
C*$* IPA [(name [,name ... ])] {HERE | ROUTINE | GLOBAL}
where
name | Specifies the routines to be analyzed with IPA. If you do not specify a name, all routines are affected.
HERE | Uses IPA only on occurrences of the named routines that appear on the next line.
ROUTINE | Uses IPA on the named routines everywhere they appear in the current routine.
GLOBAL | Uses IPA on the named routines throughout the source file.
See also C*$*NOIPA.
For details about interprocedural analysis, refer to Chapter 4, “Customizing PFA Execution.” For details about using the C*$*IPA directive, refer to Chapter 5, “Fine-Tuning PFA.”
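The ROUTINE form might be used as in the following sketch, where a loop calls a small subroutine and the directive requests IPA for every call to that subroutine within the enclosing routine. The names UPDATE and BUMP are hypothetical, and placing the directive just before the loop is an assumption; Chapter 5 covers the actual placement rules.

```fortran
      PROGRAM IPADEM
      REAL A(50)
      INTEGER I
      DO 5 I = 1, 50
         A(I) = REAL(I)
    5 CONTINUE
      CALL UPDATE(A, 50)
      PRINT *, A(50)
      END

      SUBROUTINE UPDATE(A, N)
      INTEGER N, I
      REAL A(N)
C     Request IPA for all calls to BUMP inside UPDATE.
C*$* IPA (BUMP) ROUTINE
      DO 10 I = 1, N
         CALL BUMP(A(I))
   10 CONTINUE
      END

C     Hypothetical callee: modifies only its own argument.
      SUBROUTINE BUMP(X)
      REAL X
      X = X + 1.0
      END
```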
The C*$*LIMIT(n) directive reduces PFA processing time by limiting the amount of time PFA can spend on trying to determine whether a loop is safe to run in parallel. PFA estimates how much time is required to analyze each loop nest construct. If an outer loop looks like it would take too much time to analyze, PFA ignores the outer loop and recursively visits the inner loops.
Larger limits often allow PFA to generate parallel code for deeply nested loop structures that it might not otherwise be able to run safely in parallel. However, with larger limits PFA can also take more time to analyze a program. (The limit does not correspond to the DO loop nest level. It is an estimate of the number of loop orderings that PFA can generate from a loop nest.)
This directive, when specified globally, has the same effect as the -LIMIT command line option.
The C*$*MINCONCURRENT(n) directive establishes the minimum amount of work a loop must contain for parallel execution to be profitable. n is a count of the operations (for example, add, multiply, load, store) in the loop, multiplied by the number of times the loop executes. If the loop does not contain at least this much work, it is not run in parallel. If the loop bounds are not constants, PFA automatically adds an IF clause to the C$ DOACROSS directive it generates, to test at run time whether sufficient work exists.
The C*$*NOCONCURRENTIZE directive prevents PFA from converting loops to run in parallel. See also C*$*CONCURRENTIZE.
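The work estimate can be illustrated with a simple vector addition. The sketch below assumes PFA counts four operations per iteration (two loads, one add, one store); the exact counting rules are not given in this appendix, so the arithmetic in the comments is an approximation. The threshold of 1000 and the routine name VADD are arbitrary, hypothetical choices.

```fortran
      PROGRAM MINDEM
      INTEGER N, I
      PARAMETER (N = 300)
      REAL A(N), B(N), C(N)
      DO 5 I = 1, N
         B(I) = 1.0
         C(I) = 2.0
    5 CONTINUE
      CALL VADD(A, B, C, N)
      PRINT *, A(N)
      END

      SUBROUTINE VADD(A, B, C, N)
      INTEGER N, I
      REAL A(N), B(N), C(N)
C     Assumed count: 2 loads + 1 add + 1 store = 4 operations per
C     iteration, so the loop holds about 4*N operations in total.
C     With a threshold of 1000, N must be at least 250 for the
C     loop to qualify. N is not a constant here, so the check
C     would have to be made at run time.
C*$* MINCONCURRENT(1000)
      DO 10 I = 1, N
         A(I) = B(I) + C(I)
   10 CONTINUE
      END
```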
The C*$*NOINLINE directive behaves much like the -NOINLINE command line option, but with the directive you can specify which occurrences of a routine are not inlined. The format for this directive is
C*$* NOINLINE [(name [,name ... ])] {HERE | ROUTINE | GLOBAL}
where
name | Specifies the routines that are not to be inlined. If you do not specify a name, all routines are affected.
HERE | Disables inlining of occurrences of the named routines only on the next line.
ROUTINE | Disables inlining of the named routines everywhere they appear in the current routine.
GLOBAL | Disables inlining of the named routines throughout the source file.
C*$*NOINLINE overrides the -INLINE command line option and so allows you to disable inlining of the named routines at specific points.
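The override can be sketched as follows, assuming the file is processed with the -INLINE command line option so that calls are inlined by default. The directive then excludes a single call. The function name TWICE is hypothetical.

```fortran
      PROGRAM NOIDEM
      REAL X, Y, TWICE
C     With -INLINE in effect, this call is a candidate for
C     inlining.
      X = TWICE(1.0)
C     Keep the call on the next line as an ordinary call.
C*$* NOINLINE (TWICE) HERE
      Y = TWICE(2.0)
      PRINT *, X, Y
      END

C     Hypothetical routine used only for illustration.
      REAL FUNCTION TWICE(V)
      REAL V
      TWICE = 2.0 * V
      END
```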
The C*$*NOIPA directive behaves much like the -NOIPA command line option, but with the directive you can specify the occurrences of a routine on which IPA is not used. The format for this directive is
C*$* NOIPA [(name [,name ... ])] {HERE | ROUTINE | GLOBAL}
where
name | Specifies the routines for which IPA is disabled. If you do not specify a name, all routines are affected.
HERE | Disables IPA of occurrences of the named routines only on the next line.
ROUTINE | Disables IPA of the named routines everywhere they appear in the current routine.
GLOBAL | Disables IPA of the named routines throughout the source file.
C*$*NOIPA overrides the -IPA command line option and so allows you to disable IPA of the named routines at specific points.
The C*$*OPTIMIZE(n) directive sets the optimization level. The higher the optimization level, the more code is optimized and the longer PFA runs. Valid values for n are the following integers:
0 | Avoids converting loops to run in parallel.
1 | Converts loops to run in parallel without using advanced data dependence tests. Enables loop interchanging.
2 | Determines when scalars need last-value assignment using lifetime analysis. Also uses more powerful data dependence tests to find loops that can run safely in parallel. This level allows reductions in loops that execute concurrently, but only if the round-off setting is at least 2.
3 | Breaks data dependence cycles using special techniques and additional loop interchanging methods, such as interchanging triangular loops. This level also implements special-case data dependence tests.
4 | Generates two versions of a loop, if necessary, to break a data dependence arc. This level also implements more exact data dependence tests and allows special index sets (called wraparound variables) to convert more code to run in parallel.
5 | Fuses two adjacent loops if it is legal to do so (no data dependencies) and if the loops have the same control values. In certain limited cases, this level recognizes arrays as local variables. Level 5 also tells PFA to try harder to run the outermost loop possible (of a set of loops) in parallel.
Note: If you want to use unrolling, set the optimize level to at least 4 (the default optimization level is above this threshold).
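The loop fusion described for level 5 can be pictured with a hand-written equivalent. In the sketch below, the first two loops are adjacent, have the same control values (1 to N), and share no data, so they are candidates for fusion; the third loop shows what a fused version would compute. The program is illustrative only, the placement of the directive before the loops is an assumption, and the fused loop is written by hand rather than taken from PFA output.

```fortran
      PROGRAM FUSDEM
      INTEGER N, I
      PARAMETER (N = 100)
      REAL A(N), B(N), C(N), D(N)
C     Two adjacent loops with identical control values.
C*$* OPTIMIZE(5)
      DO 10 I = 1, N
         A(I) = REAL(I)
   10 CONTINUE
      DO 20 I = 1, N
         B(I) = 2.0 * REAL(I)
   20 CONTINUE
C     Hand-fused equivalent: one pass over the index range does
C     the work of both loops above.
      DO 30 I = 1, N
         C(I) = REAL(I)
         D(I) = 2.0 * REAL(I)
   30 CONTINUE
      PRINT *, A(N), B(N), C(N), D(N)
      END
```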
The C*$*ROUNDOFF(n) directive controls whether PFA runs a reduction operation in parallel. Valid values for n are
0–1 | Suppresses any transformations that change round-off.
2 | Allows reductions to be performed in parallel. The valid reduction operators are addition, multiplication, min, and max. -ROUNDOFF=2 is one of the most common user options.
3 | Recognizes REAL induction variables. Permits the memory management transformations.
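A sum is the typical reduction that level 2 permits. In the sketch below, every iteration updates the same scalar, so running the loop in parallel means adding the elements in a different order, and floating-point results can then differ slightly from the serial sum. The program and the placement of the directive immediately before the loop are illustrative assumptions.

```fortran
      PROGRAM SUMDEM
      INTEGER N, I
      PARAMETER (N = 1000)
      REAL A(N), TOTAL
      DO 10 I = 1, N
         A(I) = 1.0 / REAL(I)
   10 CONTINUE
      TOTAL = 0.0
C     Permit the sum reduction below to be run in parallel,
C     accepting a possible change in round-off.
C*$* ROUNDOFF(2)
      DO 20 I = 1, N
         TOTAL = TOTAL + A(I)
   20 CONTINUE
      PRINT *, TOTAL
      END
```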
The C*$*SCALAR OPTIMIZE(n) directive controls the amount of standard scalar optimization attempted by PFA. Valid values for n are
0 | Performs no scalar transformations.
1 | Enables dead code elimination, pulling loop invariants, forward substitution, and conversion of IF-GOTO into IF-THEN-ELSE.
2 | Enables induction variable recognition, loop unrolling, loop fusion, array expansion, scalar promotion, and floating invariant IF tests. (Loop fusion also requires -OPTIMIZE=5.)
3 | Enables the memory management transformations. (Memory management also requires -ROUNDOFF=3.)
The C*$*UNROLL(n) directive unrolls scalar inner loops when PFA cannot run the loops in parallel. When PFA unrolls a loop, it replicates the body of the loop a certain number of times, making the loop run faster. In this form, n has the same meaning as in the -UNROLL=n command line option.
The C*$*UNROLL(n, m) form of the directive allows you to adjust the number of operations used when unrolling. In this form, n is as above and m is as in the -UNROLL2=m command line option.
This form of unrolling applies only to the innermost loops in a nest of loops. You can unroll loops whether they execute serially or concurrently.
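To show what replicating the loop body means, the sketch below pairs a loop carrying the directive with a hand-unrolled equivalent. It assumes that n acts as the unrolling factor (4 here); this appendix does not define n beyond its link to -UNROLL=n, so treat that reading as an assumption. The cleanup loop handles the iterations left over when the trip count is not a multiple of 4.

```fortran
      PROGRAM UNRDEM
      INTEGER N, I, M
      PARAMETER (N = 10)
      REAL A(N), B(N)
C     Original loop, with the directive as PFA would see it.
C*$* UNROLL(4)
      DO 10 I = 1, N
         A(I) = 2.0 * REAL(I)
   10 CONTINUE
C     Hand-written equivalent: body replicated four times.
      M = N - MOD(N, 4)
      DO 20 I = 1, M, 4
         B(I)     = 2.0 * REAL(I)
         B(I + 1) = 2.0 * REAL(I + 1)
         B(I + 2) = 2.0 * REAL(I + 2)
         B(I + 3) = 2.0 * REAL(I + 3)
   20 CONTINUE
C     Cleanup loop for the remaining iterations.
      DO 30 I = M + 1, N
         B(I) = 2.0 * REAL(I)
   30 CONTINUE
      PRINT *, A(N), B(N)
      END
```

The unrolled version executes fewer loop-control tests and branches per element, which is where the speedup comes from.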
The C$ DOACROSS directive tells the Fortran 77 compiler to generate parallel code for the loop that immediately follows the directive. Putting this directive in the original source marks the loop to run in parallel and signals PFA not to modify the loop.
Note: PFA generates the C$ DOACROSS directive and inserts it into the code as the result of PFA's parallelism analysis.
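A hand-placed directive might look like the following sketch. The directive is spelled as it appears in this appendix. The LOCAL and SHARE clauses and the use of C$& as a continuation line are assumptions based on the Fortran 77 multiprocessing directive syntax, which this appendix does not describe; check the Fortran 77 compiler documentation before relying on them.

```fortran
      PROGRAM DOADEM
      INTEGER N, I
      PARAMETER (N = 1000)
      REAL A(N), B(N)
      DO 10 I = 1, N
         B(I) = REAL(I)
   10 CONTINUE
C     Mark the next loop to run in parallel. Each iteration
C     writes a different element of A, so iterations are
C     independent. Clause names below are assumed.
C$ DOACROSS LOCAL(I),
C$&         SHARE(A, B)
      DO 20 I = 1, N
         A(I) = 2.0 * B(I)
   20 CONTINUE
      PRINT *, A(N)
      END
```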
PFA supports the following Cray directives:
CDIR$ IVDEP
CDIR$ NEXT SCALAR
PFA interprets the CDIR$ IVDEP directive as if it were a C*$* ASSERT DO (CONCURRENT) assertion. (Refer to Appendix C, “PFA Assertions,” for details.)
CDIR$ NEXT SCALAR is a Cray directive that generates scalar code for the next DO loop. PFA interprets this directive as if it were a C*$* ASSERT DO(SERIAL) assertion. (Refer to Appendix C, “PFA Assertions,” for details.)
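The two Cray directives could appear as in this sketch. In the first marked loop, the array is written through an index array, so the analyzer cannot prove on its own that iterations are independent; the programmer knows IDX is a permutation and says so with CDIR$ IVDEP. The second marked loop is a recurrence that is kept serial with CDIR$ NEXT SCALAR. The program and its names are hypothetical.

```fortran
      PROGRAM CRYDEM
      INTEGER N, I
      PARAMETER (N = 8)
      INTEGER IDX(N)
      REAL A(N), B(N)
      DO 10 I = 1, N
         IDX(I) = N + 1 - I
         A(I) = 0.0
         B(I) = REAL(I)
   10 CONTINUE
C     IDX is a permutation, so no two iterations write the same
C     element of A.
CDIR$ IVDEP
      DO 20 I = 1, N
         A(IDX(I)) = A(IDX(I)) + B(I)
   20 CONTINUE
C     Each iteration depends on the previous one; keep it serial.
CDIR$ NEXT SCALAR
      DO 30 I = 2, N
         A(I) = A(I) + A(I - 1)
   30 CONTINUE
      PRINT *, A(N)
      END
```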
PFA supports the CVD$CONCUR VAST directive. The CVD$CONCUR directive runs a loop in parallel to optimize performance. PFA interprets this directive as if it were the C*$*CONCURRENTIZE directive (described in “Standard Directives”).