Chapter 5. Fine-Tuning PFA

Overview

After you run a Fortran source program through PFA once, you can use directives and assertions to fine-tune program execution. The listing file shows where and why PFA did not parallelize the code.

Directives and assertions let you control how PFA treats specific portions of code. Directives given on the command line apply to the program as a whole.

If you want finer control for parallelizing a critical loop or inlining a particular occurrence of a routine, specify directives and assertions directly in the code. You can also use directives and assertions to keep PFA from converting code to run in parallel. In other cases you might want to explicitly force PFA to run segments of code in parallel even though it normally would not.

Fine-Tuning Inlining and IPA

Chapter 4, “Customizing PFA Execution,” explains how to use inlining and IPA on an entire program (refer to “Performing Inlining and Interprocedural Analysis”). You can fine-tune inlining and IPA using the C*$*[NO]INLINE and C*$*[NO]IPA directives.

The C*$*[NO]INLINE directive behaves much the same as the -INLINE command line option, but with the directive you can specify which occurrences of a routine are actually inlined. The format for this directive is

C*$*[NO]INLINE [(name[,name ... ])] {HERE|ROUTINE|GLOBAL}

where

name 

Specifies the routines to be inlined. If you do not specify a name, this directive affects all routines in the program.

HERE 

Applies the INLINE directive only to the next line; occurrences of the named routines on that next line are inlined.

ROUTINE 

Inlines the named routines everywhere they appear in the current routine.

GLOBAL 

Inlines the named routines throughout the source file.

The C*$*NOINLINE form overrides the -INLINE command line option and so allows you to selectively disable inlining of the named routines at specific points.

Example

In the following code fragment, the C*$*INLINE directive inlines the first call to beta but not the second.

       do i =1,n
C*$*INLINE (beta) HERE
          call beta (i,1)
       enddo
       call beta (n, 2)

Using the specifier ROUTINE rather than HERE inlines both calls. This routine must be compiled with the -inline_man command line option for the C*$*INLINE directive to be recognized.

The C*$*[NO]IPA directive is the analogous directive for interprocedural analysis. The format for this directive is

C*$*[NO]IPA [(name[,name ... ])] {HERE|ROUTINE|GLOBAL}
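
As a minimal sketch of how the IPA form might be used, assuming it is placed in the same way as the C*$*INLINE directive in the earlier example: the directive below requests interprocedural analysis of SCALE for the call on the next line only. The routine names DRIVER and SCALE are hypothetical, and any command line option needed for the directive to be recognized is not shown.

```fortran
      SUBROUTINE DRIVER(A, N)
      INTEGER N, I
      REAL A(N)
      DO 10 I = 1, N
C     Assumed placement: same as C*$*INLINE ... HERE.
C*$*IPA (SCALE) HERE
         CALL SCALE(A(I), 2.0)
   10 CONTINUE
      END

      SUBROUTINE SCALE(X, F)
      REAL X, F
C     Hypothetical callee: modifies only its first argument.
      X = X*F
      END
```

Replacing HERE with ROUTINE would, by analogy with the inlining directive, extend the request to every call to SCALE in DRIVER.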

Circumventing PFA

Sometimes you might need to hand-tune a DO loop so that it will run in parallel. Use the directives in this section to prevent PFA from analyzing your modified code.

C$ DOACROSS

The C$ DOACROSS directive tells the Fortran 77 compiler to generate parallel code for the following loop. When PFA encounters this directive on input, it does not modify the accompanying loop and therefore does not interfere with any hand-tuning.

C$ DOACROSS is the standard method for parallelism in Fortran. This directive is the same directive that PFA generates as a result of its analysis. Refer to the Fortran 77 Programmer's Guide for more information about the C$ DOACROSS directive and its optional clauses.

PFA runs the following code as it appears:

C$ DOACROSS
      DO 10 I=1, 100
         A(I) = B(I)
10    CONTINUE

C$&

The C$& directive continues the C$ DOACROSS directive onto multiple lines, for example,

C$DOACROSS SHARE(ALPHA, BETA, GAMMA, DELTA,
C$&   EPSILON, OMEGA), LASTLOCAL (I, J, K, L, M, N),
C$&   LOCAL(XXX1, XXX2, XXX3, XXX4, XXX5, XXX6, XXX7,
C$&   XXX8, XXX9)

Running Code Serially

Use the following assertions and directives to keep PFA from running specific code in parallel.

C*$* ASSERT DO (SERIAL)

The C*$* ASSERT DO (SERIAL) assertion tells PFA to run the specified loop serially. PFA does not try to convert the specified loop to run in parallel. It also does not try to run any enclosing loop in parallel. However, PFA can still convert any loops nested inside the serial loop to run in parallel.
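
The effect on a loop nest can be sketched as follows. This is an illustrative routine, not PFA output, and the names SWEEP and A are hypothetical. The assertion marks the J loop, which carries a real dependence through A(I,J-1,K).

```fortran
      SUBROUTINE SWEEP(A, L, M, N)
      INTEGER L, M, N, I, J, K
      REAL A(L,M,N)
C     K loop: encloses the asserted loop, so PFA leaves it serial.
C     J loop: the asserted loop, run serially.
C     I loop: nested inside the asserted loop, still a candidate
C             for parallel execution.
      DO 30 K = 1, N
C*$* ASSERT DO (SERIAL)
         DO 20 J = 2, M
            DO 10 I = 1, L
               A(I,J,K) = A(I,J-1,K) + 1.0
   10       CONTINUE
   20    CONTINUE
   30 CONTINUE
      END
```

If the K loop should remain a candidate for parallel execution, C*$* ASSERT DO PREFER (SERIAL), described later in this section, may be the better choice, since the text attributes the restriction on enclosing loops only to C*$* ASSERT DO (SERIAL).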

CDIR$ NEXT SCALAR

Silicon Graphics PFA supports the corresponding Cray directive, CDIR$ NEXT SCALAR. PFA interprets this directive as if it were a C*$* ASSERT DO (SERIAL) assertion and generates scalar code for the next DO loop.

C*$* ASSERT DO PREFER (SERIAL)

The C*$* ASSERT DO PREFER (SERIAL) assertion indicates that you want to execute a DO loop in serial mode. This assertion directs PFA to leave the DO loop alone, regardless of the setting of the optimization level. You can use this assertion to control which loop (in a nest of loops) PFA chooses to run in parallel. The following example program segment shows how to use the assertion:

          DO 100  I = 1,  N 
C*$*ASSERT DO PREFER (SERIAL)
          DO 100  J = 1,  M 
            A(I,J) = B(I,J)
100       CONTINUE

In the DO loop above, the assertion requests that the J loop be serial. In this construction, PFA tries to run the I loop in parallel but not the J loop. This capability is useful when you know the value of M to be very small or less than N. This assertion applies only to the DO loop that appears directly after the assertion.

Running Code in Parallel

This section explains the directives and assertions that allow PFA to determine that specific areas of code are safe to run in parallel.

C*$*[NO]CONCURRENTIZE

The C*$*CONCURRENTIZE directive directs PFA to convert eligible loops to run in parallel. The NO version prevents PFA from converting loops to run in parallel. These directives, when specified globally, have the same effect as the -CONCURRENTIZE and -NOCONCURRENTIZE options (see Chapter 2, “How to Use PFA”).

CVD$ CONCUR

PFA supports the VAST directive CVD$ CONCUR. This directive runs a loop in parallel to optimize performance. PFA interprets this directive as if it were the C*$*CONCURRENTIZE directive.

C*$* ASSERT DO PREFER (CONCURRENT)

The C*$* ASSERT DO PREFER (CONCURRENT) assertion directs PFA to run a particular nested loop in parallel if possible. PFA runs another of the nested loops in parallel only if a condition prevents running the selected loop in parallel.

Consider the following code:

C*$* ASSERT DO PREFER (CONCURRENT)
          DO 100 I = 1, N
          DO 100 J = 1, M
             A (I, J) = B (I, J)
100       CONTINUE

This code directs PFA to prefer to run the I loop in parallel. However, if a data dependence conflict prevents running the I loop in parallel, PFA might run the J loop in parallel. The C*$* ASSERT DO PREFER (CONCURRENT) assertion applies only to the DO loop immediately after it.
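
To steer PFA toward the inner loop instead, the assertion can be moved so that it directly precedes the J loop. A minimal sketch follows; the routine name COPY2 is hypothetical, and the declarations are added only to make the fragment self-contained.

```fortran
      SUBROUTINE COPY2(A, B, N, M)
      INTEGER N, M, I, J
      REAL A(N,M), B(N,M)
      DO 100 I = 1, N
C     The assertion now marks the J loop as the preferred
C     candidate for parallel execution.
C*$* ASSERT DO PREFER (CONCURRENT)
      DO 100 J = 1, M
         A(I,J) = B(I,J)
  100 CONTINUE
      END
```

This placement might make sense when M is much larger than N, so that the J loop offers more iterations to divide among processors.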

Ignoring Data Dependencies

PFA avoids running code in parallel that it believes to be data-dependent. Use the assertions described in the following sections to override this behavior.

C*$* ASSERT DO (CONCURRENT)

The C*$* ASSERT DO (CONCURRENT) assertion tells PFA to ignore assumed data dependencies. Normally, PFA is conservative about converting loops to run in parallel.

When PFA analyzes a loop to see if it is safe to run in parallel, it categorizes the loop into one of three groups:

  • yes (loop is safe to run in parallel)

  • no

  • not sure

Normally, PFA does not run “not sure” loops in parallel. It assumes there are data dependencies. C*$* ASSERT DO (CONCURRENT) tells PFA to go ahead and run “not sure” loops in parallel.


Note: If PFA identifies a loop as containing definite (as opposed to assumed) data dependencies, it does not run the loop in parallel even if you specify a C*$* ASSERT DO (CONCURRENT) assertion.
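
The following sketch shows the kind of loop this assertion is meant for. It assumes that PFA would classify the loop as “not sure” because the offset K is unknown at compile time; that classification is an assumption, not documented behavior. The routine name SHIFT is hypothetical.

```fortran
      SUBROUTINE SHIFT(A, N, K)
      INTEGER N, K, I
      REAL A(*)
C     K is unknown at compile time, so it is unclear whether
C     A(I+K) overlaps elements written by other iterations.
C     The assertion is the programmer's claim that it does not
C     (for example, because every caller passes K .GE. N).
C*$* ASSERT DO (CONCURRENT)
      DO 10 I = 1, N
         A(I) = A(I+K) + 1.0
   10 CONTINUE
      END
```

If the claim is false for some call, the parallel loop can read elements that another iteration has already overwritten, so the assertion should rest on something the program actually guarantees.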


CDIR$ IVDEP

PFA interprets the Cray directive CDIR$ IVDEP as if it were a C*$* ASSERT DO (CONCURRENT) assertion. Some dependencies that are safe to ignore on Cray hardware are not safe to ignore on SGI hardware. Therefore, recognition of this directive is turned off by default.

C*$* ASSERT CONCURRENT CALL

The C*$* ASSERT CONCURRENT CALL assertion tells PFA to ignore assumed dependencies that are due to a subroutine call or a function reference. However, you must ensure that the subroutines and referenced functions are safe for parallel execution. This assertion applies to all subroutine and function references in the accompanying loop, which must appear on the next line.
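
A minimal sketch of the intended usage follows. The routines UPDATE and BUMP are hypothetical; BUMP is written so that each call touches only the array element passed to it, which is what makes the assertion a reasonable claim here.

```fortran
      SUBROUTINE UPDATE(A, N)
      INTEGER N, I
      REAL A(N)
C     The loop must start on the line after the assertion.
C*$* ASSERT CONCURRENT CALL
      DO 10 I = 1, N
         CALL BUMP(A(I))
   10 CONTINUE
      END

      SUBROUTINE BUMP(X)
      REAL X
C     No COMMON blocks, SAVEd variables, or I/O: each call is
C     independent of every other call.
      X = X + 1.0
      END
```

A callee that updates a COMMON variable or a SAVEd local would not be safe to call from concurrent iterations, and the assertion should not be used for it.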

C*$* ASSERT NO RECURRENCE

The C*$* ASSERT NO RECURRENCE(variable) assertion tells PFA to ignore all data dependencies associated with variable. PFA ignores not just assumed dependencies (as with the C*$* ASSERT DO (CONCURRENT) assertion) but also real dependencies. Use this assertion to force PFA to parallelize a loop when other, gentler means have failed. Use this assertion with caution, as indiscriminate use can result in illegal parallel code.
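
The following sketch illustrates the risk. It assumes the assertion is placed directly before the loop it applies to, which the text above does not state explicitly. The routine name ADDPIV is hypothetical.

```fortran
      SUBROUTINE ADDPIV(X, N, K)
      INTEGER N, K, I
      REAL X(*)
C     X(K) is read while X(I) is written. The assertion is the
C     programmer's claim that K lies outside 1..N, so no iteration
C     overwrites X(K). If K can fall inside 1..N, the assertion
C     is wrong and the parallel results may be incorrect.
C*$* ASSERT NO RECURRENCE (X)
      DO 10 I = 1, N
         X(I) = X(I) + X(K)
   10 CONTINUE
      END
```

Because PFA stops checking X altogether, nothing protects the loop if K later takes a value between 1 and N.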

C*$* ASSERT PERMUTATION

The C*$* ASSERT PERMUTATION(array) assertion tells PFA that array contains no repeated values. This assertion permits PFA to run in parallel certain kinds of loops that use indirect addressing, for example,

          DO I = 1, N
             A(INDEX(I)) = A(INDEX(I)) + B(I)
          ENDDO

You can run this loop in parallel only if the array INDEX has no repeated values (so that each INDEX (I) is unique). PFA cannot determine this, so it does not run such a loop in parallel. However, if you know that every element of INDEX() is unique, you can insert the following line before the loop to permit PFA to run the loop in parallel:

C*$* ASSERT PERMUTATION (INDEX)

Using Equivalenced Variables

The C*$* ASSERT NO EQUIVALENCE HAZARD assertion tells PFA that your code does not use equivalenced variables to refer to the same memory location inside one loop nest. Normally, EQUIVALENCE statements allow your code to use different variable names to refer to the same storage location, so PFA must assume that such overlap can occur. The -ASSUME=E command line option acts like a global C*$* ASSERT EQUIVALENCE HAZARD assertion (see “Global Assumptions” in Chapter 4). A C*$* ASSERT [NO] EQUIVALENCE HAZARD assertion is active until you reset it or until the end of the program unit.
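
As an illustration, the sketch below equivalences the second half of E with F. The routine name EQDEMO and the array sizes are hypothetical. The assertion is valid only under the stated assumption that N does not exceed 100, so that the elements written through E never coincide with the elements read through F.

```fortran
      SUBROUTINE EQDEMO(N)
      INTEGER N, I
      REAL E(200), F(100)
C     F(1..100) shares storage with E(101..200).
      EQUIVALENCE (E(101), F(1))
C     Assumption: N .LE. 100, so the loop writes E(1..N) and
C     reads F(1..N), which is E(101..100+N). No overlap.
C*$* ASSERT NO EQUIVALENCE HAZARD
      DO 10 I = 1, N
         E(I) = F(I) + 1.0
   10 CONTINUE
      END
```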

Using Aliasing

PFA has several assertions for use with aliasing.

C*$* ASSERT [NO] ARGUMENT ALIASING

The C*$* ASSERT [NO] ARGUMENT ALIASING assertion allows PFA to make assumptions about subprogram arguments in a program. According to the Fortran 77 standard, you can alias a variable only if you do not modify (that is, write to) the aliased variable.

The following subroutine violates the standard, because variable A is aliased in the subroutine (through C and D) and variable X is aliased (through X and E):

          COMMON X,Y
          REAL A,B
          CALL SUB (A, A, X)
          ...
          SUBROUTINE SUB(C,D,E)
          COMMON X,Y
          X =  ...
          C =  ...
          ...

The command line option -ASSUME=P acts like a global C*$* ASSERT ARGUMENT ALIASING assertion (see Chapter 4, “Customizing PFA Execution”). A C*$* ASSERT [NO] ARGUMENT ALIASING assertion is active until it is reset or until the next routine begins.
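
The NO form is useful when a routine follows the standard and you want PFA to rely on that. In the sketch below, the loop is independent only if C and D are distinct arrays; if a caller passed the same array for both, C(I) would depend on the value written in the previous iteration. The routine name ADDV is hypothetical, and placing the assertion inside the routine, ahead of the loop, is an assumption.

```fortran
      SUBROUTINE ADDV(C, D, N)
      INTEGER N, I
      REAL C(N), D(N)
C     Programmer's claim: callers never pass overlapping actual
C     arguments for C and D.
C*$* ASSERT NO ARGUMENT ALIASING
      DO 10 I = 2, N
         C(I) = D(I-1) + 1.0
   10 CONTINUE
      END
```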

C*$* ASSERT RELATION

The C*$* ASSERT RELATION(name.xx.name) assertion indicates the relationship between two variables or between a variable and a constant. Each name is a variable or constant, and xx is one of the following relational operators: GT, GE, EQ, NE, LT, or LE. This assertion applies only to the next DO statement.

Consider the following code:

          DO 100 I = 1, N
             A (I) = A (I+M) + B (I)
100       CONTINUE

If you know that M is greater than N, use the following assertion to give this information to PFA:

C*$* ASSERT RELATION (M .GT. N)
          DO 100 I = 1, N
             A (I) = A (I+M) + B (I)
100       CONTINUE

Knowing that M is greater than N, PFA can generate parallel code for this loop. If, at run time, M is less than N, the answers produced by the code run in parallel could differ significantly from the answers produced by the original code run serially.


Note: Many relationships of this type can be cheaply tested for at run time. PFA will attempt to answer questions of this sort by generating an IF statement that explicitly tests the relationship at run time. Occasionally, PFA may need assistance, or you may want to squeeze that last ounce of performance out of some critical loop by asserting some relationship rather than repeatedly checking it at run time.
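
The run-time test mentioned in the note can be pictured as a two-version loop. The following is a hand-written sketch of the idea, not actual PFA output; the routine name TWOVER, the exact test, and the choice of C$DOACROSS clauses are illustrative assumptions.

```fortran
      SUBROUTINE TWOVER(A, B, N, M)
      INTEGER N, M, I
      REAL A(*), B(N)
C     When M .GE. N, the elements read, A(1+M..N+M), do not
C     overlap the elements written, A(1..N), so the iterations
C     are independent.
      IF (M .GE. N) THEN
C$DOACROSS SHARE(A, B, M, N), LOCAL(I)
         DO 10 I = 1, N
            A(I) = A(I+M) + B(I)
   10    CONTINUE
      ELSE
C     Otherwise fall back to the original serial loop.
         DO 20 I = 1, N
            A(I) = A(I+M) + B(I)
   20    CONTINUE
      ENDIF
      END
```

Asserting the relation removes both the test and the serial copy of the loop, which is where the performance gain described in the note comes from.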