This chapter contains the following sections:
“Overview of Customization” explains when to optimize Power Fortran execution.
“Controlling Code Execution” describes how to control whether Power Fortran runs eligible loops in parallel.
“Controlling Power Fortran Code Transformations” describes how to control the various transformations performed by Power Fortran.
“Performing Inlining and Interprocedural Analysis” describes inlining and interprocedural analysis and explains how and when to perform these procedures.
You can insert comment statements (directives) into a Power Fortran program to control whether it runs loops in parallel, how it limits complexity or round-off, and when it performs inlining or interprocedural analysis. These comment statements apply only to specific portions of the source code.
To customize how Power Fortran executes an entire program, you can specify various command-line options when you run Power Fortran as described in Chapter 2, “How to Use Power Fortran.” For a complete summary of the Power Fortran command-line options, refer to Appendix A, “Power Fortran Command-Line Options.”
This chapter describes options that are recognized only by Power Fortran. For details about options for controlling scalar optimizations in pfa, refer to Chapter 5, “Scalar Optimizations.”
When you modify a program so that its loops can run in parallel, prefer changing the code so that Power Fortran can run the loop in parallel automatically. Avoid forcing the loop to run in parallel by inserting a C$ DOACROSS directive directly. If you force code to run in parallel, you (and not Power Fortran) must verify that no later modification introduces data dependencies; forcing code that contains data dependencies to run in parallel can produce serious (and difficult-to-find) errors. Rewriting the loop so that Power Fortran recognizes it as safe to run in parallel lets Power Fortran check future modifications for potential data dependencies.
This section describes how to control whether eligible loops are run in parallel and how to specify a work threshold for loops.
The –concurrentize option (or –conc) converts eligible loops to run in parallel. This is the default value for this option. The –noconcurrentize option (or –nconc) prevents Power Fortran from converting loops to run in parallel.
Loops requiring the addition of synchronization might run slower than the scalar original when concurrentized. In this case, you can specify the –noconcurrentize command-line option or the C*$* NO CONCURRENTIZE directive for a particular loop.
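As a minimal sketch, the hypothetical routine PAIRS keeps one loop serial with C*$* NO CONCURRENTIZE. It assumes that the directive is written on the line immediately before the loop it should affect; Chapter 7 gives the exact placement and scope rules. Because the directive is a Fortran comment line, a compiler that does not recognize it ignores it, so the routine computes the same result either way.

```fortran
      SUBROUTINE PAIRS(N, A, B)
C     Hypothetical routine.  Suppose timing shows that this loop runs
C     slower when concurrentized, so it is kept serial.
      INTEGER N, I
      REAL A(N+1), B(N)
C*$* NO CONCURRENTIZE
      DO 10 I = 1, N
         B(I) = A(I) + A(I+1)
   10 CONTINUE
      END
```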
The –minconcurrent=n option (or –mc=n) specifies the minimum amount of work needed inside the loop to make executing the loop in parallel profitable. The nonnegative integer n is a count of the number of operations (for example, add, multiply, load, store) in the loop body, multiplied by the number of loop iterations. The higher the value of n, the larger (more iterations, more statements, or both) the loop must be to run in parallel.
If you do not specify this option, Power Fortran runs all loops containing 500 or more operations in parallel.
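The work estimate can be written out as simple arithmetic. In this hypothetical model, the function ENOUGH compares operations per iteration times iterations with the threshold n. The figure of five operations per iteration for X(I) = Y(I)*Z(I) is an assumption, chosen only because it is consistent with the IF (N .GT. 100) test in the example later in this section; Power Fortran computes its own internal estimate.

```fortran
      LOGICAL FUNCTION ENOUGH(NOPS, NITER, MINC)
C     Hypothetical model of the -minconcurrent=n test:
C       work = operations per iteration * number of iterations
C     NOPS is an assumed per-iteration count; Power Fortran makes
C     its own internal estimate.  MINC plays the role of n.
      INTEGER NOPS, NITER, MINC
      ENOUGH = NOPS*NITER .GE. MINC
      END
```

With the assumed five operations per iteration and the default n of 500, ENOUGH(5, 10, 500) is false (50 operations) and ENOUGH(5, 101, 500) is true (505 operations), matching the behavior described for constant bounds of 10 and 101. With MINC set to 0, every loop counts as having enough work.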
If the DO loop bounds are known at compile time (that is, if they are constants), Power Fortran can compute the exact iteration count and decide whether to run the loop in parallel. If the DO loop bounds are unknown at compile time, Power Fortran adds an IF clause to the C$ DOACROSS directive to test at run time whether sufficient work exists. The compiler interprets this clause as a request to generate two loops, one concurrentized and one left serial, plus an IF-THEN-ELSE that checks at run time which of the two to execute. This case is called a two-version loop.
To disable the generation of two-version loops throughout the program, specify –minconcurrent=0; or to disable this action only in a few DO loops, specify the C*$* MINCONCURRENT(0) directive.
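As a hedged illustration, the hypothetical routine VADD places C*$* MINCONCURRENT(0) immediately before one loop, assuming that this placement limits the directive to that loop (see Chapter 7 for the exact rules). With a threshold of 0, any amount of work counts as enough, so no run-time IF test and no serial copy should be generated for this loop; it runs in parallel even when N is small.

```fortran
      SUBROUTINE VADD(N, X, Y, Z)
C     Hypothetical routine.  MINCONCURRENT(0) sets the work threshold
C     to zero for this loop, so Power Fortran should not build a
C     two-version loop here.  Other compilers treat it as a comment.
      INTEGER N, I
      REAL X(N), Y(N), Z(N)
C*$* MINCONCURRENT(0)
      DO 10 I = 1, N
         X(I) = Y(I) + Z(I)
   10 CONTINUE
      END
```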
For example, given the original loop
      DO 2 I = 1, N
         X(I) = Y(I) * Z(I)
    2 CONTINUE
Power Fortran generates the following transformed loop:
C$DOACROSS IF (N .GT. 100), SHARE (N,X,Y,Z), LOCAL(I)
      DO 3 I = 1, N
         X(I) = Y(I)*Z(I)
    3 CONTINUE
The IF clause ensures that N is large enough to make running the loop in parallel profitable; otherwise, the loop runs serially. If the loop bound is a small constant (such as 10) instead of N, Power Fortran does not generate a DOACROSS directive for the loop, and the listing file states that the loop does not contain enough work. Conversely, if the bound is a large constant (such as 101), Power Fortran generates the DOACROSS directive without the IF clause.
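Conceptually, the IF clause turns the loop into a two-version loop. The hypothetical routine VMUL is a hand-written equivalent of that result, not actual compiler output: the compiler builds both copies internally. The directive on the first copy omits the IF clause because the surrounding IF-THEN-ELSE already performs the run-time test.

```fortran
      SUBROUTINE VMUL(N, X, Y, Z)
C     Hand-written, conceptual equivalent of the two-version loop that
C     the IF (N .GT. 100) clause requests.  The compiler builds both
C     copies itself; this source only shows the shape of the result.
      INTEGER N, I
      REAL X(N), Y(N), Z(N)
      IF (N .GT. 100) THEN
C        Enough work: this copy is the one run in parallel.
C$DOACROSS SHARE(N,X,Y,Z), LOCAL(I)
         DO 10 I = 1, N
            X(I) = Y(I)*Z(I)
   10    CONTINUE
      ELSE
C        Too little work: this copy stays serial.
         DO 20 I = 1, N
            X(I) = Y(I)*Z(I)
   20    CONTINUE
      END IF
      END
```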
The –parallelio option (or –pio) enables the parallelization of loops that contain I/O statements. The negated (no) form of the option, which is the default, disables this optimization. Use this option only on systems with parallel I/O capabilities, or when the I/O statements in those loops are not actually executed at run time.
This section discusses the various ways in which you can control the standard transformations that Power Fortran performs.
The –limit=n option (or –lm=n) controls the amount of time Power Fortran can spend trying to determine whether a loop is safe to run in parallel. Power Fortran estimates how much time is required to analyze each loop nest construct. If an outer loop looks like it would take too much time to analyze, Power Fortran ignores the outer loop and recursively visits the inner loops.
Larger limits often allow Power Fortran to generate parallel code for deeply nested loop structures that it might not otherwise be able to run safely in parallel. However, with larger limits Power Fortran can also take more time to analyze a program. (The limit does not correspond to the DO loop nest level. It is an estimate of the number of loop orderings that Power Fortran can generate from a loop nest.) This option has the same effect as the global C*$* LIMIT(n) directive.
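A hedged sketch of the global directive form: it assumes that C*$* LIMIT(n) is written at the top of the source file, ahead of the program units it governs (check Chapter 7 for the exact placement rules). The value 20000 is arbitrary and only illustrates raising the limit for a deeply nested loop; this section does not state the default. MMUL is a hypothetical routine.

```fortran
C*$* LIMIT(20000)
      SUBROUTINE MMUL(N, A, B, C)
C     Hypothetical triple loop nest.  The global LIMIT directive at the
C     top of the file raises the analysis limit; 20000 is arbitrary.
      INTEGER N, I, J, K
      REAL A(N,N), B(N,N), C(N,N)
      DO 30 J = 1, N
         DO 20 I = 1, N
            C(I,J) = 0.0
            DO 10 K = 1, N
               C(I,J) = C(I,J) + A(I,K)*B(K,J)
   10       CONTINUE
   20    CONTINUE
   30 CONTINUE
      END
```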
Note: You do not usually need to change these limits.
You can also change the thresholds for internal table size. Refer to the MIPSpro Fortran 77 Programmer's Guide for details.
The –optimize=n option (or –o=n) sets the optimization level. The higher the level, the more code Power Fortran optimizes and the longer it runs. Programs that were written to run in parallel often do not need advanced transformations; for these programs, a lower optimization level is enough. Valid values for n are as follows:
0: Avoids converting loops to run in parallel.
1: Converts loops to run in parallel without using advanced data dependence tests. Enables loop interchanging.
2: Determines when scalars need last-value assignment using lifetime analysis. Also uses more powerful data dependence tests to find loops that can run safely in parallel. This level allows reductions in loops that execute concurrently but only if the –roundoff option is set to 2. (Refer to the following section for details about the –roundoff option.)
3: Breaks data dependence cycles using special techniques and additional loop interchanging methods, such as interchanging triangular loops. This level also implements special-case data dependence tests.
4: Generates two versions of a loop, if necessary, to break a data-dependent arc. This level also implements more-exact data dependence tests and allows special index sets (called wraparound variables) to convert more code to run in parallel.
5: Fuses two adjacent loops if it is legal to do so (that is, there are no data dependencies) and if the loops have the same control values. In certain limited cases, this level recognizes arrays as local variables. This level is the default.
Refer to the MIPSpro Fortran 77 Programmer's Guide for examples.
This option has the same effect as the global C*$* OPTIMIZE(n) directive described in Chapter 7, “Fine-Tuning Power Fortran.”
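As a hedged sketch, a source file from a program already written to run in parallel might request a low optimization level with the global directive. This assumes the global directive is written at the top of the file (see Chapter 7 for the exact placement rules); AXPY1 and the choice of level 1 are hypothetical.

```fortran
C*$* OPTIMIZE(1)
      SUBROUTINE AXPY1(N, A, X, Y)
C     Hypothetical routine from a program already written to run in
C     parallel.  OPTIMIZE(1) asks Power Fortran for a low level of
C     analysis instead of the default level.
      INTEGER N, I
      REAL A, X(N), Y(N)
      DO 10 I = 1, N
         Y(I) = Y(I) + A*X(I)
   10 CONTINUE
      END
```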
The –roundoff=n option controls how much Power Fortran may change the round-off behavior of a program when it transforms code. Valid values for n include the following:
0: Suppresses any round-off transformations. This is the default.
2: Allows reductions to be performed in parallel. The valid reduction operators are +, *, min, and max. This is one of the most commonly specified settings.
3: Recognizes REAL induction variables. Permits memory management transformations (refer to the MIPSpro Fortran 77 Programmer's Guide for details).
Refer to the MIPSpro Fortran 77 Programmer's Guide for examples.
When executing reductions in parallel, Power Fortran processes values in a different order from the original serial code. Round-off errors therefore accumulate differently and can produce a slightly different answer. Some algorithms are sensitive to this variation, so by default Power Fortran does not run reductions in parallel. Usually these tiny variations are irrelevant, and you can allow Power Fortran to process reductions in parallel, which lets more loops run in parallel.
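The following sketch illustrates why evaluation order matters; it does not use Power Fortran at all. SSUM adds the elements in the original serial order, and BSUM (a hypothetical name) imitates, serially, the order of a parallel reduction by forming one partial sum per block of consecutive elements and then adding the partial sums. Real parallel reductions may split the work differently.

```fortran
      REAL FUNCTION SSUM(N, A)
C     Serial reduction: adds A(1), A(2), ..., A(N) in order.
      INTEGER N, I
      REAL A(N)
      SSUM = 0.0
      DO 10 I = 1, N
         SSUM = SSUM + A(I)
   10 CONTINUE
      END

      REAL FUNCTION BSUM(N, A, NB)
C     Imitates a parallel reduction's order (but runs serially): one
C     partial sum per block of consecutive elements, then the NB
C     partial sums are added together.
      INTEGER N, NB, I, IB, ILO, IHI, BLEN
      REAL A(N), PART
      BSUM = 0.0
      BLEN = (N + NB - 1) / NB
      DO 20 IB = 1, NB
         ILO = (IB - 1)*BLEN + 1
         IHI = MIN(IB*BLEN, N)
         PART = 0.0
         DO 10 I = ILO, IHI
            PART = PART + A(I)
   10    CONTINUE
         BSUM = BSUM + PART
   20 CONTINUE
      END
```

For A = (1.0E8, 1.0, -1.0E8, 1.0) with IEEE single-precision REAL arithmetic, the exact sum is 2.0, but SSUM returns 1.0 and BSUM(4, A, 2) returns 0.0. Each 1.0 added to a partial sum of magnitude 1.0E8 is rounded away, and the two orders do that a different number of times.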
Function and subroutine calls create an obstacle to parallelization. Power Fortran provides three ways of dealing with this obstacle:
Assert that the external routine is safe for concurrent execution (see “CVD$ CNCALL” in Chapter 7).
Inline the routine by replacing the call to the external routine with the actual code.
Perform interprocedural analysis (IPA) by analyzing the external routine ahead of time and using the results of that analysis when a reference to the routine is encountered.
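A minimal sketch of the first approach, assuming that CVD$ CNCALL is written immediately before the DO loop that contains the call (see Chapter 7 for the exact rules). APPLY and SQR are hypothetical routines. SQR touches only its own arguments, which is the kind of property the assertion depends on; because the directive is an assertion, making sure it is true is up to you.

```fortran
      SUBROUTINE APPLY(N, X, Y)
C     The CNCALL directive asserts that the CALL in the following loop
C     is safe to execute concurrently, so the call does not block
C     parallelization.  Its correctness is the programmer's job.
      INTEGER N, I
      REAL X(N), Y(N)
CVD$ CNCALL
      DO 10 I = 1, N
         CALL SQR(X(I), Y(I))
   10 CONTINUE
      END

      SUBROUTINE SQR(A, B)
C     Reads A and writes B only: no COMMON, SAVE, or I/O, so separate
C     calls do not interfere with one another.
      REAL A, B
      B = A*A
      END
```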
Inlining and IPA tend to be slow, memory-intensive operations. Attempting to inline all routines everywhere they occur can take a lot of time and use a lot of system resources. Inlining should usually be restricted to a few time-critical places. For details about inlining and IPA, and the related directives and command-line options, refer to Chapter 6, “Inlining and Interprocedural Analysis.”