This chapter contains the following sections:
“Overview of Inlining and IPA” defines inlining and interprocedural analysis (IPA) and summarizes the driver options that control these processes.
“Specifying Functions for Inlining or IPA” explains how to specify which function calls will be inlined or analyzed.
“Specifying Occurrences for Inlining and IPA” explains how to manage the depth and time cost of inlining and IPA.
“Conditions That Prevent Inlining and IPA” lists several conditions that prevent inlining and interprocedural analysis.
Inlining is the process of replacing a function reference with a copy of the code of the function. This eliminates the overhead of the function call and can assist other optimizations by making the relationships among the function arguments, the returned value, and the surrounding code more evident. However, it also increases the size of the generated object code.
Interprocedural analysis (IPA) is the process of inspecting called functions to get information on relationships between arguments, returned values, and global data. IPA can provide many of the benefits of inlining without replacing the function reference.
You can control inlining and IPA from the command line and also by using directives in your source code. The driver options for inlining and IPA are summarized in Table 5-1. As with all special optimizations, you specify them as sub-options of the –WK option.
Long Option Name
Short Option Name
To request inlining of all eligible function calls, specify -inline. To request analysis of all eligible function calls, specify -ipa. However, full inlining or full IPA can be time-consuming and does not always produce faster code.
Often there are specific functions that are particularly good candidates for inlining or IPA, either because of their contents or because of their frequency of use. You can name these functions in a list supplied to the -inline and -ipa options. When you do, calls to functions that are not in the list are neither inlined nor analyzed.
The –inline=list option (or –inl=list) specifies a list of functions that should be expanded inline. The –ipa=list option specifies a list of routines that should be analyzed. The names in list are separated by colons.
The following command performs inline expansion on the two routines saxpy and daxpy from the file foo.f:
f90 -WK,-inline=saxpy:daxpy foo.f
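As a hedged illustration, the routines below are a hypothetical, simplified stand-in for the kind of code foo.f might hold. They are not the BLAS SAXPY and DAXPY interfaces, which also take stride arguments. They are written in free format, so a file holding them would need a free-format extension such as .f90 rather than .f. With –inline=saxpy:daxpy, the two CALL statements in the hypothetical routine UPDATE are the candidates for expansion.

```fortran
! Hypothetical, simplified versions of the routines named in
! -WK,-inline=saxpy:daxpy (not the BLAS interfaces).
subroutine saxpy(n, a, x, y)
  implicit none
  integer, intent(in) :: n
  real, intent(in) :: a, x(n)
  real, intent(inout) :: y(n)
  integer :: i
  do i = 1, n
    y(i) = a * x(i) + y(i)
  end do
end subroutine saxpy

subroutine daxpy(n, a, x, y)
  implicit none
  integer, intent(in) :: n
  double precision, intent(in) :: a, x(n)
  double precision, intent(inout) :: y(n)
  integer :: i
  do i = 1, n
    y(i) = a * x(i) + y(i)
  end do
end subroutine daxpy

! Hypothetical caller in the same file: with the list above, these two
! CALLs are the call sites the compiler considers expanding inline.
subroutine update(n, x, y, dx, dy)
  implicit none
  integer, intent(in) :: n
  real, intent(in) :: x(n)
  real, intent(inout) :: y(n)
  double precision, intent(in) :: dx(n)
  double precision, intent(inout) :: dy(n)
  call saxpy(n, 2.0, x, y)
  call daxpy(n, 2.0d0, dx, dy)
end subroutine update
```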
The compiler looks for the routines in the current source file, unless you specify an –inline_from or –ipa_from option. Refer to “Specifying Where to Search for Routines” for details.
In order to copy or to analyze a function, the compiler must have access to the text of the function body. If you do not specify otherwise, the compiler searches for the function in the current source file.
The options listed in Table 5-2 tell the compiler where to search for the routines specified with the –inline or –ipa options. If you specify none of these options, the compiler searches the current source file by default.
Long Option Name | Short Option Name
–inline_from_files=list | –inff=list
–ipa_from_files=list | –ipaff=list
In each case, list consists of names of files or directories separated by colons. When you specify a directory, the compiler uses all appropriate files in that directory. For example:
f90 ... -WK,-inline_from_files=subs.f90:../common
|Note: These options by themselves do not initiate inlining or IPA. They only specify where to look for the routines. Use them in conjunction with the appropriate –inline or –ipa option.|
If you specify a nonexistent file or directory, the compiler issues an error. If you specify multiple –inline_from or –ipa_from options, the compiler concatenates their lists. All lists are searched in the order that they appear on the command line.
The compiler recognizes two special abbreviations when specified in list:
“-” means the current source file (as listed on the command line or specified in an –input=file command line option)
“.” means the current working directory
The following command requests inline expansion, searching first the current source file, calc.f90, and then subs.f90.
% f90 -WK,-inline,-inline_from_files=-:subs.f90 calc.f90
When executed, the compiler searches the current source file, calc.f90, and then subs.f90 for all eligible routines to expand. (It searches for all eligible routines because the –inline option was specified without a list.)
The compiler resolves routine name references by searching for them in the order that they appear in –inline_from/–ipa_from options on the command line. Libraries are searched in their original lexical order.
The compiler recognizes the type of file from its extension, or lack of one, as described in Table 5-3. (The creation and use of libraries is the subject of the next topic.)
Extension | Type of File
.f, .F, .for, .FOR | Fixed-format source (Fortran 77 or Fortran 90)
 | Fortran source run through cpp
.klib | Library created with –inline_create or –ipa_create option
Normally, inlining or IPA is done directly from a Fortran source file. However, when the same set of functions is called from many different programs, it is more efficient to create a pre-analyzed library of the routines.
Use the –inline_create=name option (or –incr=name) to create a library of prepared function texts for later use. The –ipa_create=name option (or –ipacr=name) is the analogous option for IPA. The library contains preprocessed information about the functions in a source file, which the compiler can read quickly when inlining or performing IPA.
|Note: Libraries created for inlining contain complete information and can be used for both inlining and IPA. Libraries created for IPA contain only summary information and can be used only for IPA.|
The compiler gives the library file it creates the name you specify. Since the compiler recognizes input libraries by their file suffix, you should use the suffix .klib, for example, prog.klib.
Creating a library is done separately from compiling. Create a library using the –inline_create option (or the –ipa_create option for IPA only). For example, the following command line creates a library called prog.klib based on the functions in source program prog.f90:
f90 ... -WK,-inline_create=prog.klib prog.f90
When you specify this option the compiler creates only the library; it does not compile the source program. The following command compiles samp.f90, taking information about all eligible functions from prog.klib.
f90 ... -WK,-inl,-inlf=prog.klib samp.f90
When creating a library, you can specify only one –inline_create (–ipa_create) option. Therefore, you can create only one library at a time. The compiler overwrites any existing file with the same name as the library.
If you do not specify the –inline (–ipa) option along with the –inline_create (–ipa_create) option, the compiler includes all routines from the inlining universe in the library, if possible. If you specify –inline=list or –ipa=list, the compiler includes only the named routines in the library.
|Tip: You do not have to generate your inlining or IPA library from the same source that will actually be linked into the running program. This capability can cause errors if misused, but it can also be useful. For example, you can write a library of hand-optimized assembly language routines, then construct an IPA library using Fortran routines that mimic the behavior of the assembly code. Thus, you can do parallelism analysis with IPA correctly, yet actually call the hand-optimized assembly routines.|
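A minimal sketch of the Tip, assuming a hypothetical hand-optimized assembly routine FASTCOPY(N, SRC, DST) that copies SRC into DST. The Fortran stand-in below reads and writes the same arguments as that assumed routine, so that IPA draws the same conclusions about which arguments are read and which are modified. It would be used only to build the IPA library (for example, f90 ... -WK,-ipa_create=mimic.klib mimic.f90, where both file names are hypothetical), while the program itself links the assembly version.

```fortran
! Hypothetical Fortran stand-in, compiled only to build an IPA library.
! It must use its arguments the same way the assembly routine does:
! SRC is only read, DST is only written, and no global data is touched.
subroutine fastcopy(n, src, dst)
  implicit none
  integer, intent(in) :: n
  real, intent(in) :: src(n)
  real, intent(out) :: dst(n)
  integer :: i
  do i = 1, n
    dst(i) = src(i)
  end do
end subroutine fastcopy
```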
The –inline_and_copy (or –inlc) option functions like the –inline option, except that the compiler copies the unoptimized text of a routine into the transformed code file each time the routine is called or referenced. Use this option when inlining routines that are called from the file in which they are located. This option has no special effect when the routines being inlined are being taken from a library or separate source file.
When a routine has been inlined everywhere it is used, leaving it unoptimized saves compilation time. When a program involves multiple source files, the unoptimized routine is still available in case another source file contains a reference to it.
|Note: The –inline_and_copy algorithm assumes that all CALLs and references to the routine precede the routine itself in the source file. If the routine is referenced after the text of the routine and the compiler cannot inline that particular call site, it invokes the unoptimized version of the routine.|
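The ordering assumption in the Note can be shown with a hypothetical single source file in which every reference to a routine comes before the routine's own text. The names DRIVER and RESCALE are illustrative only.

```fortran
! Hypothetical single source file laid out the way -inline_and_copy
! expects: every CALL to RESCALE precedes the text of RESCALE.
subroutine driver(n, v)
  implicit none
  integer, intent(in) :: n
  real, intent(inout) :: v(n)
  call rescale(n, v, 2.0)   ! reference before the routine text
  call rescale(n, v, 0.5)   ! reference before the routine text
end subroutine driver

! The routine itself comes last. A CALL placed after this point that
! could not be inlined would invoke the unoptimized copy.
subroutine rescale(n, v, f)
  implicit none
  integer, intent(in) :: n
  real, intent(inout) :: v(n)
  real, intent(in) :: f
  v = v * f
end subroutine rescale
```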
The loop level, depth, and manual options let you select which instances (call sites) of the routines specified with the –inline or –ipa options are processed.
The –inline_looplevel=integer (or –inll=integer) and –ipa_looplevel=integer (or –ipall=integer) options enable you to limit inlining and interprocedural analysis to routines that are referenced in deeply nested loops, where the reduced call overhead or enhanced optimization is multiplied.
Because inlining increases the size of the code, the extra paging and cache contention can actually slow down a program. Restricting inlining to routines used in DO loops multiplies the benefits of eliminating subroutine and function call overhead for a given amount of code space expansion. (If inlining appears to have slowed an application code, try using IPA, which has little effect on code space and the number of temporary variables.)
To determine which loops are most deeply nested, the compiler constructs a call graph to account for nesting of loops farther up the call chain. The value of integer is defined relative to the most deeply nested leaf of the call graph. For example, if you specify 1 for integer, the compiler expands calls in only the most deeply nested loop. If you specify 2, the compiler expands routines in the deepest and second-deepest nested loops. In general, a value of n selects call sites nested within n - 1 loop levels of the deepest one, so a sufficiently large value enables inlining or IPA at every nesting level. If you omit –inline_looplevel (or –ipa_looplevel), the corresponding loop level defaults to 2.
Consider the code skeleton in Example 5-1.
PROGRAM MAIN
   ...
   CALL A
   ...
END

SUBROUTINE A
   ...
   DO
      DO
         CALL B
      ENDDO
   ENDDO
END

SUBROUTINE B
   DO
      DO
         CALL C
      ENDDO
   ENDDO
END

SUBROUTINE C
   ...
END
The CALL B is inside a doubly nested loop and is therefore more profitable for the compiler to expand than the CALL A. The CALL C is quadruply nested (two loops in A plus two loops in B), so inlining C yields the greatest gain of the three.
For –inline_looplevel=1, only the functions called at the most deeply nested call sites are inlined (the call to C in Example 5-1). –inline_looplevel=2 inlines routines called at the most deeply nested level and at one loop less deeply nested; Example 5-1 has no call at that second level, so the result is the same as for 1. –inline_looplevel=3 is required to inline subroutine B, because its call is two loops less nested than the call to subroutine C. A value of 3 or greater causes the compiler to inline C into B, and then to inline the new B into subroutine A.
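The skeleton of Example 5-1 can be made concrete as follows. The loop bound N and the counter module CALL_COUNTER are hypothetical additions. Because CALL C sits inside four loops counted from the main program, C executes N**4 times for each call to A, while B executes N**2 times, which is why the deepest call site gains the most from inlining. The comments record which –inline_looplevel values select each call site, following the rules above.

```fortran
! Example 5-1 made compilable. N and CALL_COUNTER are hypothetical.
module call_counter
  implicit none
  integer :: ncalls_c = 0
end module call_counter

subroutine a(n)
  implicit none
  integer, intent(in) :: n
  integer :: i, j
  do i = 1, n
    do j = 1, n
      ! Nesting depth 2, two loops less than CALL C:
      ! selected by -inline_looplevel=3 or greater.
      call b(n)
    end do
  end do
end subroutine a

subroutine b(n)
  implicit none
  integer, intent(in) :: n
  integer :: i, j
  do i = 1, n
    do j = 1, n
      ! Nesting depth 4 counted from MAIN (2 loops in A + 2 in B):
      ! the deepest call site, selected by -inline_looplevel=1 or greater.
      call c()
    end do
  end do
end subroutine b

subroutine c()
  use call_counter, only: ncalls_c
  implicit none
  ncalls_c = ncalls_c + 1   ! executed n**4 times per call to A
end subroutine c
```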
The calling tree written to the listing file includes the nesting depth level of each call in each program unit and the aggregate nesting depth (the sum of the nesting depths for each call site, starting from the main program). You can use this information to identify the best routines for inlining. (See “Setting the Listing Level”.)
A routine that passes the –inline_looplevel test is inlined everywhere it is used, even places that are not in deeply nested loops. If some, but not all, invocations of a routine are to be expanded, use the C*$* INLINE or C*$* IPA directives just before each CALL/reference to be expanded (refer to “Fine-Tuning Inlining and IPA”).
When a routine is expanded inline, it can contain references to other routines. The compiler must decide whether to recursively expand these references (which might themselves contain yet other references, and so on).
The –inline_depth=integer option (or –ind=integer) restricts the depth to which the compiler continues to attempt inlining already inlined routines. Valid values for integer are:
n (a positive integer): Specifies a depth to which inlining is limited. The default is 2.
0: Uses the default value (2).
-1: Limits inline expansion to only those routines that do not reference other routines (that is, only leaf routines are inlined). The compiler does not support any other negative values.
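The hypothetical call chain TOP, MID, LEAF below shows what the –inline_depth values distinguish. LEAF references no other routine, so it is the only one eligible under –inline_depth=-1. Expanding MID into TOP brings along MID's own reference to LEAF; whether that copied reference is expanded in turn is what a positive depth limits. The comment in TOP assumes that the depth counts levels of expansion starting at the original call site.

```fortran
! Hypothetical call chain TOP -> MID -> LEAF for -inline_depth.
subroutine leaf(x)
  implicit none
  real, intent(inout) :: x
  ! References no other routine: the only candidate under -inline_depth=-1.
  x = 2.0 * x
end subroutine leaf

subroutine mid(x)
  implicit none
  real, intent(inout) :: x
  ! MID references LEAF, so it is not a leaf routine.
  call leaf(x)
  x = x + 1.0
end subroutine mid

subroutine top(x)
  implicit none
  real, intent(inout) :: x
  ! Expanding MID here copies in MID's CALL LEAF; expanding that copied
  ! call as well is a second level of inlining (assumed to be what the
  ! default depth of 2 permits).
  call mid(x)
end subroutine top
```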
Recursive inlining can be quite expensive in compilation time. Exercise discretion in its use.
|Note: There is no corresponding –ipa_depth option.|
The –inline_man (or –inm) option enables recognition of the C*$* INLINE directive. This directive, described in “Fine-Tuning Inlining and IPA”, allows you to select individual instances of routines to be inlined. The –ipa_man (or –ipam) option is the analogous option for the C*$* IPA directive.
This section lists conditions that prevent the compiler from inlining and analyzing subroutines and functions, whether from a library or source file. Many constructs that prevent inlining will also stop or restrict interprocedural analysis.
These are the conditions that inhibit inlining:
Dummy and actual parameters are mismatched in type or class.
Dummy parameters are missing.
Actual parameters are missing and the corresponding dummy parameters are arrays.
An actual parameter is a non-scalar expression (for example, A+B, where A and B are arrays).
The number of actual parameters differs from the number of dummy parameters.
The size of an array actual parameter differs from the array dummy parameter and the arrays cannot be made linear.
The calling routine and called routine have mismatched COMMON declarations.
The called routine has EQUIVALENCE statements (some of these can be handled).
The called routine contains NAMELIST statements.
The called routine has dynamic arrays.
The CALL to be expanded has alternate return parameters.
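Two of the call-site conditions above are sketched below with hypothetical routines: an actual parameter that is a non-scalar expression (A+B, where A and B are arrays) and a CALL with an alternate return parameter. Both are legal Fortran (alternate returns are an obsolescent feature) and run correctly; according to the list, neither call site would be expanded inline.

```fortran
! Hypothetical call sites matching two conditions in the list above.
subroutine total(n, v, s)
  implicit none
  integer, intent(in) :: n
  real, intent(in) :: v(n)
  real, intent(out) :: s
  s = sum(v)
end subroutine total

subroutine classify(x, *)
  implicit none
  real, intent(in) :: x
  if (x < 0.0) return 1     ! alternate return to the caller's label
end subroutine classify

subroutine caller(n, a, b, s, negative)
  implicit none
  integer, intent(in) :: n
  real, intent(in) :: a(n), b(n)
  real, intent(out) :: s
  logical, intent(out) :: negative
  ! Actual parameter is a non-scalar expression (A+B, both arrays).
  call total(n, a + b, s)
  negative = .false.
  ! CALL with an alternate return parameter (*100).
  call classify(s, *100)
  return
100 negative = .true.
end subroutine caller
```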
Inlining is also inhibited when the routine to be inlined
is too long (the limit is about 600 lines)
contains a SAVE statement
contains variables that are live-on-entry, even if they are not in explicit SAVE statements
contains a DATA statement (DATA implies SAVE) and the variable is live on entry
contains a CALL with a subroutine or function name as an argument
contains a C*$* INLINE directive
contains unsubscripted array references in I/O statements
contains POINTER statements
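The hypothetical routines below contain conditions from the list above. COUNT_CALLS has a DATA statement whose variable is live on entry (its value from the previous call is read before it is assigned), and USE_APPLY contains a CALL with a function name as an argument. SQUARE and APPLY are illustrative helpers.

```fortran
! Hypothetical routines containing conditions that inhibit inlining.

! DATA implies SAVE, and N is live on entry: the value left by the
! previous call is read before N is assigned.
subroutine count_calls(ncalls)
  implicit none
  integer, intent(out) :: ncalls
  integer :: n
  data n /0/
  n = n + 1
  ncalls = n
end subroutine count_calls

real function square(x)
  implicit none
  real, intent(in) :: x
  square = x * x
end function square

subroutine apply(f, x, y)
  implicit none
  real, external :: f
  real, intent(in) :: x
  real, intent(out) :: y
  y = f(x)
end subroutine apply

! Contains a CALL with a function name (SQUARE) as an argument.
subroutine use_apply(x, y)
  implicit none
  real, intent(in) :: x
  real, intent(out) :: y
  real, external :: square
  call apply(square, x, y)
end subroutine use_apply
```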