Chapter 5. Inlining and Interprocedural Analysis

Overview of Inlining and IPA

Inlining is the process of replacing a function reference with a copy of the function's code. This eliminates the call overhead. It can also help other optimizations by making it easier to see how the function's arguments and return value relate to the surrounding code. However, it also increases the size of the generated object code.

Interprocedural analysis (IPA) is the process of inspecting called functions to get information on relationships between arguments, returned values, and global data. IPA can provide many of the benefits of inlining without replacing the function reference.

You can control inlining and IPA from the command line and also by using directives in your source code. The driver options for inlining and IPA are summarized in Table 5-1. As with all special optimizations, you specify them as sub-options of the –WK option.

Table 5-1. Inlining and IPA Options

Long Option Name              Short Option Name   Default Value
–inline[=list]                –inl[=list]         option off
–ipa[=list]                   –ipa[=list]         option off
–inline_and_copy              –inlc               option off
–inline_looplevel=integer     –inll=integer       2
–ipa_looplevel=integer        –ipall=integer      2
–inline_depth=integer         –ind=integer        2
–inline_man                   –inm                option off
–ipa_man                      –ipam               option off
–inline_from_files=list       –inff=list          option off
–ipa_from_files=list          –ipaff=list         option off
–inline_from_libraries=list   –infl=list          option off
–ipa_from_libraries=list      –ipafl=list         option off
–inline_create=name           –incr=name          option off
–ipa_create=name              –ipacr=name         option off


Specifying Functions for Inlining or IPA

To request inlining of all eligible function calls, specify -inline. To request analysis of all eligible function calls, specify -ipa. However, full inlining or full IPA can be time-consuming and may not yield good results.

Often, specific functions are particularly good candidates for inlining or IPA, either because of their contents or because of how often they are called. You can name these functions in a list given to the –inline or –ipa option. When you do, calls to functions not in the list are neither inlined nor analyzed.

The –inline=list option (or –inl=list) specifies a list of functions that should be expanded inline. The –ipa=list option specifies a list of routines that should be analyzed. The names in list are separated by colons.

The following command performs inline expansion on the two routines saxpy and daxpy from the file foo.f:

f90 -WK,-inline=saxpy:daxpy foo.f 
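If you drive the compiler from a build script, it can help to generate the –WK argument rather than typing it by hand. The following Python sketch uses a hypothetical helper, wk_command, that only assembles the argument list. Sub-options are joined with commas after -WK, and names within a list are joined with colons, as in the command above. It does not run the compiler, and it covers only –inline and –inline_from_files.

```python
from typing import Iterable, List, Optional


def wk_command(source: str,
               inline: Optional[Iterable[str]] = None,
               inline_from_files: Optional[Iterable[str]] = None) -> List[str]:
    """Assemble an f90 argument list that passes inlining sub-options via -WK.

    inline=None omits -inline; an empty iterable requests -inline with no list
    (all eligible calls); a non-empty iterable names the routines to inline.
    """
    suboptions: List[str] = []
    if inline is not None:
        names = list(inline)
        suboptions.append(("-inline=" + ":".join(names)) if names else "-inline")
    if inline_from_files is not None:
        # Files and directories in a search list are also colon-separated.
        suboptions.append("-inline_from_files=" + ":".join(inline_from_files))
    args = ["f90"]
    if suboptions:
        # All sub-options travel in a single comma-separated -WK argument.
        args.append("-WK," + ",".join(suboptions))
    args.append(source)
    return args
```

For instance, wk_command("foo.f", inline=["saxpy", "daxpy"]) reproduces the foo.f command shown above. A call such as wk_command("calc.f90", inline=[], inline_from_files=["-", "subs.f90"]) yields the arguments of the calc.f90 example in the next section.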

The compiler looks for the routines in the current source file, unless you specify an –inline_from or –ipa_from option. Refer to “Specifying Where to Search for Routines” for details.

Specifying Where to Search for Routines

In order to copy or to analyze a function, the compiler must have access to the text of the function body. If you do not specify otherwise, the compiler searches for the function in the current source file.

The options listed in Table 5-2 tell the compiler where to search for the routines specified with the –inline or –ipa options.

Table 5-2. Inlining and IPA Search Command Line Options

Long Option Name              Short Option Name
–inline_from_files=list       –inff=list
–ipa_from_files=list          –ipaff=list
–inline_from_libraries=list   –infl=list
–ipa_from_libraries=list      –ipafl=list

In each case, list consists of file or directory names separated by colons. When you specify a directory, the compiler uses all appropriate files in that directory. For example:

f90 ... -WK,-inline_from_files=subs.f90:../common


Note: These options by themselves do not initiate inlining or IPA. They only specify where to look for the routines. Use them in conjunction with the appropriate –inline or –ipa option.

If you specify a nonexistent file or directory, the compiler issues an error. If you specify multiple –inline_from or –ipa_from options, the compiler concatenates their lists. All lists are searched in the order that they appear on the command line.

The compiler recognizes two special abbreviations when specified in list:

  • “-” means the current source file (as listed on the command line or specified in an –input=file command line option)

  • “.” means the current working directory

The following command requests inline expansion using routines from the current source file, calc.f90, followed by subs.f90:

% f90 -WK,-inline,-inline_from_files=-:subs.f90 calc.f90

When this command runs, the compiler searches the current source file, calc.f90, and then subs.f90 for all eligible routines to expand. (It searches for all eligible routines because the –inline option was specified without a list.)

The compiler resolves routine name references by searching for them in the order in which they appear in the –inline_from/–ipa_from options on the command line. Libraries are searched in their original lexical order.
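The search rules above can be modeled in a few lines. This Python sketch illustrates the rules and is not the compiler's implementation. It assumes that the first search-list entry defining a routine is the one used, and it represents file contents with a hypothetical dictionary. It does not expand directories into their files or report nonexistent entries.

```python
from typing import Dict, List, Optional


def build_search_list(option_lists: List[str], current_source: str, cwd: str) -> List[str]:
    """Concatenate colon-separated lists in command-line order, expanding '-' and '.'."""
    entries: List[str] = []
    for option_value in option_lists:
        for item in option_value.split(":"):
            if item == "-":
                entries.append(current_source)  # "-" means the current source file
            elif item == ".":
                entries.append(cwd)             # "." means the current working directory
            else:
                entries.append(item)
    return entries


def resolve(routine: str, search_list: List[str],
            contents: Dict[str, List[str]]) -> Optional[str]:
    """Return the first entry whose (hypothetical) contents define routine, else None."""
    for entry in search_list:
        if routine in contents.get(entry, []):
            return entry
    return None
```

With two –inline_from_files options, -:subs.f90 and .:lib.klib, the combined list is calc.f90, subs.f90, the working directory, and lib.klib, in that order. A routine defined in both calc.f90 and subs.f90 then resolves to calc.f90.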

The compiler recognizes the type of file from its extension, or lack of one, as described in Table 5-3. (The creation and use of libraries is the subject of the next topic.)

Table 5-3. Filename Extensions

Extension             Type of File
.f, .F, .for, .FOR    Fixed-format source (Fortran 77 or Fortran 90)
.f90, .F90            Free-format source
.i                    Fortran source run through cpp
.klib                 Library created with –inline_create or –ipa_create option
Other                 Directory
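The extension rules in Table 5-3 amount to a lookup table. The sketch below uses a hypothetical classify function to map a search-list entry to a file type. Like the table, it treats any entry whose extension is not listed, including one with no extension, as a directory. The comparison is case-sensitive, matching the separate .f/.F and .for/.FOR entries.

```python
import os

# Extension-to-type mapping taken from Table 5-3.
_EXTENSION_TYPES = {
    ".f": "fixed-format source",
    ".F": "fixed-format source",
    ".for": "fixed-format source",
    ".FOR": "fixed-format source",
    ".f90": "free-format source",
    ".F90": "free-format source",
    ".i": "cpp-processed source",
    ".klib": "library",
}


def classify(entry: str) -> str:
    """Classify a search-list entry by its extension; anything else is a directory."""
    _, ext = os.path.splitext(entry)
    return _EXTENSION_TYPES.get(ext, "directory")
```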


Creating Libraries of Inline Functions

Normally, inlining or IPA is done directly from a Fortran source file. However, when the same set of functions is called from many different programs, it is more efficient to create a pre-analyzed library of the routines.

Use the –inline_create=name option (or –incr=name) to create a library of prepared function texts for later use. The –ipa_create=name option (or –ipacr=name) is the analogous option for IPA. The library contains preprocessed information about the functions in a source file, which the compiler can read quickly when inlining or performing IPA.


Note: Libraries created for inlining contain complete information and can be used for both inlining and IPA. Libraries created for IPA contain only summary information and can be used only for IPA.

The compiler gives the library file the name you specify. Because the compiler recognizes input libraries by their file suffix, you should use the suffix .klib, for example, prog.klib.

Creating a library is done separately from compiling. Create a library using the –inline_create option (or the –ipa_create option for IPA only). For example, the following command line creates a library called prog.klib based on the functions in source program prog.f90:

f90 ... -WK,-inline_create=prog.klib prog.f90

When you specify this option the compiler creates only the library; it does not compile the source program. The following command compiles samp.f90, taking information about all eligible functions from prog.klib.

f90 ... -WK,-inl,-infl=prog.klib samp.f90

When creating a library, you can specify only one –inline_create (–ipa_create) option. Therefore, you can create only one library at a time. The compiler overwrites any existing file with the same name as the library.

If you do not specify the –inline (–ipa) option along with the –inline_create (–ipa_create) option, the compiler includes all routines from the inlining universe in the library, if possible. If you specify –inline=list or –ipa=list, the compiler includes only the named routines in the library.
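The rules for library contents and use can be summarized in a short model. In this sketch, KLib, create_library, and usable_for are hypothetical names. Routines named in a list but absent from the inlining universe are simply dropped. That is an assumption, since the compiler may instead report them. The "if possible" qualifier is not modeled.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class KLib:
    kind: str                 # "inline" (from -inline_create) or "ipa" (from -ipa_create)
    routines: FrozenSet[str]  # routines whose information the library holds


def create_library(kind: str, universe: Iterable[str],
                   selected: Optional[Iterable[str]] = None) -> KLib:
    """Without a list, take every routine in the universe; with one, only those named."""
    pool = set(universe)
    chosen = pool if selected is None else pool & set(selected)
    return KLib(kind, frozenset(chosen))


def usable_for(lib: KLib, purpose: str) -> bool:
    """Inline libraries hold complete information; IPA libraries serve IPA only."""
    return purpose == "ipa" or lib.kind == "inline"
```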


Tip: You do not have to generate your inlining or IPA library from the same source that will actually be linked into the running program. This capability can cause errors if misused, but it can also be useful. For example, you can write a library of hand-optimized assembly language routines, then construct an IPA library using Fortran routines that mimic the behavior of the assembly code. Thus, you can do parallelism analysis with IPA correctly, yet actually call the hand-optimized assembly routines.


Using the Inline-and-Copy Option

The –inline_and_copy (or –inlc) option functions like the –inline option, except that the compiler copies the unoptimized text of a routine into the transformed code file each time the routine is called or referenced. Use this option when inlining routines that are called from the file in which they are located. This option has no special effect when the routines being inlined are being taken from a library or separate source file.

When a routine has been inlined everywhere it is used, leaving it unoptimized saves compilation time. When a program involves multiple source files, the unoptimized routine is still available in case another source file contains a reference to it.


Note: The –inline_and_copy algorithm assumes that all CALLs and references to the routine precede the routine itself in the source file. If the routine is referenced after the text of the routine and the compiler cannot inline that particular call site, it invokes the unoptimized version of the routine.


Specifying Occurrences for Inlining and IPA

The loop level, depth, and manual options let you select which occurrences of the routines named with the –inline or –ipa options are processed.

Using Loop Level

The –inline_looplevel=integer (or –inll=integer) and –ipa_looplevel=integer (or –ipall=integer) options enable you to limit inlining and interprocedural analysis to routines that are referenced in deeply nested loops, where the reduced call overhead or enhanced optimization is multiplied.

Because inlining increases the size of the code, the extra paging and cache contention can actually slow down a program. Restricting inlining to routines used in DO loops multiplies the benefit of eliminating subroutine and function call overhead for a given amount of code expansion. (If inlining appears to have slowed an application, try IPA instead, which has little effect on code size and on the number of temporary variables.)

To determine which loops are most deeply nested, the compiler constructs a call graph that accounts for loops farther up the call chain. The value of integer is relative to the most deeply nested leaf of the call graph. For example, if you specify 1, the compiler expands only calls in the most deeply nested loop. If you specify 2, it expands calls in the deepest and second-deepest nesting levels. Specifying a large value enables inlining or IPA at every nesting level within that many levels of the deepest one. If you do not specify –inline_looplevel or –ipa_looplevel, the loop level is 2.

Consider the code skeleton in Example 5-1.

Example 5-1. Skeleton of Nested Loops


PROGRAM MAIN
  ..
 CALL A    ------> SUBROUTINE A

  ..
 DO
  DO
   CALL B -----> SUBROUTINE B
  ENDDO             DO
 ENDDO                DO
                       CALL C -------> SUBROUTINE C
                      ENDDO
                    ENDDO

CALL B is inside a doubly nested loop and is therefore more profitable for the compiler to expand than CALL A. CALL C is quadruply nested (two loops in the main program plus two in subroutine B), so inlining C yields the greatest gain of the three.

For –inline_looplevel=1, only the functions called in the most deeply-nested call sites are inlined (the call to C in Example 5-1). –inline_looplevel=2 inlines only routines called at the most deeply nested level and one loop less deeply nested. –inline_looplevel=3 would be required to inline subroutine B, because its call is two loops less nested than the call to subroutine C. A value of 3 or greater causes the compiler to inline C into B, and then to inline the new B into the main program.
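The selection rule can be checked against Example 5-1 with a small Python model. It assumes that a routine's aggregate depth is that of its deepest call site, summed along the call chain from the main program. It also assumes that a loop level of n keeps routines within n-1 levels of the deepest site. The call-graph encoding and the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[Tuple[str, int]]]

# Example 5-1: caller -> list of (callee, number of loops enclosing that call).
CALLS: Graph = {
    "MAIN": [("A", 0), ("B", 2)],
    "A": [],
    "B": [("C", 2)],
    "C": [],
}


def aggregate_depths(graph: Graph, root: str = "MAIN") -> Dict[str, int]:
    """Sum loop nesting along each call chain from root (acyclic graph assumed)."""
    depths: Dict[str, int] = {}

    def walk(caller: str, outer: int) -> None:
        for callee, loops in graph[caller]:
            d = outer + loops
            depths[callee] = max(d, depths.get(callee, 0))  # keep the deepest call site
            walk(callee, d)

    walk(root, 0)
    return depths


def selected_by_looplevel(graph: Graph, level: int, root: str = "MAIN") -> Set[str]:
    """Routines whose deepest call site is within (level - 1) of the overall deepest."""
    depths = aggregate_depths(graph, root)
    deepest = max(depths.values())
    return {r for r, d in depths.items() if d >= deepest - (level - 1)}
```

The model gives aggregate depths of 0 for A, 2 for B, and 4 for C. Levels 1 and 2 select only C, level 3 adds B, and a large value such as 10 selects all three routines, in line with the description above.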

The calling tree written to the listing file includes the nesting depth level of each call in each program unit and the aggregate nesting depth (the sum of the nesting depths for each call site, starting from the main program). You can use this information to identify the best routines for inlining. (See “Setting the Listing Level”.)

A routine that passes the –inline_looplevel test is inlined everywhere it is used, even places that are not in deeply nested loops. If some, but not all, invocations of a routine are to be expanded, use the C*$* INLINE or C*$* IPA directives just before each CALL/reference to be expanded (refer to “Fine-Tuning Inlining and IPA”).

Depth of Nested Inlining

When a routine is expanded inline, it can contain references to other routines. The compiler must decide whether to recursively expand these references (which might themselves contain yet other references, and so on).

The –inline_depth=integer option (or –ind=integer) restricts the depth to which the compiler continues to attempt inlining already inlined routines. Valid values for integer are:

1-10    Specifies a depth to which inlining is limited. The default is 2.

0       Uses the default value (2).

-1      Limits inline expansion to only those routines that do not reference other routines (that is, only leaf routines are inlined). The compiler does not support any other negative values.

Recursive inlining can be quite expensive in compilation time. Exercise discretion in its use.
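One way to picture the –inline_depth values is the sketch below, which computes which routines would end up expanded inside a caller. It assumes that calls in the caller itself are level 1 and that calls inside an inlined copy are one level deeper. It applies the leaf-only rule (-1) only to the caller's direct calls. The function names and the call-graph encoding are hypothetical.

```python
from typing import Dict, List, Set

DEFAULT_DEPTH = 2


def effective_depth(value: int) -> int:
    """Interpret an -inline_depth value: 1-10 as given, 0 as the default, -1 as leaf-only."""
    if value == 0:
        return DEFAULT_DEPTH
    if value == -1 or 1 <= value <= 10:
        return value
    raise ValueError(f"unsupported -inline_depth value: {value}")


def inlined_into(graph: Dict[str, List[str]], root: str, value: int) -> Set[str]:
    """Routines whose bodies end up expanded inside root (acyclic graph assumed)."""
    depth = effective_depth(value)
    if depth == -1:
        # Leaf-only: expand a direct call only if the callee references no routines.
        return {callee for callee in graph[root] if not graph[callee]}
    result: Set[str] = set()

    def expand(caller: str, level: int) -> None:
        if level > depth:
            return
        for callee in graph[caller]:
            result.add(callee)
            expand(callee, level + 1)  # calls inside the inlined copy are one level deeper

    expand(root, 1)
    return result
```

With MAIN calling A and D, A calling B, and B calling C, a depth of 1 expands only A and D. The default of 2 also pulls B into the inlined copy of A, while -1 expands only D, because it is the only leaf MAIN calls directly.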


Note: There is no corresponding –ipa_depth option.


Enabling Manual Control

The –inline_man (or –inm) option enables recognition of the C*$* INLINE directive. This directive, described in “Fine-Tuning Inlining and IPA”, allows you to select individual instances of routines to be inlined. The –ipa_man (or –ipam) option is the analogous option for the C*$* IPA directive.

Conditions That Prevent Inlining and IPA

This section lists conditions that prevent the compiler from inlining and analyzing subroutines and functions, whether from a library or source file. Many constructs that prevent inlining will also stop or restrict interprocedural analysis.

These are the conditions that inhibit inlining:

  • Dummy and actual parameters are mismatched in type or class.

  • Dummy parameters are missing.

  • Actual parameters are missing and the corresponding dummy parameters are arrays.

  • An actual parameter is a non-scalar expression (for example, A+B, where A and B are arrays).

  • The number of actual parameters differs from the number of dummy parameters.

  • The size of an array actual parameter differs from the array dummy parameter and the arrays cannot be made linear.

  • The calling routine and called routine have mismatched COMMON declarations.

  • The called routine has EQUIVALENCE statements (some of these can be handled).

  • The called routine contains NAMELIST statements.

  • The called routine has dynamic arrays.

  • The CALL to be expanded has alternate return parameters.

Inlining is also inhibited when the routine to be inlined:

  • is too long (the limit is about 600 lines)

  • contains a SAVE statement

  • contains variables that are live-on-entry, even if they are not in explicit SAVE statements

  • contains a DATA statement for a variable that is live on entry (DATA implies SAVE)

  • contains a CALL with a subroutine or function name as an argument

  • contains a C*$* INLINE directive

  • contains unsubscripted array references in I/O statements

  • contains POINTER statements
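A pre-screening checklist can make these conditions easier to apply when choosing candidates for the –inline list. The sketch below encodes a subset of the routine-level conditions from both lists. RoutineInfo and its fields are hypothetical, and you would fill them in yourself. The 600-line threshold mirrors the approximate limit stated above. Passing this check does not guarantee that the compiler will inline the routine, because call-site conditions such as mismatched parameters also apply.

```python
from dataclasses import dataclass
from typing import List

MAX_LINES = 600  # the documented limit is "about 600 lines"; treat it as approximate


@dataclass
class RoutineInfo:
    """Hypothetical summary of a routine; field names are illustrative only."""
    name: str
    line_count: int
    has_save: bool = False
    has_live_on_entry_vars: bool = False  # includes DATA-initialized variables
    has_namelist: bool = False
    has_dynamic_arrays: bool = False
    has_pointer: bool = False
    has_inline_directive: bool = False


def inlining_blockers(r: RoutineInfo) -> List[str]:
    """Return the listed routine-level conditions that r trips."""
    reasons: List[str] = []
    if r.line_count > MAX_LINES:
        reasons.append("too long")
    if r.has_save:
        reasons.append("SAVE statement")
    if r.has_live_on_entry_vars:
        reasons.append("live-on-entry variables")
    if r.has_namelist:
        reasons.append("NAMELIST statement")
    if r.has_dynamic_arrays:
        reasons.append("dynamic arrays")
    if r.has_pointer:
        reasons.append("POINTER statement")
    if r.has_inline_directive:
        reasons.append("C*$* INLINE directive")
    return reasons
```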