Chapter 6. Inlining and Interprocedural Analysis

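This chapter contains the following sections:

  • “Overview of Inlining and IPA”

  • “Using Command-Line Options”

  • “Conditions That Prevent Inlining and IPA”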

Overview of Inlining and IPA

Inlining is the process of replacing a function reference with the text of the function. This process eliminates the overhead of the function call and can assist other optimizations by making relationships between function arguments, returned values, and the surrounding code easier to find.

Interprocedural analysis (IPA) is the process of inspecting called functions for information on relationships between arguments, returned values, and global data. This process can provide many of the benefits of inlining without replacing the function reference.

You can control inlining and IPA either from the command line or with directives in your source code (refer to “Fine-Tuning Inlining and IPA” in Chapter 7).

Using Command-Line Options

The compiler performs inlining and IPA when you pass the options listed in Table 6-1 to the –pfa option, using the following syntax:

% f77 [f77_option ...] -pfa,option[,option]... file 

f77_option is any option you can specify directly to the compiler, and option is any of the options listed in Table 6-1.
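The –pfa syntax nests two separators: commas separate the –pfa suboptions from one another, and colons separate the names inside a list. As a minimal sketch, the hypothetical Python helper pfa_argument below assembles such an argument from a dictionary of suboptions. It only builds the string; it does not check option names against Table 6-1.

```python
def pfa_argument(options):
    """Build a -pfa argument string (hypothetical helper, illustration only).

    options maps a -pfa suboption name (without its leading dash) to:
      True        -> bare flag, e.g. -inline
      list/tuple  -> colon-separated list, e.g. -inline=saxpy:daxpy
      other value -> single value, e.g. -inline_looplevel=3
    """
    parts = []
    for name, value in options.items():
        if value is True:
            parts.append(f"-{name}")
        elif isinstance(value, (list, tuple)):
            # Routine, file, and library names inside a list use colons.
            parts.append(f"-{name}=" + ":".join(value))
        else:
            parts.append(f"-{name}={value}")
    if not parts:
        raise ValueError("at least one -pfa suboption is required")
    # Suboptions are joined with commas and attached to -pfa.
    return "-pfa," + ",".join(parts)


print(pfa_argument({"inline": ["saxpy", "daxpy"]}))
# -pfa,-inline=saxpy:daxpy
```

The printed string matches the –pfa argument used in the saxpy/daxpy example in this section.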

Table 6-1. Inlining and IPA Options

Long Option Name               Short Option Name   Default Value
–inline[=list]                 –inl[=list]         off
–ipa[=list]                    –ipa[=list]         off
–inline_and_copy               –inlc               off
–inline_looplevel=integer      –inll=integer       2
–ipa_looplevel=integer         –ipall=integer      2
–inline_depth=integer          –ind=integer        2
–inline_man                    –inm                off
–ipa_man                       –ipam               off
–inline_from_files=list        –inff=list          off
–ipa_from_files=list           –ipaff=list         off
–inline_from_libraries=list    –infl=list          off
–ipa_from_libraries=list       –ipafl=list         off
–inline_create[=name]          –incr[=name]        off
–ipa_create[=name]             –ipacr[=name]       off


Specifying Routines for Inlining or IPA

The –inline[=list] option (or –inl[=list]) specifies the routines to be expanded inline; the –ipa[=list] option specifies the routines to be analyzed. Separate the routine names in list with colons. If you omit list, the compiler expands (or, for –ipa, analyzes) all eligible routines. The compiler looks for the routines in the current source file unless you specify one of the –inline_from or –ipa_from options. Refer to “Specifying Where to Search for Routines” for details.

Example

The following command performs inline expansion on the two routines saxpy and daxpy from the file foo.f:

% f77 -pfa,-inline=saxpy:daxpy foo.f 

Refer to “Conditions That Prevent Inlining and IPA” for information about conditions that prevent inlining and IPA.

The –inline_and_copy (or –inlc) option functions like the –inline option, except that the compiler copies the unoptimized text of a routine into the transformed code file each time the routine is called or referenced. Use this option when inlining routines that are called from the file in which they are located. This option has no special effect when the routines being inlined are taken from a library or separate source file.

When a routine has been inlined everywhere it is used, leaving it unoptimized saves compilation time. When a program involves multiple source files, the unoptimized routine is still available in case another source file contains a reference to it.


Note: The –inline_and_copy algorithm assumes that all CALLs and references to the routine precede the routine itself in the source file. If the routine is referenced after the text of the routine and the compiler cannot inline that particular call site, it invokes the unoptimized version of the routine.


Specifying Occurrences for Inlining and IPA

The loop level, depth, and manual control options let you restrict which occurrences of the routines named with the –inline or –ipa options are processed.

Loop Level

The –inline_looplevel=integer (or –inll=integer) and –ipa_looplevel=integer (or –ipall=integer) options enable you to limit inlining and interprocedural analysis to routines that are referenced in deeply nested loops, where the reduced call overhead or enhanced optimization is multiplied.

integer is counted from the most deeply nested leaf of the call graph. To determine which loops are most deeply nested, the compiler constructs a call graph that accounts for loops farther up the call chain. For example, if you specify 1 for integer, the compiler expands only routines called in the most deeply nested loop. If you specify 2, it expands routines called in the deepest and second-deepest nesting levels, and so on. A large value for integer therefore enables inlining/IPA at every nesting level within integer levels of the deepest one. If you specify neither –inline_looplevel nor –ipa_looplevel, the loop level defaults to 2.

Example

Consider the following code:

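      PROGRAM MAIN
C     CALL A is not inside any loop.
      CALL A
C     CALL B is inside a doubly nested loop.
      DO I = 1, 10
         DO J = 1, 10
            CALL B
         ENDDO
      ENDDO
      END

      SUBROUTINE A
      END

      SUBROUTINE B
C     CALL C is two loops deep in B, four loops deep overall.
      DO K = 1, 10
         DO L = 1, 10
            CALL C
         ENDDO
      ENDDO
      END

      SUBROUTINE C
      END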

CALL B is inside a doubly nested loop and is therefore more profitable to expand than CALL A. CALL C is nested four loops deep (two loops in MAIN plus two in B), so inlining C yields the greatest gain of the three.

For –inline_looplevel=1, only the routines referenced in the most deeply nested call sites are inlined (subroutine C in the above example). (If more than one routine is called at the same loop nest level, the compiler selects all of them when that level is inlined/analyzed.)

–inline_looplevel=2 inlines only routines called at the most deeply-nested level and one loop less deeply-nested. (–inline_looplevel=3 would be required to inline subroutine B, because its call is two loops less nested than the call to subroutine C. A value of 3 or greater causes the compiler to inline C into B, then the new B to be inlined into the main program.)

The calling tree written to the listing file includes the nesting depth level of each call in each program unit and the aggregate nesting depth (the sum of the nesting depths for each call site, starting from the main program). You can use this information to identify the best routines for inlining.

A routine that passes the –inline_looplevel test is inlined everywhere it is used, even places that are not in deeply-nested loops. If some, but not all, invocations of a routine are to be expanded, use the C*$* INLINE or C*$* IPA directives just before each CALL/reference to be expanded (refer to “Fine-Tuning Inlining and IPA” in Chapter 7).
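To make the level arithmetic concrete, here is a simplified Python model of the selection rule as described above. It is a sketch under stated assumptions, not the compiler's algorithm: it sums loop depths along each call chain from the main program to get the aggregate nesting depth, treats the deepest call site as level 1, and selects every routine called at a level no greater than the –inline_looplevel value. The call-graph encoding and the function names call_sites and select_routines are hypothetical.

```python
def call_sites(calls, routine="MAIN", outer=0):
    """List (caller, callee, aggregate_depth) for every call path from MAIN.

    calls maps a routine to (callee, loop_depth) pairs, where loop_depth is
    the number of DO loops enclosing that CALL inside the caller. Standard
    Fortran 77 has no recursion, so the walk ends on an acyclic call graph.
    """
    sites = []
    for callee, loop_depth in calls.get(routine, []):
        depth = outer + loop_depth  # aggregate nesting depth so far
        sites.append((routine, callee, depth))
        sites.extend(call_sites(calls, callee, depth))
    return sites


def select_routines(calls, looplevel=2):
    """Routines called within `looplevel` levels of the deepest call site."""
    sites = call_sites(calls)
    deepest = max((d for _, _, d in sites), default=0)
    cutoff = deepest - looplevel + 1  # level 1 is the deepest nest
    return sorted({callee for _, callee, d in sites if d >= cutoff})


# The example program: CALL A outside any loop, CALL B two loops deep,
# and CALL C two loops deep inside B (four loops in aggregate).
example = {"MAIN": [("A", 0), ("B", 2)], "B": [("C", 2)]}
print(select_routines(example, 1))  # ['C']
print(select_routines(example, 2))  # ['C']
print(select_routines(example, 3))  # ['B', 'C']
```

The model returns routine names rather than call sites, which mirrors the rule that a routine passing the loop-level test is expanded at every place it is used.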

Because inlining increases code size, the extra paging and cache contention can actually slow a program down. Restricting inlining to routines called in DO loops multiplies the benefit of eliminating subroutine and function call overhead for a given amount of code growth. (If inlining appears to have slowed an application, consider IPA instead, which has little effect on code size or on the number of temporary variables.)

Depth

The –inline_depth=integer option (or –ind=integer) restricts the number of times the compiler continues to attempt inlining already inlined routines. Valid values for integer are as follows:

1–10 

Specifies a depth to which inlining is limited. The default is 2.

0 

Uses the default value.

–1 

Limits inline expansion to only those routines that do not reference other routines (that is, only leaf routines are inlined). The compiler does not support any other negative values.

When a routine is expanded inline, it can contain references to other routines. The compiler must decide whether to recursively expand these references (which might themselves contain yet other references, and so on). This option limits the number of times the compiler performs this recursive expansion. Note that the default setting is quite low; if you know inlining is useful for a particular program, increase this setting.
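The recursive expansion that –inline_depth limits can be modeled as a bounded walk of the call graph. The Python sketch below is a simplified model under stated assumptions, not the compiler's implementation: the call-graph dictionary and the names effective_depth and inlined_into are hypothetical, 0 is mapped to the documented default of 2, and –1 is modeled as inlining only those direct callees that call nothing themselves.

```python
DEFAULT_DEPTH = 2  # documented default for -inline_depth


def effective_depth(value):
    """Map an -inline_depth value to the depth this model uses."""
    if value == 0:
        return DEFAULT_DEPTH  # 0 selects the default
    if value == -1 or 1 <= value <= 10:
        return value
    raise ValueError(f"unsupported -inline_depth value: {value}")


def inlined_into(calls, caller, depth_option=0):
    """List the routines whose text this model expands into `caller`.

    calls maps each routine name to the routines it calls.
    """
    depth = effective_depth(depth_option)
    result = []

    def expand(routine, level):
        for callee in calls.get(routine, []):
            if depth == -1:
                # Leaf-only mode: inline callees that call nothing else.
                if not calls.get(callee):
                    result.append(callee)
            elif level <= depth:
                result.append(callee)
                # The inlined body may contain further calls; keep
                # expanding them only while the depth limit allows.
                expand(callee, level + 1)

    expand(caller, 1)
    return result


chain = {"MAIN": ["B"], "B": ["C"], "C": ["D"]}
print(inlined_into(chain, "MAIN", 1))   # ['B']
print(inlined_into(chain, "MAIN"))      # ['B', 'C']
print(inlined_into(chain, "MAIN", -1))  # [] (B is not a leaf)
```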


Note: There is no –ipa_depth option.

Recursive inlining can be quite expensive in compilation time. Exercise discretion in its use.

Manual Control

The –inline_man (or –inm) option enables recognition of the C*$* INLINE directive. This directive, described in “Fine-Tuning Inlining and IPA” in Chapter 7, allows you to select individual instances of routines to be inlined. The –ipa_man (or –ipam) option is the analogous option for the C*$* IPA directive.

Specifying Where to Search for Routines

The options listed in Table 6-2 tell the compiler where to search for the routines specified with the –inline or –ipa options. If you do not specify any of these options, the compiler searches only the current source file.

Table 6-2. Inlining and IPA Search Command-Line Options

Long Option Name               Short Option Name
–inline_from_files=list        –inff=list
–ipa_from_files=list           –ipaff=list
–inline_from_libraries=list    –infl=list
–ipa_from_libraries=list       –ipafl=list

If one of the names in list is a directory, the compiler uses all appropriate files in that directory. You can specify multiple files and directories simultaneously using a colon-separated list.

For example:

-pfa,-inline_from_files=file1:file2:file3

The compiler recognizes the type of file from its extension, or lack of one, as described in Table 6-3.

Table 6-3. Filename Extensions

Extension             Type of File
.f, .F, .for, .FOR    Fortran source
.i                    Fortran source run through cpp
.klib                 Library created with the –inline_create or –ipa_create option
Other                 Directory

The compiler recognizes two special abbreviations when specified in list:

  • “–” means current source file (as listed on the command line or specified in an –input=file command-line option)

  • “.” means the current working directory
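The rules in Table 6-3 and the two special abbreviations can be sketched as a small classifier. This is a hypothetical Python model for illustration only: the function names and the returned labels are this sketch's own, and it classifies list entries without opening them or expanding directories into their files.

```python
import os

FORTRAN_SOURCE = (".f", ".F", ".for", ".FOR")


def classify(entry, current_source):
    """Describe how one element of an -inline_from/-ipa_from list is read."""
    if entry == "-":
        return (current_source, "current source file")
    if entry == ".":
        return (os.getcwd(), "directory")
    ext = os.path.splitext(entry)[1]  # extension matching is case-sensitive
    if ext in FORTRAN_SOURCE:
        return (entry, "Fortran source")
    if ext == ".i":
        return (entry, "Fortran source run through cpp")
    if ext == ".klib":
        return (entry, "inlining/IPA library")
    return (entry, "directory")  # any other name is treated as a directory


def parse_list(value, current_source):
    """Split a colon-separated list and classify each element."""
    return [classify(entry, current_source) for entry in value.split(":")]


print(parse_list("-:input.f", "calc.f"))
# [('calc.f', 'current source file'), ('input.f', 'Fortran source')]
```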

Example

The following command requests inline expansion when compiling calc.f:

% f77 -pfa,-inline,-inline_from_files=-:input.f calc.f

The compiler searches the current source file (calc.f, abbreviated as –) and input.f for routines to expand. Because –inline is specified without a list, every eligible routine found in either file is a candidate for expansion.

If you specify a non-existent file or directory, the compiler issues an error.

If you specify multiple –inline_from or –ipa_from options, the compiler concatenates their lists into a single, larger search universe.

The compiler resolves each routine name reference by searching the places in that universe in the order they appear in the –inline_from/–ipa_from options on the command line. Libraries are searched in their original lexical order.
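The concatenation and search-order rules amount to a first-match lookup. In this hypothetical Python model, each –inline_from/–ipa_from option contributes an ordered list of places, each place is represented only by the set of routine names it defines, and a routine resolves to the first place that defines it. The data structures and function names are assumptions made for illustration; ordering inside a library is not modeled.

```python
def build_universe(option_lists):
    """Concatenate the lists from several -inline_from/-ipa_from options,
    keeping the order in which the options appear on the command line."""
    universe = []
    for places in option_lists:
        universe.extend(places)
    return universe


def resolve(routine, universe, defines):
    """Return the first place in the universe that defines `routine`.

    defines maps a place (file, directory, or library) to the set of
    routine names found there. Returns None if no place defines it.
    """
    for place in universe:
        if routine in defines.get(place, set()):
            return place
    return None


# Models: -inline_from_files=calc.f:input.f,-inline_from_libraries=prog.klib
universe = build_universe([["calc.f", "input.f"], ["prog.klib"]])
defines = {"input.f": {"saxpy"}, "prog.klib": {"saxpy", "daxpy"}}
print(resolve("saxpy", universe, defines))  # input.f
print(resolve("daxpy", universe, defines))  # prog.klib
```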


Note: These options by themselves do not initiate inlining or IPA. They only specify where to look for the routines. Use them in conjunction with the appropriate –inline or –ipa option.


Creating Libraries

When performing inlining and IPA, the compiler analyzes the routines in the source program. Normally, inlining is done directly from a source file. However, when inlining the same set of routines in many different programs, it is more efficient to create a pre-analyzed library of the routines. Use the –inline_create[=name] option (or –incr[=name]) to create a library of prepared routines for later use with the –inline_from_libraries option. The compiler assigns name to the library file it creates; for maximum compatibility, give it the extension .klib (for example, samp.klib).

The –ipa_create[=name] option (or –ipacr[=name]) is the analogous option for IPA.

You do not have to generate your inlining/IPA library from the same source that will actually be linked into the running program. This flexibility can cause errors if the library and the linked code behave differently, but it can also be quite useful. For example, you can write a library of hand-optimized assembly language routines, then construct an IPA library from Fortran routines that mimic the behavior of the assembly code. IPA then has correct information for parallelism analysis, while the program still calls the hand-optimized assembly routines.

The procedure for creating and using a library for inlining or IPA is given below.

  1. Create a library using the –inline_create option (or the –ipa_create option for IPA). For example, the following command line creates a library called prog.klib for the source program prog.f:

    % f77 -pfa,-inline_create=prog.klib prog.f
    

    When you specify this option, the compiler creates only the library; it does not compile the source program or create a transformed version of the file.

  2. Compile the program with inlining enabled and specify the new library:

    % f77 -pfa,-inl,-infl=prog.klib samp.f
    


Note: Libraries created for inlining contain complete information and can be used for both inlining and IPA. Libraries created for IPA contain only summary information and can be used only for IPA.

When creating a library, you can specify only one –inline_create (–ipa_create) option. Therefore, you can create only one library at a time. The compiler overwrites any existing file with the same name as the library.

If you do not specify the –inline (–ipa) option along with the –inline_create (–ipa_create) option, the compiler includes all routines from the inlining universe in the library, if possible. If you specify –inline=list or –ipa=list, the compiler includes only the named routines in the library.
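The library rules above can be summarized in two small functions. This is a hypothetical Python sketch: the function names and the "inline"/"ipa" labels are invented for illustration, and it ignores routines the compiler cannot prepare (the “if possible” qualification above).

```python
def library_contents(universe_routines, selected=None):
    """Routines a library created with -inline_create/-ipa_create holds.

    With no -inline=list (or -ipa=list), every routine in the inlining
    universe is included; otherwise only the named routines are.
    """
    if selected is None:
        return list(universe_routines)
    return [r for r in universe_routines if r in selected]


def usable_for(library_kind, purpose):
    """Inlining libraries serve both purposes; IPA libraries only IPA."""
    if library_kind == "inline":
        return purpose in ("inline", "ipa")
    if library_kind == "ipa":
        return purpose == "ipa"
    raise ValueError(f"unknown library kind: {library_kind}")


print(library_contents(["saxpy", "daxpy", "sdot"], {"daxpy"}))  # ['daxpy']
print(usable_for("ipa", "inline"))  # False
```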

Conditions That Prevent Inlining and IPA

This section lists conditions that prevent the compiler from inlining and analyzing subroutines and functions, whether from a library or source file. Many constructs that prevent inlining also stop or restrict interprocedural analysis.

These conditions inhibit inlining:

  • Dummy and actual parameters are mismatched in type or class.

  • Dummy parameters are missing.

  • Actual parameters are missing and the corresponding dummy parameters are arrays.

  • An actual parameter is a non-scalar expression (for example, A+B, where A and B are arrays).

  • The number of actual parameters differs from the number of dummy parameters.

  • The size of an array actual parameter differs from the array dummy parameter and the arrays cannot be made linear.

  • The calling routine and called routine have mismatched COMMON declarations.

  • The called routine has EQUIVALENCE statements (some of these can be handled).

  • The called routine contains NAMELIST statements.

  • The called routine has dynamic arrays.

  • The CALL to be expanded has alternate return parameters.
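Some of these conditions can be checked mechanically from summaries of the call and the called routine. The sketch below is a hypothetical Python model that covers only a few items from the list above (argument count and type, alternate returns, NAMELIST, dynamic arrays); the Routine and CallSite classes are invented for illustration and are not a compiler interface.

```python
from dataclasses import dataclass


@dataclass
class Routine:
    """Hypothetical summary of a called routine's declarations."""
    name: str
    dummy_types: list  # e.g. ["INTEGER", "REAL"]
    has_namelist: bool = False
    has_dynamic_arrays: bool = False


@dataclass
class CallSite:
    """Hypothetical summary of one CALL or function reference."""
    actual_types: list
    has_alternate_returns: bool = False


def call_site_blockers(call, routine):
    """Return the reasons, from the subset modeled here, that block inlining."""
    reasons = []
    if len(call.actual_types) != len(routine.dummy_types):
        reasons.append("argument count mismatch")
    elif any(a != d for a, d in zip(call.actual_types, routine.dummy_types)):
        reasons.append("argument type mismatch")
    if call.has_alternate_returns:
        reasons.append("alternate return parameters")
    if routine.has_namelist:
        reasons.append("NAMELIST in called routine")
    if routine.has_dynamic_arrays:
        reasons.append("dynamic arrays in called routine")
    return reasons


saxpy = Routine("saxpy", ["INTEGER", "REAL", "REAL", "REAL"])
print(call_site_blockers(CallSite(["INTEGER", "REAL"]), saxpy))
# ['argument count mismatch']
```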

Inlining is also inhibited when the routine to be inlined

  • is too long (the limit is about 600 lines)

  • contains a SAVE statement

  • contains variables that are live-on-entry, even if they are not in explicit SAVE statements

  • contains a DATA statement (DATA implies SAVE) and the variable is live-on-entry

  • contains a CALL with a subroutine or function name as an argument

  • contains a C*$* INLINE directive

  • contains unsubscripted array references in I/O statements

  • contains POINTER statements
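A few of these routine-level conditions can be spotted with a rough scan of fixed-form source. The Python sketch below is deliberately crude and hypothetical: body_blockers is an invented name, the 600-line threshold is the approximate figure given above, keyword matching can produce false positives (for example, a variable named SAVEX), and the live-on-entry and DATA conditions are not modeled because they need data-flow analysis.

```python
MAX_INLINE_LINES = 600  # "about 600 lines" per the text; approximate


def body_blockers(lines):
    """Scan fixed-form Fortran source lines for some routine-level blockers."""
    reasons = []

    def add(reason):
        if reason not in reasons:
            reasons.append(reason)

    if len(lines) > MAX_INLINE_LINES:
        add("routine too long")
    for line in lines:
        text = line.upper()
        if text.startswith("C*$*") and "INLINE" in text:
            add("C*$* INLINE directive")
        elif text[:1] in ("C", "*"):
            continue  # ordinary fixed-form comment line
        else:
            statement = text.strip()
            if statement.startswith("SAVE"):
                add("SAVE statement")
            elif statement.startswith("POINTER"):
                add("POINTER statement")
    return reasons


src = [
    "      SUBROUTINE ACC(X)",
    "      REAL X, TOTAL",
    "      SAVE TOTAL",
    "      TOTAL = TOTAL + X",
    "      END",
]
print(body_blockers(src))  # ['SAVE statement']
```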