Chapter 8. MPI Troubleshooting

This chapter provides answers to frequently asked questions about MPI.

What does MPI: could not run executable mean?

It means that something went wrong while mpirun was trying to launch your application, causing it to fail before all of the MPI processes were able to handshake with mpirun.

Can this error message be more descriptive?

No. Because of the highly decoupled interface between mpirun and arrayd, no other information is directly available. mpirun asks arrayd to launch a master process on each host and listens on a socket for those masters to connect back to it. Because the masters are children of arrayd, whenever one of the masters terminates, arrayd traps SIGCHLD and passes that signal back to mpirun. If mpirun receives a signal before it has established connections with every host in the job, that is an indication that something has gone wrong. In other words, in the early stages of initialization, only one of two pieces of information is available to mpirun: success or failure.

Is there something more that can be done?

One proposed idea is to create an mpicheck utility (similar to ascheck), which could run some simple experiments and look for things that are obviously broken from the mpirun point of view.

In the meantime, how can we figure out why mpirun is failing?

You can use the following checklist:

  • Look at the last few lines in /var/adm/SYSLOG for any suspicious errors or warnings. For example, if your application tries to pull in a library that it cannot find, a message should appear here.

  • Check for misspelling of your application name.

  • Be sure that you are setting your remote directory properly. By default, mpirun attempts to place your processes on all machines into the directory that has the same name as $PWD. However, sometimes different behavior is required. For more information, see the description of the -dir option on the mpirun(1) man page.

  • If you are using a relative path name for your application, be sure that it appears in $PATH. In particular, mpirun will not look in the current directory (.) for your application unless . appears in $PATH.

  • Run /usr/etc/ascheck to verify that your array is configured correctly.

  • Be sure that you can use rsh (or arshell) to connect to all of the hosts that you are trying to use, without entering a password. This means that either the /etc/hosts.equiv or the ~/.rhosts file must be modified to include the names of every host in the MPI job. Note that using the -np syntax (that is, not specifying host names) is equivalent to typing localhost, so a localhost entry is also needed in either the /etc/hosts.equiv or the ~/.rhosts file.

  • If you are using an MPT module to load MPI, try loading it directly from within your .cshrc file instead of from the shell. If you are also loading a ProDev module, be sure to load it after the MPT module.

  • To verify that you are running the version of MPI that you think you are, use the -verbose option of the mpirun(1) command.

  • Be very careful when setting MPI environment variables from within your .cshrc or .login files: because mpirun creates a fresh login session for every job, those settings will override any settings that you might later make from within your shell. The safe way to set up environment variables is to test for the existence of $MPI_ENVIRONMENT in your scripts and set the other MPI environment variables only if it is undefined.

  • If you are running under a Kerberos environment, you might encounter difficulty because mpirun is currently unable to pass tokens. For example, if you use telnet to connect to a host and then try to run mpirun on that host, the process fails. But if you use rsh instead to connect to the host, mpirun succeeds. (This might be because telnet is kerberized but rsh is not.) If you are running under a Kerberos environment, talk to your local administrators about the proper way to launch MPI jobs.

How do I combine MPI with other tools?

In general, the rule to follow is to run mpirun on your tool and then run the tool on your application; do not try to run the tool on mpirun. Also, because of the way that mpirun sets up stdio, it might require some effort to see the output from your tool. The simplest case is one in which the tool directly supports an option to redirect its output to a file; in general, this is the recommended way to mix tools with mpirun. Not all tools (for example, dplace) support such an option, but fortunately it is usually possible to "roll your own" by wrapping a shell script around the tool and having the script perform the following redirection:

> cat myscript
setenv MPI_DSM_OFF
dplace -verbose a.out 2> outfile
> mpirun -np 4 myscript
hello world from process 0
hello world from process 1
hello world from process 2
hello world from process 3
> cat outfile
there are now 1 threads
Setting up policies and initial thread.
Migration is off.
Data placement policy is PlacementDefault.
Creating data PM.
Data pagesize is 16k.
Setting data PM.
Creating stack PM.
Stack pagesize is 16k.
Stack placement policy is PlacementDefault.
Setting stack PM.
there are now 2 threads
there are now 3 threads
there are now 4 threads
there are now 5 threads 

Combining MPI with dplace

To combine MPI with the dplace tool, use the following code:

setenv MPI_DSM_OFF
mpirun -np 4 dplace -place file a.out

Combining MPI with perfex

To combine MPI with the perfex tool, use the following code:

 mpirun -np 4 perfex -mp -o file a.out

The -o option to perfex first became available in IRIX 6.5. On earlier systems, you can use a shell script, as previously described. However, a shell script allows you to view only the summary for the entire job; you can view individual statistics for each process only by using the -o option.

Combining MPI with rld

To combine MPI with the rld tool, use the following code:

setenv _RLDN32_PATH /usr/lib32/rld.debug
setenv _RLD_ARGS "-log outfile -trace"
mpirun -np 4 a.out

This can create more than one outfile, depending on whether you are running out of your home directory and whether you use a relative path name for the file. The first is created in the directory from which you are running your application and contains the information that applies to your job. The second is created in your home directory and contains (uninteresting) information about the login shell that mpirun created to run your job. If both directories are the same, the entries from both are merged into a single file.

Combining MPI with TotalView

To combine MPI with the TotalView tool, use the following code:

totalview mpirun -a -np 4 a.out

In this one special case, you must run the tool on mpirun and not the other way around. Because TotalView uses the -a option, this option must always appear as the first option on the mpirun command.

How can I allocate more than 700 to 1000 MB when I link with libmpi?

On IRIX versions earlier than 6.5, there are no so_locations entries for the MPI libraries. The way to fix this is to requickstart all versions of libmpi as follows:

cd /usr/lib32/mips3
rqs32 -force_requickstart -load_address 0x2000000 ./
cd /usr/lib32/mips4
rqs32 -force_requickstart -load_address 0x2000000 ./
cd /usr/lib64/mips3
rqs64 -force_requickstart -load_address 0x2000000 ./
cd /usr/lib64/mips4
rqs64 -force_requickstart -load_address 0x2000000 ./   

Note: This procedure requires root access.

Why does my code run correctly until it reaches MPI_Finalize(3) and then hang?

This problem is almost always caused by send or recv requests that are either unmatched or incomplete. An unmatched request is any blocking send request for which a corresponding recv request is never posted. An incomplete request is any nonblocking send or recv request that is never completed by a call to MPI_Test(3) or MPI_Wait(3), or released by a call to MPI_Request_free(3). A common example is an application that calls MPI_Isend(3), uses internal means to determine when it is safe to reuse the send buffer, and therefore never bothers to call MPI_Wait(3). Such codes can be fixed easily by inserting a call to MPI_Request_free(3) immediately after all such send requests.

Why do I keep getting error messages about MPI_REQUEST_MAX being too small, no matter how large I set it?

You are probably calling MPI_Isend(3) or MPI_Irecv(3) and not completing or freeing your request objects. You should use MPI_Request_free(3), as described in the previous question.

Why am I not seeing stdout or stderr output from my MPI application?

Beginning with our MPI 3.1 release, all stdout and stderr output is line-buffered, which means that mpirun will not print any partial lines of output. This sometimes causes problems for codes that prompt the user for input parameters but do not end their prompts with a newline character. The only solution for this is to append a newline character to each prompt.