Chapter 7. Troubleshooting and Frequently Asked Questions

This chapter describes solutions to some common problems that users encounter when starting to use SGI's MPI, and answers other frequently asked questions.

What are some things I can try to figure out why mpirun is failing?

Here are some things to investigate:

  • On IRIX systems, look at the last few lines in /var/adm/SYSLOG for any suspicious errors or warnings. On Linux systems, look in /var/log/messages. For example, if your application tries to pull in a library that it cannot find, a message should appear here.

  • Be sure that you did not misspell the name of your application.

  • To find rld/dynamic link errors, try to run your program without mpirun. You will get the “mpirun must be used to launch all MPI applications” message, along with any rld link errors that might not be displayed when the program is started with mpirun.

  • Be sure that you are setting your remote directory properly. By default, mpirun attempts to place your processes on all machines into the directory that has the same name as $PWD. This should be the common case, but sometimes different functionality is required. For more information, see the section on $MPI_DIR and/or the -dir option in the mpirun man page.

  • If you are using a relative pathname for your application, be sure that the directory containing it appears in $PATH. In particular, mpirun will not look in '.' for your application unless '.' appears in $PATH.

  • Run /usr/etc/ascheck to verify that your array is configured correctly.

  • Be sure that you can execute rsh (or arshell) to all of the hosts that you are trying to use without entering a password. This means that either /etc/hosts.equiv or ~/.rhosts must be modified to include the names of every host in the MPI job. Note that using the -np syntax (i.e. no hostnames) is equivalent to typing localhost, so a localhost entry will also be needed in one of the above two files.

  • On IRIX systems, if you are using an mpt module to load MPI, try loading it directly from within your .cshrc file instead of from the shell. If you are also loading a MIPSpro module, be sure to load it after the mpt module.

  • Use the -verbose option to verify that you are running the version of MPI that you think you are running.

  • Be very careful when setting MPI environment variables from within your .cshrc or .login files, because these settings override any that you later make from within your shell (MPI creates the equivalent of a fresh login session for every job). The safe approach is to test for the existence of $MPI_ENVIRONMENT in your scripts and to set the other MPI environment variables only if $MPI_ENVIRONMENT is undefined.

  • If you are running under a Kerberos environment, you may experience unpredictable results because currently, mpirun is unable to pass tokens. For example, in some cases, if you use telnet to connect to a host and then try to run mpirun on that host, it fails. But if you instead use rsh to connect to the host, mpirun succeeds. (This might be because telnet is kerberized but rsh is not.) At any rate, if you are running under such conditions, you will definitely want to talk to the local administrators about the proper way to launch MPI jobs.
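The $MPI_ENVIRONMENT guard recommended in the list above can be sketched as follows. This is only an illustration under stated assumptions: it uses Bourne-shell syntax (as for a .profile) rather than the csh syntax of a .cshrc file, the function name set_mpi_defaults is hypothetical, and the MPI_REQUEST_MAX value is a placeholder rather than a recommendation.

```shell
# Hypothetical startup-file fragment, Bourne-shell syntax.
# Assumption (from the text above): MPI_ENVIRONMENT is defined inside the
# login session that MPI creates for a job, and undefined otherwise.
set_mpi_defaults() {
    if [ -z "${MPI_ENVIRONMENT+set}" ]; then
        # Ordinary login: provide defaults that can still be changed
        # later from the interactive shell.
        MPI_REQUEST_MAX=32768    # placeholder value, not a recommendation
        export MPI_REQUEST_MAX
    fi
    # Inside an MPI-created session: leave the inherited settings untouched.
}

set_mpi_defaults
```

In a .cshrc file, the corresponding test would be `if ( ! $?MPI_ENVIRONMENT ) then` ... `endif`, with setenv in place of the assignment and export.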

My code runs correctly until it reaches MPI_Finalize() and then it hangs.

This is almost always caused by send or recv requests that are either unmatched or not completed. An unmatched request is any blocking send for which a corresponding recv is never posted. An incomplete request is any nonblocking send or recv request that was never freed by a call to MPI_Test(), MPI_Wait(), or MPI_Request_free().

Common examples are applications that call MPI_Isend() and then use internal means to determine when it is safe to reuse the send buffer. These applications never call MPI_Wait(). You can fix such codes easily by inserting a call to MPI_Request_free() immediately after each such MPI_Isend() call, or by adding a call to MPI_Wait() at a later place in the code, prior to the point at which the send buffer must be reused.

I keep getting error messages about MPI_REQUEST_MAX being too small, no matter how large I set it.

There are two cases in which the MPI library reports an error concerning MPI_REQUEST_MAX. The text of the error message distinguishes them.

MPI has run out of unexpected request entries;
the current allocation level is: XXXXXX

The program is sending so many unexpected large messages (greater than 64 bytes) to a process that internal limits in the MPI library have been exceeded. The options here are to increase the number of allowable requests via the MPI_REQUEST_MAX shell variable, or to modify the application.

MPI has run out of request entries; 
the current allocation level is: MPI_REQUEST_MAX = XXXXX

You might have an application problem. You almost certainly are calling MPI_Isend() or MPI_Irecv() and not completing or freeing your request objects. You need to use MPI_Request_free(), as described in the previous section.

I am not seeing stdout and/or stderr output from my MPI application.

Beginning with the MPT 1.2/MPI 3.1 release, all stdout and stderr output is line-buffered, which means that mpirun does not print any partial lines of output. This sometimes causes problems for codes that prompt the user for input parameters but do not end their prompts with a newline character. One solution is to append a newline character to each prompt.

Beginning with MPT 1.5.2, you can set the MPI_UNBUFFERED_STDIO environment variable to disable line-buffering. For more information, see the MPI(1) and mpirun(1) man pages.
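As a rough sketch of this second approach, a job can be launched with MPI_UNBUFFERED_STDIO set for that one command only. The helper name run_unbuffered and the value 1 are assumptions made for illustration; the text above says only that the variable must be set, so check the man pages for any constraints on its value.

```shell
# Hypothetical helper: run one command with MPI_UNBUFFERED_STDIO set in its
# environment, leaving the calling shell's own environment unchanged.
run_unbuffered() {
    # env(1) applies the assignment to the launched command only.
    env MPI_UNBUFFERED_STDIO=1 "$@"
}

# Intended use (requires an MPI installation; a.out is a placeholder):
#   run_unbuffered mpirun -np 4 a.out
```

Scoping the variable to a single command keeps it out of startup files, which avoids the override problem described earlier for variables set in .cshrc or .login.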

How can I get the MPT software to install on my machine?

Message-Passing Toolkit software releases can be obtained at the SGI Software Download page at 

Where can I find more information about SHMEM?

See the intro_shmem(3) man page.

The ps(1) command says my memory use (SIZE) is higher than expected.

At MPI job start-up, when running on IRIX hosts, MPI calls SHMEM to cross-map all user static memory on all MPI processes to provide optimization opportunities. The result is large virtual memory usage. The ps(1) command's SIZE statistic is telling you the amount of virtual address space being used, not the amount of memory being consumed. Even if all of the pages that you could reference were faulted in, most of the virtual address regions point to multiply-mapped (shared) data regions, and even in that case, actual per-process memory usage would be far lower than that indicated by SIZE.

What does “MPI: could not run executable” mean?

This message means that something happened while mpirun was trying to launch your application, which caused it to fail before all of the MPI processes were able to handshake with it.

With Array Services 3.2 or later and MPT 1.3 or later, many of the scenarios that used to generate this error message now produce more descriptive messages.

Prior to Array Services 3.2, no diagnostic information was directly available. This was due to the highly decoupled interface between mpirun and arrayd.

mpirun directs arrayd to launch a master process on each host and listens on a socket for those masters to connect back to it. Since the masters are children of arrayd, arrayd traps SIGCHLD and passes that signal back to mpirun whenever one of the masters terminates. If mpirun receives a signal before it has established connections with every host in the job, it knows that something has gone wrong.

How do I combine MPI with (insert favorite tool here)?

In general, the rule to follow is to run mpirun on your tool and then the tool on your application. Do not try to run the tool on mpirun. Also, because of the way that mpirun sets up stdio, seeing the output from your tool might require a bit of effort. The ideal case is when the tool directly supports an option to redirect its output to a file; this is the recommended way to mix tools with mpirun. Of course, not all tools (for example, dplace) support such an option. However, it is usually possible to make it work by wrapping a shell script around the tool and having the script do the redirection, as in the following example:

     > cat myscript
     setenv MPI_DSM_OFF
     dplace -verbose a.out 2> outfile
     > mpirun -np 4 myscript
     hello world from process 0
     hello world from process 1
     hello world from process 2
     hello world from process 3
     > cat outfile
     there are now 1 threads
     Setting up policies and initial thread.
     Migration is off.
     Data placement policy is PlacementDefault.
     Creating data PM.
     Data pagesize is 16k.
     Setting data PM.
     Creating stack PM.
     Stack pagesize is 16k.
     Stack placement policy is PlacementDefault.
     Setting stack PM.
     there are now 2 threads
     there are now 3 threads
     there are now 4 threads
     there are now 5 threads
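
The wrapper-script approach can be generalized slightly. The following sketch is written in Bourne-shell syntax throughout, and the function name run_tool_logged is hypothetical. It runs a tool on the application and sends only the tool's stderr to a file named by the caller, so that the application's stdout still flows through mpirun.

```shell
# Hypothetical wrapper: run a tool on the application and capture the tool's
# stderr in a log file, leaving stdout untouched.
# Usage: run_tool_logged LOGFILE TOOL [ARGS...]
run_tool_logged() {
    log_file=$1
    shift
    # Only stderr is redirected; the exit status is that of the tool.
    "$@" 2> "$log_file"
}

# Example call, mirroring the session above (dplace and a.out as in the text):
#   run_tool_logged outfile dplace -verbose a.out
```

In a wrapper script launched by mpirun, the last line of the script would be the call itself. Passing a name such as outfile.$$ (the shell's process ID) is one possible way to keep a separate log for each copy of the script.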

Must I use MPIO_Wait() and MPIO_Test()?

Beginning with MPT 1.8, MPT has unified the I/O requests generated from nonblocking I/O routines (such as MPI_File_iwrite()) and MPI requests from nonblocking message-passing routines (for example, MPI_Isend()). Formerly, these were different types of request objects and needed to be kept separate (one was called MPIO_Request and the other, MPI_Request). Under MPT 1.8 and later, however, this distinction is no longer necessary. You can freely mix request objects returned from I/O and MPI routines in calls to MPI_Wait(), MPI_Test(), and their variants.

Must I modify my code to replace calls to MPIO_Wait() with MPI_Wait() and recompile?

No. If you have an application that you compiled prior to MPT 1.8, you can continue to execute that application under MPT 1.8 and beyond without recompiling. Internally, MPT uses the unified requests and, for example, translates calls to MPIO_Wait() into calls to MPI_Wait().

Why do I see “stack traceback” information when my MPI job aborts?

This is a new feature beginning with MPT 1.8. For more information, see the descriptions of the MPI_COREDUMP and MPI_COREDUMP_DEBUGGER environment variables in the MPI(1) man page.