This chapter describes the Message Passing Toolkit (MPT) implementation of the Parallel Virtual Machine (PVM) software. The following concepts are discussed:
Multiple computer systems as a virtual machine
Applications and environments
PVM program development
Data types
Environment variables
PVM is a software system that enables a collection of heterogeneous computer systems to be used as a coherent and flexible concurrent computation resource. The individual systems can be shared-memory or local-memory multiprocessors, vector supercomputers, specialized graphics engines, or scalar workstations interconnected by a variety of networks. From the user's point of view, the combination of these different systems can be treated as a single virtual machine when using PVM. The term host refers to one of the member computer systems.
PVM support software executes on each system in a user-configurable pool and presents a unified, general, and powerful computational environment for concurrent applications. User programs, written in the C or Fortran programming languages, gain access to PVM through library routines for functions such as the following:
Process or task initiation
Message transmission and reception
Synchronization through the use of barriers or rendezvous
Optionally, users can control the execution location of specific application components; the PVM system transparently handles message routing, data conversion for incompatible architectures, and other tasks that are necessary for operation in a heterogeneous, networked environment.
PVM is ideally suited for concurrent applications composed of many interrelated subalgorithms, although performance is good even for traditional parallel applications. PVM is particularly effective for heterogeneous applications that exploit specific strengths of individual systems on a network. As a loosely coupled, concurrent supercomputing environment, PVM is a viable scientific computing platform.
PVM has been used for molecular dynamics simulations, superconductivity studies, distributed fractal computations, and matrix algorithms, and as a basis for teaching concurrent programming.
To develop a program that uses PVM, you must perform the following steps:
Add PVM function calls to your application for process initiation, communications, and synchronization. For syntax descriptions of these functions, see Chapter 3, “Functions and Subroutines”.
Build executable files for the systems that you will use, as described in “Building PVM Executable Files”.
Create a host file to define the virtual machine, as described in “Creating Host Files”.
If your program is in distributed mode, execute the PVM daemon and your application in one of the following ways:
As described in “Starting and Stopping the PVM Daemon”, for the PVM daemon, and as described in “Running PVM Applications”, for your application
As an NQS job, as described in “Using NQS to Run PVM Applications”
Through the PVM console by using the console spawn command, as described in Table 2-2
Troubleshoot the application, if necessary. For information on PVM troubleshooting, see “Troubleshooting PVM”.
After you have added PVM function calls, you can build the executable file starting from either a source file or an object file.
If you begin with the source file, you must specify the -I (include) option and the Application Binary Interface (ABI) of the application development library (N32 or 64 ABIs), as follows:
cc -I /usr/array/PVM/include -64 -o compute compute.c -lpvm3
If you begin with an object file, the code can be linked as follows:
cc -64 -o compute compute.o -lpvm3
If you have the optional IRIX mpt module loaded, use the following command:
cc -64 -o compute compute.c -lpvm3
After the code is linked, you can install the executable files on the SGI systems you will be using. If you specified the ep option in the host file for a system, install the file in the specified directory. Otherwise, install it in the following directory:
$HOME/pvm3/bin/$PVM_ARCH
Each system in the PVM virtual machine must have a separate entry in the host file. Lines that begin with a hash symbol (#), possibly preceded by white space, are ignored.
If you do not want PVM to start a host immediately, but you might start it later by using the pvm_addhosts(3) function or the PVM console add command, you do not need to include the host in the host file. However, if you need to set any of the options described in Table 2-1, you should include the specified system in the host file, preceded by the ampersand (&) character.
This command starts the PVM daemon in the background and tells it that automatic host file selection should be used. Hosts can be excluded based on many different resources. For more information on NQE policies, see NQE Administration. If a host file is also specified, PVM uses the options specified in the host file. A host specified in the host file will be included in the virtual machine only if that host is available, as determined by the NQE policy.
Example 2-1 shows a host file that contains only the names of systems, which is the minimum information necessary in a host file.
You should verify that no system is listed more than once, and that the system on which the master pvmd3(1) daemon will run (the master host) is included in the host file (see “Starting and Stopping the PVM Daemon”, for information on starting the pvmd3 daemon). Automatic host file selection always includes the host running the master pvmd3(1) daemon.
The $PVM_ROOT and $PVM_ARCH environment variables are set for you automatically when you load the mpt module to access the Message Passing Toolkit software. To customize your environment, you can specify the options listed in Table 2-1, after any system name in the host file.
A dollar sign ($) in an option introduces an environment variable name, for example, $PVM_ARCH. Each PVM daemon expands names from environment variables.
The simple host file in Example 2-1 works well if both of the following conditions are met:
You have a login with the same name on all of the systems in your host file.
The local system is listed in the .rhosts file on each of the remote systems.
To supply an alternative login name for the thud system, add the lo option to its host file entry, as follows:
thud lo=NAME
To be queried for your password on a system named cyclone, add so=pw to its host file entry, as follows:
cyclone so=pw
To specify the path of the daemon executable file for a system named sun114, add the dx option, as follows:
sun114 dx=/usr/fred/pvm3/lib/Sun/pvmd3
Note: By default, the MPT version of pvmd3 is installed in $PVM_ROOT/lib/$PVM_ARCH/pvmd3, where $PVM_ROOT and $PVM_ARCH are set for you automatically when you load the mpt module.
The string specified in the previous example is passed to a shell so that variable expansion works. Following is another example that uses variable expansion:
sun114 dx=bin/$MYBIN/pvmd3
You can change the default value of any option for all hosts in a host file by specifying it on a line with an asterisk (*) in the host field, as in the following example:
thud.cs.utk.edu
gust.sgi.com
sun114 dx=/tmp/pvmd3
* lo=afriend so=pw
The preceding example sets the default login name (on remote systems) to afriend and queries for a password on each system. Defaults set in this way are effective forward from the location at which they occur in the host file. They can be changed with another * line.
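The forward-acting behavior of * lines can be illustrated with a small Bourne-shell sketch. The helper name effective_options is hypothetical and is not part of PVM; it reads a host file on standard input and prints each host with the defaults in effect at that point. It assumes that blank lines can be skipped and that an option on the host's own line takes precedence over a default with the same name.

```shell
# Sketch only: effective_options is a hypothetical helper, not a PVM command.
# Reads a host file on stdin and prints each host followed by the options
# in effect for it, applying "*" defaults forward from where they appear.
# Assumptions: blank lines are skipped; an option on the host line wins over
# a default with the same key.
effective_options() {
    awk '
        NF == 0 || substr($1, 1, 1) == "#" { next }    # skip blank lines and comments
        $1 == "*" {                                     # remember new defaults
            for (i = 2; i <= NF; i++) {
                key = $i; sub(/=.*/, "", key)
                if (!(key in dflt)) order[++n] = key
                dflt[key] = $i
            }
            next
        }
        {
            name = $1; first = 2
            if ($1 == "&") { name = "&" $2; first = 3 } # "& host" form
            split("", own)                              # clear per-host keys
            for (i = first; i <= NF; i++) {
                key = $i; sub(/=.*/, "", key)
                own[key] = 1
            }
            line = name
            for (j = 1; j <= n; j++)
                if (!(order[j] in own)) line = line " " dflt[order[j]]
            for (i = first; i <= NF; i++) line = line " " $i
            print line
        }
    '
}

# Example use:
#   effective_options < hostfile
```

A host listed before the first * line is printed without any defaults, which is a quick way to check whether a * line is placed early enough in the file.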
You can override the location of executable files by adding the ep option to your host file entries, as in the following example:
ep=$HOME/pvm3/bin
Unlike the dx option, which names the daemon file, the ep option names a directory.
Example 2-2 shows a more complex host file in which host names are followed by options.
Example 2-2. Sample Host File with Host Name Options
# host file for testing on various platforms
# default to my executable
* dx=pvm/SUN4/pvmd3
fonebone
refuge
sigi.cs dx=pvm/PMAX/pvmd3
# reset default for other systems
* dx=$PVM_ROOT/lib/$PVM_ARCH/pvmd3
# do not start this system, but define ep in case we add it later
& rain.sgi.com ep=$(HOME)/bin ip=rain-hippi
# borrowed accts, "guest", don't trust fonebone
* lo=guest so=pw
sn666.jrandom.com ep=$(HOME)/bin
cubie.misc.edu ep=pvm/IPSC/pvmd3
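To check which entries of a host file such as Example 2-2 PVM starts immediately and which are only defined for later use, a short Bourne-shell sketch can classify the lines. The helper name list_hosts is hypothetical. It assumes that blank lines are ignored and accepts the ampersand either separated from the host name (as in Example 2-2) or attached to it.

```shell
# Sketch only: list_hosts is a hypothetical helper, not a PVM command.
# Reads a host file on stdin. Prints "start NAME" for hosts that PVM starts
# immediately and "defer NAME" for hosts marked with an ampersand.
list_hosts() {
    awk '
        NF == 0 || substr($1, 1, 1) == "#" { next }    # blank line or comment
        $1 == "*" { next }                              # default-options line
        $1 == "&" { print "defer", $2; next }           # "& host" form
        substr($1, 1, 1) == "&" { print "defer", substr($1, 2); next }   # "&host" form
        { print "start", $1 }
    '
}

# Example use:
#   list_hosts < hostfile
```

Running the function on your own host file before starting the daemon is also a simple way to spot a system that is listed more than once.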
Before you run a PVM executable file on an IRIX system, you must specify the architecture type by setting the PVM_ARCH environment variable. Four architecture types are supported for IRIX systems. With the software installed in the default locations, you must also set the PVM_ROOT environment variable to /usr/array/PVM and add $PVM_ROOT/lib/$PVM_ARCH to the PATH environment variable. The following C shell example sets all three variables:
setenv PVM_ARCH SGIMP64
setenv PVM_ROOT /usr/array/PVM
setenv PATH ${PATH}:${PVM_ROOT}/lib/$PVM_ARCH
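Users of Bourne-compatible shells (sh, ksh, bash) need export instead of setenv. The following is a minimal sketch of the same settings; the function name pvm_env is hypothetical, and the values are the ones given in the text. The case statement is an added convenience that avoids appending the same directory to PATH twice.

```shell
# Sketch only: Bourne-shell counterpart of the C shell settings.
# pvm_env is a hypothetical helper; $1 is the architecture type
# (defaults to SGIMP64, the value used in the C shell example).
pvm_env() {
    PVM_ARCH=${1:-SGIMP64}
    PVM_ROOT=/usr/array/PVM
    case ":$PATH:" in
        *":$PVM_ROOT/lib/$PVM_ARCH:"*) ;;             # already on PATH
        *) PATH=$PATH:$PVM_ROOT/lib/$PVM_ARCH ;;      # append once
    esac
    export PVM_ARCH PVM_ROOT PATH
}

# Example use, for instance from a login script:
#   pvm_env SGIMP64
```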
The architecture types shown in the following list are arranged in an approximate order of lowest to highest performance types:
After you have written a host file, you can start up the master pvmd3(1) daemon by passing it the host file as an argument. You must specify the appropriate path for pvmd3(1). For example, you can enter one of the following:
pvmd3 hostfile &
or
pvm [hostfile]
If you do not specify a host file when starting the PVM console, the PVM daemon found in the default location will be started on the local machine.
The ampersand (&) in the first line tells the operating system to run pvmd3(1) in the background, which is what you will normally want to do.
You should not run pvmd3(1) in the background if you have to enter passwords for any of the slave systems (that is, if you included the so=pw option for one or more systems). In this case, run pvmd3(1) in the foreground and then stop it (by pressing CONTROL-Z) and put it in the background (by entering bg at the prompt) after all systems have started up.
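Whether the daemon can go straight into the background therefore depends on whether any host file entry requests a password prompt. The following Bourne-shell sketch makes that check; the helper name needs_password is hypothetical. The check is deliberately conservative: it also reports so=pw on * lines and on entries marked with an ampersand, even though those may not cause a prompt at startup.

```shell
# Sketch only: needs_password is a hypothetical helper, not a PVM command.
# Reads a host file on stdin and succeeds (exit status 0) if any
# non-comment line carries the so=pw option.
needs_password() {
    awk '
        NF == 0 || substr($1, 1, 1) == "#" { next }    # skip blank lines and comments
        { for (i = 2; i <= NF; i++) if ($i == "so=pw") found = 1 }
        END { exit !found }
    '
}

# Example use:
#   if needs_password < hostfile; then
#       pvmd3 hostfile        # foreground: answer the prompts, then CONTROL-Z and bg
#   else
#       pvmd3 hostfile &      # no prompts expected, start in the background
#   fi
```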
To shut off PVM, enter halt at a PVM console prompt. For detailed information on using console prompts, see “Using Console Commands”.
If the master pvmd3(1) daemon has trouble starting a slave pvmd3(1) daemon on a system, the error message written to the PVM log file from the master pvmd3(1) may indicate the problem.
When the pvmd3(1) daemon is running successfully, you can start your application. PVM provides the following methods of starting applications:
Start the application from the shell command line.
With this method, you start the application as any command or application would be started. For example, if the application is named a.out, enter the following command at the shell command line prompt:
./a.out
Start the application from the PVM console by using the spawn command.
With this method, you first start the console. After the pvm> prompt has appeared, enter the spawn command followed by the application name or path, as needed. For example, to run an application named cannon, enter the following command at the console command line prompt:
spawn cannon
You can obtain help for the spawn command by typing help spawn at the console command line prompt.
Once the application has started, it displays standard output and standard error information for the initial task, but not for the other tasks in the application. PVM captures the output of those other tasks and sends it to the master daemon. The daemon, in turn, prefixes each line with a PVM task identifier that identifies its source and writes the line to the PVM log file.
The log file can contain very useful information about the virtual machine and its tasks. By default, the log file contains output from the PVM daemon, including error messages and output from tasks. Optionally, the log file can contain debugging output from the daemon.
When PVM is run without NQS, the log file is located in /tmp. The IRIX implementation allows overlapping PVM virtual machines. Therefore, more than one PVM daemon started by the same user can run on the same host. The log file is located in /tmp/pvml.uid.vmid, where uid is the user ID and vmid is the virtual machine ID. By default, vmid is 0, but if the PVM_VMID (formerly PVMJID) environment variable is set, vmid will equal the numeric value of PVM_VMID.
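The naming rule for the log file can be expressed as a short Bourne-shell sketch. The helper name pvm_log_path is hypothetical; it applies only to the case described here, in which PVM is run without NQS.

```shell
# Sketch only: pvm_log_path is a hypothetical helper, not a PVM command.
# $1 = numeric user ID (defaults to the current user)
# $2 = virtual machine ID (defaults to $PVM_VMID, or 0 if that is unset)
pvm_log_path() {
    uid=${1:-$(id -u)}
    vmid=${2:-${PVM_VMID:-0}}
    printf '/tmp/pvml.%s.%s\n' "$uid" "$vmid"
}

# Example use: follow the log of the current user's virtual machine.
#   tail -f "$(pvm_log_path)"
```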
Instead of having the data written to the PVM log file, you can request that output be sent as a PVM message to another task's output device. For more information, see the PvmOutputTid and PvmOutputCode options on the pvm_setopt(3) man page.
You can also redirect output by using options on the console spawn command (see Table 2-2) or by using the pvm_catchout(3) function.
PVM applications can be run as part of an NQS job script. Each NQS job has its own PVM daemon; therefore, the PVM daemon must be started within the NQS job script. This is different from interactive use, in which one daemon is run per user per system. Any application run as part of the same NQS job script uses the same PVM daemon. Using the PVM_VMID environment variable allows more than one daemon to run per user per system. A single user running multiple NQE jobs on a single host should set the PVM_VMID environment variable for each batch job.
PVM processes spawned by the daemon inherit the limits of the NQS job. This allows a user to run multiple NQS jobs that use PVM, each with limits of the NQS job being run. Previous versions of PVM used the same daemon for multiple NQS jobs.
The following example is an NQS job script to run the application foo:
module load mpt
pvmd3 hostfile &     # Start the daemon
sleep 60             # Wait for startup
foo                  # Run application
pvm << EOF           # Start console to halt pvm
halt
EOF
Using the PVM console is an alternative to using the pvmd3(1) command to start the daemon and execute your application. The pvm(1) command starts the console, which can be started and stopped multiple times on any of the systems on which PVM is running.
Start the PVM console by using the following command line:
pvm [hostfile]
When the console is started, it checks to see if a PVM daemon is running. If so, it simply attaches itself to the daemon and can be used to monitor ongoing PVM processes as shown:
% pvm
pvmd already running
pvm>
If the daemon is not started, the pvm(1) command tries to start one, but the command must first find the daemon. (Currently, the pvm(1) command does not examine the hostfile argument, if provided, but simply passes its name to the daemon. Therefore, the pvm command cannot use information from this file.)
The logic used by the pvm command to start the daemon is as follows:
The command tries to execute $HOME/pvm3/lib/pvmd on all systems. $HOME/pvm3/lib/pvmd must be an executable file that is one of the following:
A shell script that starts up the PVM daemon, perhaps by using a host file. If you use this option, you may find it useful to have the script do other preparatory or related work.
A symbolic link to the PVM daemon. The following example shows how you can set up a link:
% mkdir ~/pvm3
% mkdir ~/pvm3/lib
% ln -s $PVM_ROOT/lib/$PVM_ARCH/pvmd3 ~/pvm3/lib/pvmd
If $HOME/pvm3/lib/pvmd is not found or cannot be executed, the pvm(1) command explicitly tries to start $PVM_ROOT/lib/$PVM_ARCH/pvmd3.
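To see in advance which daemon the console would pick, the lookup order described above can be mirrored in a Bourne-shell sketch. The helper name find_pvmd is hypothetical; it only tests for executable files and does not start anything.

```shell
# Sketch only: find_pvmd is a hypothetical helper, not a PVM command.
# Mirrors the documented lookup order: $HOME/pvm3/lib/pvmd first, then
# $PVM_ROOT/lib/$PVM_ARCH/pvmd3. Prints the first executable candidate
# and returns nonzero if neither exists.
find_pvmd() {
    for candidate in "$HOME/pvm3/lib/pvmd" "$PVM_ROOT/lib/$PVM_ARCH/pvmd3"; do
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Example use:
#   find_pvmd || echo "no PVM daemon found; check PVM_ROOT and PVM_ARCH"
```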
If a daemon is started, you see the following:
% pvm
pvm>
If a daemon is not started, you see the following:
% pvm
libpvm [pid-1]: Console: Can't start pvmd
%
When you enter the pvm(1) command, the console responds with a prompt and accepts the commands described in Table 2-2.
This section describes common problems encountered when using PVM and provides suggested solutions. There are several kinds of problems that can keep pvmd3(1) from building a virtual machine. The most common are permission problems.
If you do not specify the so=pw option for a particular system, your .rhosts file on that system must contain the name of the host from which you start the master pvmd. Otherwise, you will get a message like one of the following (although you may not get the entire message):
pvmd3@hostname: Permission denied
pvmd3@hostname: Login incorrect
To get the entire error message, enter the following command at a shell prompt:
rsh hostname daemon
In this command, daemon is the location of the PVM daemon (for example, /tmp/pvm/pvmd3 or $PVM_ROOT/lib/$PVM_ARCH/pvmd3).
Look at the output of the command and consult whichever of the following sections most closely applies.
When you start the pvmd3(1) daemon, you may receive a message that PVM is already running because a file exists in /tmp. If no pvmd3(1) is running, it is likely that the last time you used PVM you did not terminate pvmd3(1) by using the console halt command, or the previous execution of the pvmd3 daemon terminated abnormally, leaving the files in /tmp. Remove the file named in the message and start pvmd3(1) again.
If you use a shell (such as ksh) that does not automatically execute a startup script that sets $PVM_ROOT on added hosts, you can set the PVM_DPATH environment variable to the full or relative path of the pvmd startup script, or include the dx option in the host file to specify the path to the startup script. The pvmd startup script automatically sets $PVM_ROOT on the remote host.
The following command shows how to set the PVM_DPATH environment variable:
setenv PVM_DPATH $PVM_ROOT/lib/pvmd
The following command shows how to specify the pvmd startup script in the host file:
dx=/opt/ctl/mpt/mpt/pvm3/lib/pvmd
Note: The dx option in the host file overrides the PVM_DPATH environment variable, and $PVM_ROOT is not acknowledged for dx, so the dx path must be a full pathname.
If you get a message denying you permission, it probably means that your .rhosts file on the remote system does not include your local system name. Add a line like the following to your .rhosts file on the remote system:
local-host-name your-local-user-name
Sometimes a system has more than one name, and the remote system may think your local system has a name that is different from the one that you have specified. To determine the name of your local system on the remote system, execute telnet(1) or rlogin(1) to get to the remote system and enter the following UNIX command:
% who am i
Look at the last column of the output of this command, which contains the first 16 characters of what the remote system (the one to which you connected) thinks is the name of your local system (the one on which you entered telnet(1) or rlogin(1)). Make sure you put that system name (the full name, not just the first 16 characters) in your .rhosts file on the remote system. Your /etc/hosts file should contain the full name. If you do not have this file, see your system administrator for the name. Some older systems require that you spell the name exactly the same, including the case; newer systems accept the name in either uppercase or lowercase.
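Because who am i may show only the first 16 characters, it can help to search a hosts table for full names that begin with the displayed string. The following Bourne-shell sketch does this; the helper name full_host_names is hypothetical. It assumes input in the usual /etc/hosts layout (an address followed by one or more names), and that a displayed name shorter than 16 characters was not truncated and must match exactly.

```shell
# Sketch only: full_host_names is a hypothetical helper.
# $1    = host name as displayed by "who am i" (possibly cut to 16 characters)
# stdin = host table in /etc/hosts layout: address, then one or more names
# Prints every name that the displayed string could stand for.
full_host_names() {
    awk -v shown="$1" '
        NF < 2 || substr($1, 1, 1) == "#" { next }      # skip comments and bare lines
        {
            for (i = 2; i <= NF; i++) {
                if (substr($i, 1, 1) == "#") break      # trailing comment
                if (length(shown) < 16) {
                    if ($i == shown) print $i           # not truncated: exact match
                } else if (substr($i, 1, 16) == shown) {
                    print $i                            # truncated: prefix match
                }
            }
        }
    '
}

# Example use on the remote system:
#   full_host_names longhostname1234 < /etc/hosts
```

If the function prints more than one name, the displayed prefix is ambiguous and your system administrator can tell you which name applies.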
If you get a message saying your login is incorrect, there is probably no account on the remote system that has the same login name as your login name on the local system. In this case, you need to add a lo=username option to your PVM host file.
If you get a message about a version mismatch, it indicates that the versions of PVM on the two systems were built from different PVM releases. You may be building with an old library, accessing an old PVM version built from the public domain version, or having some similar problem. Ensure that the versions of PVM on the two systems are compatible.
As a general rule, releases of the public domain implementation of PVM with the same second digit in the version number (for example, 3.2.0 and 3.2.6) will interoperate. Changes that result in incompatibility are held until a major version change (for example, from version 3.2 to version 3.3). For compatibility, you might need to upgrade one of your versions of PVM.
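The rule of thumb above amounts to comparing the first two components of the version numbers. A minimal Bourne-shell sketch follows; the helper name pvm_versions_compatible is hypothetical, and the check reflects only the general rule stated for the public domain implementation, not a guarantee for any particular pair of releases.

```shell
# Sketch only: pvm_versions_compatible is a hypothetical helper.
# Succeeds (exit status 0) when two version strings agree in their first
# two components, for example 3.2.0 and 3.2.6.
pvm_versions_compatible() {
    a=$(printf '%s\n' "$1" | cut -d. -f1,2)
    b=$(printf '%s\n' "$2" | cut -d. -f1,2)
    [ "$a" = "$b" ]
}

# Example use:
#   pvm_versions_compatible 3.2.0 3.3.4 || echo "upgrade one of the PVM versions"
```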
A common application problem is the failure of a pvm_spawn() request. The PVM console command tickle 6 4 enables tracing of spawn requests. The complete executable path is printed in the PVM log file.
If you get any other messages, ensure that your .cshrc file on the remote system does not print anything when you log in and does not try to set your terminal characteristics (usually by using the stty(1) or tset(1) commands).
If you want to print from your .cshrc file when you log in, put the relevant commands in an if statement in your .cshrc file, as in the following example:
if ( { tty -s } && $?prompt ) then
    # example of printing something when you log in
    echo terminal type is $TERM
    # example of setting terminal attributes
    stty erase '^?' kill '^u' intr '^c' echo
endif
This statement ensures that printing occurs only when you log in from a terminal (and when you are not running a csh(1) command script).
This section describes how PVM data types are implemented on IRIX systems. This discussion assumes that you are familiar with the functions used to pack and unpack data; for more information, see “Data Transmittal” in Chapter 3, and “Data Receipt” in Chapter 3.
Table 2-3 and Table 2-4 present basic information about data types on IRIX systems.
Table 2-3. N32 ABI Library Data Types
Data characteristics | C functions | Fortran names |
---|---|---|
8 bits, not typed | pvm_pkbyte | BYTE1 |
16 bits, signed integer | pvm_pkshort | INTEGER2 |
32 bits, signed integer | pvm_pkint, pvm_pklong | INTEGER4 |
16 bits, unsigned integer | pvm_pkushort | Not applicable |
32 bits, unsigned integer | pvm_pkuint, pvm_pkulong | Not applicable |
32 bits, floating-point | pvm_pkfloat | REAL4 |
64 bits, floating-point | pvm_pkdouble | REAL8 |
Two 32 bits, floating-point | pvm_pkcplx | COMPLEX8 |
Two 64 bits, floating-point | pvm_pkdcplx | COMPLEX16 |
Null-terminated character string | pvm_pkstr | Not applicable |
Fortran character constant or variable | Not applicable | STRING |
Table 2-4. 64 ABI Library Data Types
Data characteristics | C functions | Fortran names |
---|---|---|
8 bits, not typed | pvm_pkbyte | BYTE1 |
16 bits, signed integer | pvm_pkshort | INTEGER2 |
32 bits, signed integer | pvm_pkint | INTEGER4 |
64 bits, signed integer | pvm_pklong | Not applicable |
16 bits, unsigned integer | pvm_pkushort | Not applicable |
32 bits, unsigned integer | pvm_pkuint | Not applicable |
64 bits, unsigned integer | pvm_pkulong | Not applicable |
32 bits, floating-point | pvm_pkfloat | REAL4 |
64 bits, floating-point | pvm_pkdouble | REAL8 |
Two 32 bits, floating-point | pvm_pkcplx | COMPLEX8 |
Two 64 bits, floating-point | pvm_pkdcplx | COMPLEX16 |
Null-terminated character string | pvm_pkstr | Not applicable |
Fortran character constant or variable | Not applicable | STRING |
To customize your PVM environment, you can use the environment variables described in Table 2-5.
Table 2-5. Environment Variables