Chapter 2. Using an Array

An Array system is an aggregation of nodes, which are IRIX servers bound together with a high-speed network and Array 3.0 software. Array users are IRIX users who enjoy the advantage of greater performance and additional services. Array users access the system with familiar commands for job control, login and password management, and remote execution.

Array 3.0 augments conventional IRIX facilities with additional services for array users and for array administrators. The extensions include support for global session management, array configuration management, batch processing, message passing, system administration, and performance visualization.

This chapter introduces the extensions for Array use, with pointers to more detailed information. (Appendix B, “Array Documentation Quick Reference,” summarizes all the pointers for quick access.) The principal topics are covered in the sections that follow.

Using an Array System

As an ordinary user of an Array system you are an IRIX (that is, UNIX) user, with the additional benefit of being able to run distributed sessions on multiple nodes of the Array. You access the Array from any of the following:

  • A workstation such as an SGI O2

  • An X-terminal

  • An ASCII terminal

In each case, you log in to one node of the Array in the way you would log in to any remote UNIX host. From a workstation or an X-terminal you can, of course, open more than one terminal window and log in to more than one node.

Finding Basic Usage Information

In order to use an Array, you need the following items of information:

  • The name of the Array.

    You use this arrayname in Array Services commands.

  • The login name and password you will use on the Array

    You use these when logging in to the Array.

  • The hostnames of the array nodes.

    Typically these names follow a simple pattern, often arrayname1, arrayname2, etc.

  • Any special resource-distribution or accounting rules that may apply to you or your group under a job scheduling system.

You can learn the hostnames of the array nodes if you know the array name, using the ainfo command:

ainfo -a arrayname machines

Logging In to an Array

Each node in an Array is a Silicon Graphics, Inc. multiprocessor system such as an Origin2000. Each node has an associated hostname and IP network address. Typically, you use an Array by logging in to one node directly, or by logging in remotely from another host (such as the Array console or a networked workstation). For example, from a workstation on the same network, this command would log you in to the node named hydra6:

rlogin hydra6

For details of the rlogin command, see the reference page rlogin(1).

The system administrators of your Array may choose to disallow direct node logins in order to schedule array resources. If your site is configured to disallow direct node logins, your administrators will be able to tell you how you are expected to submit work to the array—perhaps through remote execution software or batch queueing facilities.

Invoking a Program

Once you have access to an Array you can invoke programs of several classes:

  • Ordinary (sequential) applications

  • Parallel shared-memory applications within a node

  • Parallel message-passing applications within a node

  • Parallel message-passing applications distributed over multiple nodes (and possibly other servers on the same network running Array 3.0)

If you are allowed to do so, you can invoke programs explicitly from a logged-in shell command line; or you may use remote execution or a batch queueing system.

Programs that are X Window System clients must display to an X server, either an X-terminal or a workstation running X Windows.

Some classes of programs need additional input when they are invoked, in the form of command line options, environment variables, or support files. For example:

  • X client applications need the DISPLAY environment variable set to specify the X server (workstation or X-terminal) where their windows will display.

    The DISPLAY variable is normally set automatically when you use rlogin from an SGI workstation.

  • A multithreaded program may require environment variables to be set describing the number of threads.

    For example, C and Fortran programs that use parallel processing directives test the MP_SET_NUMTHREADS variable.

  • MPI and PVM message-passing programs may require support files to describe how many tasks to invoke on which nodes.
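For the first two cases above, the environment settings might look like the following sketch in Bourne shell syntax (the values shown and the workstation name myworkstation are illustrative only; csh users would use setenv instead):

```shell
# Illustrative settings only; adjust the names and values for your site.
DISPLAY=myworkstation:0          # X server on which windows will display
export DISPLAY
MP_SET_NUMTHREADS=4              # thread count tested by MP directives
export MP_SET_NUMTHREADS
echo "DISPLAY=$DISPLAY MP_SET_NUMTHREADS=$MP_SET_NUMTHREADS"
```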

Some information sources on program invocation are listed in Table 2-1.

Table 2-1. Information Sources: Invoking a Program


Topic                          Book, Reference Page, or URL
Remote login                   rlogin(1)
Setting environment variables  environ(5), env(1)
Starting MPI and PVM jobs      MPI and PVM User's Guide


Managing Local Processes

Each IRIX process has a process identifier (PID), a number that identifies the process within the node where it runs. A PID is local to its node, so processes in different nodes can use the same PID numbers.

Within a node, processes can be logically grouped in process groups. A process group is composed of a parent process together with all the processes that it creates. Each process group has a process group identifier (PGID). Like a PID, a PGID is defined locally to that node, and there is no guarantee of uniqueness across the Array.
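You can inspect both numbers for the current shell with ps. A minimal sketch, using POSIX-style ps options (the values printed will vary):

```shell
# Print the PID of the current shell, then ask ps for its PID and PGID.
# Both numbers are meaningful only within this node.
echo "current PID: $$"
ps -o pid= -o pgid= -p $$
```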

Monitoring Processes and System Usage

You query the status of processes using the IRIX command ps. To generate a full list of all processes on a local system, use a command such as

ps -elfj

You can monitor the activity of processes using the command top (an ASCII display in a terminal window) or gr_top (displays in a graphical window).

For a global picture of the state of one node you can use gr_osview. It displays a variety of resource use values as histograms or bar-graphs in a graphical window. The command gmemusage displays memory use by all applications in the node where you start it.

Scheduling and Killing Local Processes

You can start a process at a reduced priority, so that it interferes less with other processes, using the nice command. If you use the csh shell, specify /usr/bin/nice to avoid the built-in shell command nice. To start a whole shell at low priority, use a command like

/usr/bin/nice /bin/sh

You can schedule commands to run at specific times using the at command. You can kill or stop processes using the kill command. To destroy the process with PID 13032, use a command such as

kill -KILL 13032
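The same mechanics can be tried safely with a throwaway background process; this sketch uses the shell's $! variable to capture the PID instead of a literal number such as 13032:

```shell
# Start a disposable background process, then destroy it by PID.
sleep 60 &
pid=$!
echo "started background PID $pid"
kill -KILL "$pid"
wait "$pid" 2>/dev/null || true   # reap the killed child
echo "killed PID $pid"
```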

Summary of Process Management Commands

Table 2-2 summarizes information about local process management.

Table 2-2. Information Sources: Local Process Management


Topic                                 Book, Reference Page, or URL
Process ID and process group          intro(2) — scan to the section headed “Definitions”
Listing and monitoring processes      ps(1), top(1), and gr_top(1); gr_osview(1), gmemusage(1)
Running programs at low priority      nice(1), batch(1)
Running programs at a scheduled time  at(1)
Terminating a process                 kill(1)



Managing Batch Jobs with NQE

The Network Queueing Environment (NQE) is used to manage batch jobs. A batch job is a set of commands—a shell script. You submit batch job requests from a workstation to NQE, and NQE routes the jobs to an appropriate server. When a job completes, NQE returns the standard output and standard error files to the workstation. You can monitor the status of jobs, as well as delete or signal jobs.

NQE provides reliable file transfer with the File Transfer Agent (FTA), so that job scripts can transfer files to and from remote systems. If a file transfer fails for a transient reason, such as a network link going down, FTA automatically requeues the transfer. This is useful in job requests because a job does not abort when a file transfer fails on the first attempt. If allowed by the site, a password is not required for the file transfer; this capability of FTA is called Network Peer-to-Peer Authorization (NPPA).

Accessing the NQE Commands

NQE is usually installed in /usr/craysoft/nqe/bin on IRIX workstations. If that directory does not exist, ask your system administrator whether NQE is installed and where. If /usr/craysoft/nqe/bin does exist, add it to your PATH variable. For example:

% setenv PATH $PATH:/usr/craysoft/nqe/bin           (csh)

$ export PATH=$PATH:/usr/craysoft/nqe/bin           (sh or ksh)
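A guarded version of the same setting (sh syntax) avoids appending a nonexistent directory to PATH on systems where NQE is not installed; the variable name NQEDIR is used here only for illustration:

```shell
# Append the NQE directory to PATH only if it is actually present.
NQEDIR=/usr/craysoft/nqe/bin
if [ -d "$NQEDIR" ]; then
        PATH=$PATH:$NQEDIR
        export PATH
fi
```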

Starting NQE

The easiest way to start using NQE is through its graphical interface as implemented by the nqe command (see the nqe(1) reference page). If you run nqe on your workstation, you just start it. If you need to start nqe on an array node, with output to your workstation, you may need to set your DISPLAY variable first, as shown in the following example:

% setenv DISPLAY myworkstation:0
% nqe

Figure 2-1 shows the initial (top-level) NQE button bar window that should immediately appear.

Figure 2-1. NQE Top-level Window (Button Bar)


Checking Job Status with NQE

To see the status of jobs running under NQE, click on the Status button. Figure 2-2 shows an example of the Status window.

Figure 2-2. NQE Status Window


The example Status Window displays two jobs. Both are executing on the server homegrown and both are running (or will run) under the user account guest.

To refresh the status display, use the Refresh button in the Status window. You may also have the display refreshed periodically by setting the refresh option in the NQE Configuration Information Window, shown in Figure 2-3. Access the NQE Configuration Information Window using the Config button on the NQE button bar.

Figure 2-3. NQE Configuration Information Window


Use the slide bar titled “Status Refresh Rate” (in Figure 2-3) to set the refresh rate to a value other than 0. If the rate is set to 60, the NQE status display is refreshed every 60 seconds.

Submitting a Job with NQE

To submit a new batch job, display the Submit window (accessed using the Submit button in the NQE button bar). Figure 2-4 shows an example of the Submit window with a sample job script. To submit the job, click the Submit button.

Figure 2-4. NQE Submit Window


A few details of the example job script shown in Figure 2-4 are of interest. The #QSUB string is an NQE directive, used to embed command line options within the script. (See the cqsub(1) or qsub(1) reference page for more information on embedded options.) The line

 #QSUB -a 8:05 

indicates to NQE that the job request should not be started before 8:05 a.m. The line

#QSUB -A nqearray

indicates to NQE that the job should run using the project “nqearray”. (See the projects(5) reference page for more information on project names.)
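Putting these directives together, a complete job script of the kind shown in Figure 2-4 might look like the following sketch. The -o option and the file name myjob.output are illustrative additions, not taken from the figure; see the qsub(1) reference page for the full set of embedded options:

```shell
#!/bin/sh
#QSUB -a 8:05            # do not start before 8:05 a.m.
#QSUB -A nqearray        # run under the project "nqearray"
#QSUB -o myjob.output    # illustrative: file to receive standard output
echo "batch job starting"
date
```

Because the #QSUB lines are ordinary shell comments, the same script also runs unchanged from an interactive shell, which is convenient for testing before submission.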

About NQE Command Line Interfaces

You can also operate NQE using a command-line interface. The NQE commands are summarized in Table 2-3. For details of the command-line interface, see the NQE User's Guide.

Table 2-3. NQE Command Line Interface Summary

Command Name   Description
cevent         Posts, reads, and deletes information on job-dependency events.
cqdel          Signals or deletes a job request.
cqstatl        Displays the status of job requests.
cqsub          Submits a job script.
fta            File transfer utility, similar to FTP but with file transfer queuing and recovery (server command only).
ilb            Executes commands interactively on a host chosen by NQE.
qalter         Alters the attributes of a job request (server command only).
qchkpnt        Checkpoints a job (may only be invoked within a job script).
qdel           Signals or deletes a job request (server command only).
qlimit         Displays the job limits that apply to an NQE server (server command only).
qmsg           Writes messages to stderr, stdout, or the job log (server command only).
qping          Determines if the local NQS daemon is running (server command only).
qstat          Displays the status of job requests (server command only).
qsub           Submits a job script (server command only).
rft            File transfer command, suitable for use in job scripts (server command only).

Using Array Services Commands

When an application starts processes on more than one node, the PID and PGID are no longer adequate to manage the application. The commands of the Array Services component of Array 3.0 give you the ability to view the entire Array, and to control the processes of multinode programs.

Tip: You can use Array Services commands from any workstation connected to an Array system; you do not have to be logged in to an Array node.

This topic introduces the terms, concepts, and command options that are common to all Array Services commands. For details about any of the commands, see one of the reference pages listed in Table 2-4.

Table 2-4. Information Sources: Array Services Commands


Topic                    Book, Reference Page, or URL
Array Services overview  array_services(5)
ainfo command            ainfo(1)
array command            use: array(1); configuration: arrayd.conf(4)
arshell command          arshell(1)
aview command            aview(1)
newsess command          newsess(1)


About Array Sessions

As noted under “Distributed Management Tools”, Array Services is composed of a daemon—a background process that is started at boot time in every node—and a set of commands such as ainfo. The commands call on the daemon process in each node to get the information they need.

One concept that is basic to Array Services is the array session: all the processes of one application, wherever they may execute. Normally, your login shell, together with the programs you start from it, constitutes an array session. A batch job is an array session, and you can create a new shell with a new array session identity.

Each session is identified by an array session handle (ASH), a number that identifies any process that is part of that session. You use the ASH to query and to control all the processes of a program, even when they are running in different nodes.

About Names of Arrays and Nodes

Each node is an IRIX server, and as such has a hostname. The hostname of a node is returned by the hostname command executed in that node:

% hostname

The command is simple and documented in the hostname(1) reference page. The more complicated issues of hostname syntax, and of how hostnames are resolved to network addresses, are covered in hostname(5).

An Array system as a whole has a name too. In most installations there is only a single Array, and you never need to specify which Array you mean. However, it is possible to have multiple Arrays available on a network, and you can direct Array Services commands to a specific Array.

About Authentication Keys

It is possible for the Array administrator to establish an authentication code, which is a 64-bit number, for all or some of the nodes in an array (see “Configuring Authentication Codes”). When this is done, each use of an Array Services command must specify the appropriate authentication key, as a command option, for the nodes it uses. Your system administrator will tell you if this is necessary.

Summary of Common Command Options

The commands of Array Services—ainfo, array, arshell, aview, and newsess—have a consistent set of command options. Table 2-5 is a summary of these options. Not all options are valid with all commands; and each command has unique options besides those shown. The default values of some options are set by environment variables listed in the next topic.

Table 2-5. Array Services Command Option Summary


Option           Used In                       Meaning
-a array         ainfo, array, aview           Specify a particular Array when more than one is accessible.
-D               ainfo, array, arshell, aview  Send commands to other nodes directly, rather than through the array daemon.
-F               ainfo, array, arshell, aview  Forward commands to other nodes through the array daemon.
-Kl number       ainfo, array, aview           Authentication key (a 64-bit number) for the local node.
-Kr number       ainfo, array, aview           Authentication key (a 64-bit number) for the remote node.
-l (letter ell)  ainfo, array                  Execute in context of the destination node, not the current node.
-p port          ainfo, array, arshell, aview  Nonstandard port number of the array daemon.
-s hostname      ainfo, array, aview           Specify a destination node.

Specifying a Single Node

The -l and -s options work together. The -l (letter ell for local) option restricts the scope of a command to the node where the command is executed. By default, that is the node where the command is entered. When -l is not used, the scope of a query command is all nodes of the array. The -s (server, or node, name) option directs the command to be executed on a specified node of the array. These options work together in query commands as follows:

  • To interrogate all nodes as seen by the local node, use neither option.

  • To interrogate only the local node, use only -l.

  • To interrogate all nodes as seen by a specified node, use only -s.

  • To interrogate only a particular node, use both -s and -l.
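Using ainfo as the query command and a node named tokyo (as in the examples later in this chapter), the four combinations look like this. This is a command sketch only; the commands require Array Services and will not run outside an Array:

```
ainfo machines               # all nodes, as seen by the local node
ainfo -l machines            # the local node only
ainfo -s tokyo machines      # all nodes, as seen by node tokyo
ainfo -s tokyo -l machines   # node tokyo only
```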

Common Environment Variables

The Array Services commands depend on environment variables to define default values for the less-common command options. These variables are summarized in Table 2-6.

Table 2-6. Array Services Environment Variables

Variable Name     Meaning and Default When Undefined

ARRAYD_FORWARD    When defined with a string starting with the letter y, all commands default to forwarding through the array daemon (option -F). When undefined, commands default to direct communication (option -D).

ARRAYD_PORT       The port (socket) number monitored by the array daemon on the destination node. When undefined, the standard number, 5434, or the number given with option -p.

ARRAYD_LOCALKEY   Authentication key for the local node (option -Kl). When undefined, no authentication unless the -Kl option is used.

ARRAYD_REMOTEKEY  Authentication key for the destination node (option -Kr). When undefined, no authentication unless the -Kr option is used.

ARRAYD            The destination node, when not specified by the -s option. When undefined, the local node, or the node given with -s.

Interrogating the Array

Any user of an Array system can use Array Services commands to check the hardware components and the software workload of the Array. The commands needed are ainfo, array, and aview.

Learning Array Names

If your network includes more than one Array system, you can use ainfo arrays at one array node to list all the Array names that are configured, as in the following example.

homegrown% ainfo arrays
Arrays known to array services daemon
ARRAY DevArray
    IDENT 0x3381
ARRAY BigDevArray
    IDENT 0x7456
ARRAY test
    IDENT 0x655e

Array names are configured into the array database by the administrator. Different Arrays might know different sets of other Array names.

Learning Node Names

You can use ainfo machines to learn the names and some features of all nodes in the current Array, as in the following example.

homegrown 175% ainfo -b machines 
machine homegrown homegrown 5434 0
machine disarray disarray 5434 0
machine datarray datarray 5434 0
machine tokyo tokyo 5434 0

In this example, the -b option of ainfo is used to get a concise display.

Learning Node Features

You can use ainfo nodeinfo to request detailed information about one or all nodes in the array. To get information about the local node, use ainfo -l nodeinfo. However, to get information about only a particular other node, for example node tokyo, use -l and -s, as in the following example. (The example has been edited for brevity.)

homegrown 181% ainfo -s tokyo -l nodeinfo
Node information for server on machine "tokyo"
    VERSION  1.2
        BOARD: TYPE 15   SPEED 190
            CPU:   TYPE 9   REVISION 2.4
            FPU:   TYPE 9   REVISION 0.0
    16 IP INTERFACES  HOSTNAME tokyo   HOSTID 0xc01a5035
        DEVICE  et0    NETWORK    ADDRESS  UP
        DEVICE atm0    NETWORK    ADDRESS  UP
        DEVICE atm1    NETWORK    ADDRESS  UP
        512 MB MAIN MEMORY
        INTERLEAVE 4

If the -l option is omitted, the destination node will return information about every node that it knows.

Learning User Names and Workload

The IRIX commands who, top, and uptime are commonly used to get information about users and workload on one server. The array command offers Array-wide equivalents to these commands.

Learning User Names

To get the names of all users logged in to the whole array, use array who. To learn the names of users logged in to a particular node, for example tokyo, use -l and -s, as in the following example. (The example has been edited for brevity and security.)

homegrown 180% array -s tokyo -l who 
joecd    tokyo        frummage.eng.sgi -tcsh 
joecd    tokyo        frummage.eng.sgi -tcsh 
benf     tokyo        einstein.ued.sgi. /bin/tcsh 
yohn     tokyo vi +153 fs/procfs/prd

Learning Workload

Two variants of the array command return workload information. The array-wide equivalent of uptime is array uptime, as follows:

homegrown 181% array uptime 
   homegrown:  up 1 day,  7:40,  26 users,  load average: 7.21, 6.35, 4.72
    disarray:  up  2:53,  0 user,  load average: 0.00, 0.00, 0.00
    datarray:  up  5:34,  1 user,  load average: 0.00, 0.00, 0.00
       tokyo:  up 7 days,  9:11,  17 users,  load average: 0.15, 0.31, 0.29
homegrown 182% array -l -s tokyo uptime 
       tokyo:  up 7 days,  9:11,  17 users,  load average: 0.12, 0.30, 0.28

The command array top lists the processes that are currently using the most CPU time, with their ASH values, as in the following example.

homegrown 183% array top 
        ASH        Host           PID User       %CPU Command
0x1111ffff00000000 homegrown        5 root       1.20 vfs_sync
0x1111ffff000001e9 homegrown     1327 guest      1.19 atop
0x1111ffff000001e9 tokyo        19816 guest      0.73 atop
0x1111ffff000001e9 disarray      1106 guest      0.47 atop
0x1111ffff000001e9 datarray      1423 guest      0.42 atop
0x1111ffff00000000 homegrown       20 root       0.41 ShareII
0x1111ffff000000c0 homegrown    29683 kchang     0.37 ld
0x1111ffff0000001e homegrown     1324 root       0.17 arrayd
0x1111ffff00000000 homegrown      229 root       0.14 routed
0x1111ffff00000000 homegrown       19 root       0.09 pdflush
0x1111ffff000001e9 disarray      1105 guest      0.02 atopm

The -l and -s options can be used to select data about a single node, as usual.

Browsing With ArrayView

The ArrayView, or aview, command is a graphical window on the status of an array. You can start it with the command aview and it displays a window similar to the one shown in Figure 2-5. The top window shows one line per node. There is a window for each node, headed by the node name and its hardware configuration. Each window contains a snapshot of the busiest processes in that node.

Figure 2-5. Typical Display from ArrayView (aview) Command


Managing Distributed Processes

Using commands from the Array Services component of Array 3.0, you create and manage processes that are distributed across multiple nodes of the Array system.

About Array Session Handles (ASH)

In an Array system you can start a program whose processes are in more than one node. In order to name such collections of processes, Array 3.0 software assigns each process to an array session handle (ASH).

An ASH is a number that is unique across the entire array (unlike a PID or PGID). An ASH is the same for every process that is part of a single array session—no matter which node the process runs in. You display and use ASH values with Array Services commands. Each time you log in to an Array node, your shell is given an ASH, which is used by all the processes you start from that shell.

The command ainfo ash returns the ASH of the current process on the local node, which is simply the ASH of the ainfo command itself.

homegrown 178% ainfo ash 
Array session handle of process 10068: 0x1111ffff000002c1
homegrown 179% ainfo ash 
Array session handle of process 10069: 0x1111ffff000002c1

In the preceding example, each instance of the ainfo command was a new process: first PID 10068, then PID 10069. However, the ASH is the same in both cases. This illustrates a very important rule: every process inherits its parent's ASH. In this case, each instance of ainfo was forked by the command shell, and the ASH value shown is that of the shell, inherited by the child process.

You can create a new global ASH with the command ainfo newash, as follows:

homegrown 175% ainfo newash 
Allocating new global ASH

This feature has little use at present. There is no existing command that can change its ASH, so you cannot assign the new ASH to another command. It is possible to write a program that takes an ASH from a command-line option and uses the Array Services function setash() to change to that ASH (however such a program must be privileged). No such program is distributed with Array 3.0 (but see “Managing Array Service Handles”).

Listing Processes and ASH Values

The command array ps returns a summary of all processes running on all nodes in an array. The display shows the ASH, the node, the PID, the associated username, the accumulated CPU time, and the command string.

To list all the processes on a particular node, use the -l and -s options. To list processes associated with a particular ASH, or a particular username, pipe the returned values through grep, as in the following example. (The display has been edited to save space.)

homegrown 182% array -l -s tokyo ps | fgrep wombat 
0x261cffff0000054c        tokyo 19007   wombat    0:00 -csh
0x261cffff0000054a        tokyo 17940   wombat    0:00 csh -c (setenv...
0x261cffff0000054c        tokyo 18941   wombat    0:00 csh -c (setenv...
0x261cffff0000054a        tokyo 17957   wombat    0:44 xem -geometry 84x42
0x261cffff0000054a        tokyo 17938   wombat    0:00 rshd
0x261cffff0000054a        tokyo 18022   wombat    0:00 /bin/csh -i
0x261cffff0000054a        tokyo 17980   wombat    0:03 /usr/gnu/lib/ema...
0x261cffff0000054c        tokyo 18928   wombat    0:00 rshd

When you have Performance Co-Pilot installed (see “Performance Co-Pilot”) you have two additional commands for listing processes: ashtop displays a continuously updated list of the processes that are executing under a specified ASH (see the ashtop(1) reference page, if installed). The arraytop command produces a similar display for the entire array (see the arraytop(1) reference page, if installed). Both of these, and additional features of Performance Co-Pilot, are described in the pcp_array(5) reference page.

Controlling Processes

The arshell command lets you start an arbitrary program on a single other node. The array command gives you the ability to suspend, resume, or kill all processes associated with a specified ASH.

Using arshell

The arshell command is an Array Services extension of the familiar rsh command; it executes a single IRIX command on a specified Array node. The difference from rsh is that the remote shell executes under the same ASH as the invoking shell (this is not true of simple rsh). The following example demonstrates the difference.

homegrown 179% ainfo ash
Array session handle of process 8506: 0x1111ffff00000425
homegrown 180% rsh guest@tokyo ainfo ash
Array session handle of process 13113: 0x261cffff0000145e
homegrown 181% arshell guest@tokyo ainfo ash
Array session handle of process 13119: 0x1111ffff00000425

You can use arshell to start a collection of unrelated programs in multiple nodes under a single ASH; then you can use the commands described under “Managing Session Processes” to stop, resume, or kill them.

Both MPI and PVM use arshell to start up distributed processes.

Tip: The shell is a process under its own ASH. If you use the array command to stop or kill all processes started from a shell, you will stop or kill the shell also. In order to create a group of programs under a single ASH that can be killed safely, proceed as follows:

  1. Create a nested shell with a new ASH using newsess. Note the ASH value.

  2. Within the new shell, start one or more programs using arshell.

  3. Exit the nested shell.

Now you are back to the original shell. You know the ASH of all programs started from the nested shell. You can safely kill all jobs that have that ASH because the current shell is not affected.

About the Distributed Example

The programs launched with arshell are not coordinated (they could of course be written to communicate with each other, for example using sockets), and you must start each program individually.

The array command is designed to permit the simultaneous launch of programs on all nodes with a single command. However, array can only launch programs that have been configured into it, in the Array Services configuration file. (The creation and management of this file is discussed under “About Array Configuration”.)

In order to demonstrate process management in a simple way from the command line, the following command was inserted into the configuration file /usr/lib/array/arrayd.conf:

# Local commands
command spin                    # Do nothing on multiple machines
        invoke /usr/lib/array/spin
        user    %USER
        group   %GROUP
        options nowait

The invoked command, /usr/lib/array/spin, is a shell script that does nothing in a loop, as follows:

#!/bin/sh
# Go into a tight loop until the file /tmp/spin.stop appears
interrupted() {
        echo "spin has been interrupted - goodbye"
        exit 0
}
trap interrupted 1 2
while [ ! -f /tmp/spin.stop ]; do
        sleep 5
done
echo "spin has been stopped - goodbye"
exit 1

With this preparation, the command array spin starts a process executing that script on every processor in the array. Alternatively, array -l -s nodename spin would start a process on one specific node.

Managing Session Processes

The following command sequence creates and then kills a spin process in every node. The first step creates a new session with its own ASH. This is so that later, array kill can be used without killing the interactive shell.

homegrown 175% ainfo ash
Array session handle of process 8912: 0x1111ffff0000032d
homegrown 176% newsess
homegrown 175% ainfo ash
Array session handle of process 8941: 0x11110000308b2fa6

In the new session with ASH 0x11110000308b2fa6, the command array spin starts the /usr/lib/array/spin script on every node. In this test array, there were only two nodes on this day, homegrown and tokyo.

homegrown 176% array spin

After exiting back to the original shell, the command array ps is used to search for all processes that have the ASH 0x11110000308b2fa6.

homegrown 177% exit
homegrown 178% ainfo ash
Array session handle of process 9257: 0x1111ffff0000032d
homegrown 179% array ps | fgrep 0x11110000308b2fa6
0x11110000308b2fa6  homegrown  9033   guest   0:00 /bin/sh /usr/lib/array/spin
0x11110000308b2fa6  homegrown  9618  guest  0:00 sleep 5
0x11110000308b2fa6        tokyo 26021  guest   0:00 /bin/sh /usr/lib/array/spin
0x11110000308b2fa6      tokyo 26072  guest  0:00 sleep 5
0x1111ffff0000032d  homegrown  9642  guest  0:00 fgrep 0x11110000308b2fa6

There are two processes related to the spin script on each node. The next command kills them all.

homegrown 180% array kill 0x11110000308b2fa6
homegrown 181% array ps | fgrep 0x11110000308b2fa6
0x1111ffff0000032d  homegrown 10030  guest  0:00 fgrep 0x11110000308b2fa6

The command array suspend 0x11110000308b2fa6 would suspend the processes instead (however, it is hard to demonstrate that a sleep command has been suspended).