Chapter 1. Array System Components

An Array system is a complex system with layers of hardware and software components. This chapter orients you to these components, working from the bottom up.

Each section contains a table of information sources—online and printed books, reference pages, and WWW sites—related to the topic of that section. All such pointers are reproduced in Appendix B, “Array Documentation Quick Reference.”

Array Components

The performance and power of an Array system are the result of linking several symmetric multiprocessor (SMP) computers by a high-performance interconnect, and managing the combination with customized software and bundled application and administrative software.

Array Hardware Components

An Array comprises the following hardware:

  • From two to eight nodes, each of which is a Silicon Graphics, Inc. computer, typically a multiprocessor such as:

    • Origin2000 or Origin200

    • Challenge®, POWER Challenge, or POWER Challenge R10000

    • Onyx2, Onyx, or POWER Onyx

  • An interconnecting network: typically one, and as many as six, bidirectional HIPPI network interfaces per node, plus a HIPPI crossbar switch.

  • One IRISconsole as an administration console.

    An IRISconsole is an O2 or Indy workstation augmented with an IRISconsole serial port multiplexer.

A complete Array system is shown schematically in Figure 1-1.

Figure 1-1. Array System Schematic


Array Software Components

The Array 3.0 software binds the Array system hardware into a supercomputer that can be programmed and administered as one system. An Array system using Array 3.0 software is based on the following major components:

Array diagnostics

Diagnostics used by Silicon Graphics, Inc. system engineers to verify installation and isolate faults.

IRIX 6.2 and IRIX 6.4

Multiprocessor operating systems including NFS™ version 3 network support.

XFS filesystem

High-performance, ultra-high-capacity journaled filesystem that manages large RAID arrays and disk farms.

HIPPI software

Support for high-performance network link, including an SGI-proprietary fast path for minimum overhead on short messages.

Array Services

Integrated administration tools that permit centralized administration of all nodes in the Array.

MPI (Message-Passing Interface) 3.0 and XMPI

Distributed programming environment with support for HIPPI bypass, plus a visual monitor.

PVM (Parallel Virtual Machine) 1.2 and XPVM

Popular distributed programming environment and visual monitor.

Many optional software packages are available from Silicon Graphics, Inc. to extend Array 3.0, including:

Network Queuing Environment (NQE)

Load-balancing and scheduling facility that lets users submit, monitor, and control work across machines in a network.

Performance Co-Pilot (PCP)

Performance visualization facility.

ProDev WorkShop

Suite of graphical tools for developing parallel programs.

A variety of software packages from third parties also are available, including:


SHARE II

Resource-centric Fair Share scheduler from Softway (systems using IRIX 6.2 only).

PerfAcct

Accounting software by Instrumental, Inc.

Codine

Batch-scheduling facility by GENIAS Software.

LSF (Load Sharing Facility)

Batch scheduling facility by Platform Computing.

High Performance Fortran (HPF)

Compilers available from the Portland Group (PGI) and Applied Parallel Research (APR).

Most of these components are described at more length in following topics.

Array Architecture

An Array system is a distributed-memory multiprocessor, scalable to several hundred individual MIPS processors in as many as eight nodes, yielding a peak aggregate computing capacity of many GFLOPS. The aggregation of nodes is connected by an industry-standard, 1.0 Gbit per second HIPPI network.

This section examines the components of an Array system in detail.

Array Nodes

The basic computational building block of an Array is a Silicon Graphics, Inc. multiprocessor. Any system running IRIX 6.2 or IRIX 6.4 can participate as a node in Array 3.0, but normally a node is a multiprocessor system. Depending on the type of Array and the customer's choices, a node can be any of the systems listed in Table 1-1.

Table 1-1. Array Node System Selection


System                   Processor Complement   Graphics Options
Origin2000               2-128 R10000
Origin200                2 R10000
Onyx2                    2-16 R10000            InfiniteReality or RealityMonster
Challenge                2-32 R4400
POWER Challenge R10000   2-36 R10000            Extreme Visualization Console
POWER Challenge          2-18 R8000             Extreme Visualization Console
POWER Challenge GR       2-24 R10000            Extreme Visualization Console, InfiniteReality, or RealityEngine2
POWER Onyx               1-12 R8000             1-3 RealityEngine2
Onyx 10000               1-24 R10000            1-3 InfiniteReality

Table 1-2 lists information sources for the different types of systems.

Table 1-2. Information Sources: Array Component Systems


Topic                                Book or URL
All SGI servers                      SGI web site
Origin2000 and Origin200             SGI web site
Onyx2 and RealityMonster             SGI web site
POWER Challenge                      POWER CHALLENGE XL Rackmount Owner's Guide
POWER Onyx and Onyx                  POWER Onyx and Onyx Rackmount Owner's Guide
RealityEngine2 and InfiniteReality   POWER Onyx and Onyx Rackmount Owner's Guide
Extreme Visualization Console        POWER CHALLENGE XL Rackmount Owner's Guide

Hybrid Array

An Array that includes both Origin2000/Onyx2 systems and Challenge/Onyx systems is called a hybrid array. Previous versions of Array software supported only uniform Arrays composed of Challenge and Onyx systems. Array 3.0 software supports uniform arrays of Origin2000/Onyx2 systems, uniform arrays of Challenge/Onyx systems, and hybrid arrays.

The HIPPI Interconnect

Array nodes are normally connected by a high-performance, dual-channel HIPPI network. Each node is equipped with one or more bidirectional HIPPI interfaces. Each interface provides 100 MB per second of data bandwidth in either direction.

The HIPPI interfaces are connected via a high-performance HIPPI crossbar switch (optional in a two-node Array). The HIPPI switch is nonblocking, with sub-microsecond connection delays. The network appears to be fully connected and contention occurs only when two sources send data to the same destination at the same time.

IRIX 6.2 or IRIX 6.4 and the Array 3.0 software provide protocol layers and APIs to access the HIPPI network, including direct physical layer, HIPPI framing protocol, and TCP/IP. The HIPPI support includes special bypass capabilities to expedite transmission of short messages. The bypass capability is transparent to the applications using it.

Table 1-3 lists information sources on HIPPI and the HIPPI crossbar (which is produced by a third party, Essential Communications, Inc.).

Table 1-3. Information Sources: HIPPI Interconnect


Topic                   Book or URL
HIPPI interface         IRIS HIPPI Administrator's Guide
                        IRIS HIPPI API Programmer's Guide
HIPPI crossbar switch   EPS-1 User's Guide


Visualization and Interactive Supercomputing

Array nodes can be configured with hardware graphics support, to provide two- and three-dimensional visualization performance commensurate with the available compute power. The available graphics options are listed in Table 1-1. Complex supercomputing visualization architectures can be built by aggregating compute and graphics nodes, as illustrated in Figure 1-2.

Figure 1-2. Advanced Visualization With Arrays


Centralized Console Management

An IRISconsole serves as a single, centralized administrative console for Array administration and maintenance. The IRISconsole consists of an O2 or Indy workstation, an IRISconsole multiplexer box, and the IRISconsole graphical cluster management software. From the IRISconsole, administrators can control, configure, monitor, and maintain the individual Array nodes.

The O2 or Indy workstation serves as a virtual console for each node. The workstation is connected to the multiplexer via a SCSI interface. The multiplexer in turn connects to the Remote System Control port of each node. Commands from the console workstation are routed to the appropriate node, and results from the nodes are routed back.

The IRISconsole graphical user interface provides a convenient graphic representation of the array. Sets of nodes can be selected and operated upon. You can open a command window directly to any node. You can use the IRISconsole graphical interface to

  • Dynamically add and remove nodes in the array

  • Display console messages or enter console commands to any node

  • Interrupt, reset, or power-cycle any node

  • Display and record real-time graphs of hardware operating statistics, including voltage, temperature, and cooling status

  • Enable monitors and alarms for conditions such as excessive temperature

  • View activity logs and other system reports

For more about the features of IRISconsole and illustrations of its use, see “Using the IRISconsole Workstation”. Table 1-4 lists other information sources for IRISconsole and its hardware.

Table 1-4. Information Sources: IRISconsole


Book or URL
IRISconsole Administrator's Guide
IRISconsole Installation Guide
Indy Workstation Owner's Guide

Distributed Management Tools

Array 3.0 makes an Array manageable by providing support for process execution, program development, performance instrumentation, and system administration.

This section introduces many of the bundled and third-party tools.

Array Services

Array Services includes administrator commands, libraries, daemons and kernel extensions that support the execution of programs across an Array.

A central concept in Array Services is the array session handle (ASH), a number that is used to logically group related processes that may be distributed across multiple systems. The ASH creates a global process namespace across the Array, facilitating accounting and administration.

Array Services also provides an array configuration database, listing the nodes comprising an array. Array inventory inquiry functions provide a centralized, canonical view of the configuration of each node. Other array utilities let the administrator query and manipulate distributed array applications.

The Array Services package comprises the following primary components:

array daemon

These daemon processes, one in each node, cooperate to allocate ASH values and maintain information about node configuration and the relation of process IDs to ASHes.

array configuration database

One copy at each node, this file describes the Array configuration for use by array daemons and user programs.

ainfo command

Lets the user or administrator query the Array configuration database and information about ASH values and processes.

array command

Executes a specified command on one or more nodes. Commands are predefined by the administrator in the configuration database.

arshell command

Starts an IRIX command remotely on a different node using the current ASH value.

aview command

Displays a multiwindow, graphical display of each node's status.

libarray library

Library of functions that allow user programs to call on the services of array daemons and the array configuration database.

The use of the ainfo, array, arshell, and aview commands is covered in Chapter 2, “Using an Array.” The use of the libarray library is covered in Chapter 4, “Performance-Driven Programming in Array 3.0.”

Performance Co-Pilot

Performance Co-Pilot (PCP) is a Silicon Graphics product for monitoring, visualizing, and managing systems performance.

PCP has a distributed client-server architecture, with performance data collected from a set of servers and displayed on visualization clients. Performance data can be obtained from multiple sources, including the IRIX kernel and user applications. With support for low-intrusion performance data collection, reduction, and analysis, PCP permits a variety of metrics to be captured, correlated, reduced, recorded, and rendered.

PCP has been customized for Array systems to provide visualization of system-level and job-level statistics across the array. An array user can view a variety of relevant performance metrics on the array via the following utilities:


  • Visualize CPU utilization of any node.

  • Visualize disk I/O rates on any node.

  • Visualize NFS statistics on any node.

  • Plot performance metrics versus time for any node.

  • Visualize CPU utilization across an array for tasks belonging to a particular array session handle.

  • Visualize aggregate Array performance.

  • List the top CPU-using processes under a given ASH.

  • List the top CPU-using processes in the array.

For more information about Performance Co-Pilot, see The Performance Co-Pilot User's and Administrator's Guide (007-2614-xxx).

SHARE II (Fair Share) Scheduling

SHARE II, a “Fair Share” scheduler, allows an organization to create its own resource allocation policy based on its assessment of how resource usage should be fairly distributed to individuals or arbitrarily grouped users. SHARE II is available only for Arrays that use IRIX 6.2; it is not available for IRIX 6.4.

With SHARE II, users are grouped into a system-wide resource allocation and charging hierarchy. The hierarchy can represent projects, divisions, or arbitrary sets of users. Within this hierarchy, resource usage policy can be varied or delegated at any level according to organizational priorities.

Users can be limited in consumption of renewable resources (such as printer pages) and fixed resources (such as instantaneous memory use). Other limits are imposed during periods of scarcity (for example, CPU run time during periods of contention). Thus, SHARE II provides a fair share of the system resources during high-load periods without overcommitment, wasteful static reservations, or expensive administrator intervention.

Accounting With PerfAcct

PerfAcct, a third-party software product, gathers system accounting data from all nodes in a central location, where it is summarized and used to generate usage reports or billing. PerfAcct exploits IRIX extended-session accounting data to provide true job accounting. Job and project accounting permits usage tracking and billing by external or internal contracts, departments, tasks, and projects.

PerfAcct features low-overhead data collection on the nodes being monitored. To minimize system load on the monitored systems, archiving and summarization can be put on a remote low-cost workstation. PerfAcct also includes aggregate accounting statistics, as well as graphical user interface tools for measuring dynamic system load.

Supporting Documentation

Table 1-5 lists information sources for the management tool products.

Table 1-5. Information Sources: Management Tools


Topic                  Book, Cross-Reference, or URL
Array Services         Chapter 2, "Using an Array"
Performance Co-Pilot   Performance Co-Pilot data sheet (SGI web site)
                       The Performance Co-Pilot User's and Administrator's Guide
                       Performance Co-Pilot for Informix-7 User's Guide
SHARE II               Share II for IRIX Administrator's Guide


Job Execution Facilities

An Array system can be used as an interactive system for real-time experimentation, as a coupled multiprocessor for grand-challenge class applications, and as a throughput compute engine for high-efficiency batch execution. This section introduces the job scheduling features.

Interactive Processing

Users can log in to a node to execute jobs interactively using normal IRIX job-control facilities. Interactive jobs can be command-line based, or can be X Windows applications that execute on the node but display on the user's workstation.

Jobs started interactively can be sequential programs, or multi-threaded programs executing within a node, or distributed-memory parallel applications executing across several nodes. Distributed programs using MPI or PVM can be started and monitored using the graphical monitors XMPI and XPVM; these display job status graphically on the user's workstation screen. Table 1-6 lists information sources on interactive processing.

Table 1-6. Information Sources: Interactive Processing


Topic                  Book, Cross-Reference, or URL
Logging in to a node   Chapter 2, "Using an Array"
MPI and PVM            MPI and PVM User's Guide

Batch Processing

Batch processing allows off-line job scheduling. Batch processing is appropriate for production environments, high job-load environments, and situations where program results are not required immediately.

When an Array system is used for batch scheduling, users submit jobs to batch queues, which contain ordered sets of waiting jobs. When sufficient compute resources become available, and subject to tunable scheduling constraints, jobs are extracted from the batch queues and scheduled on the nodes. Job results and termination status are recorded in files or are electronically mailed to the user. See Figure 1-3.

Figure 1-3. Batch Processing on an Array System


Several popular batch facilities are compatible with Array 3.0, including the Network Queuing Environment (NQE) from Silicon Graphics, Inc.; the Codine Job-Management System from Genias Software, Inc.; and Load Sharing Facility (LSF) from Platform Computing, Inc.

NQE consists of the following components that provide a seamless environment for users of the Array:

  • The NQE graphical interface allows users to submit batch requests to a central database, and to monitor and control each request.

  • The Network Load Balancer (NLB) routes jobs to available nodes according to their current workload.

  • The NQE scheduler determines when and on which node each request is to run.

  • The File Transfer Agent (FTA) provides synchronous and asynchronous transfer of files, including automatic retry when a network link fails.

The IRIX Checkpoint and Restart (CPR) facility allows you to save the status of long-running jobs and restart them easily.

Table 1-7 lists information sources on these products.

Table 1-7. Information Sources: Batch Scheduling Products


Topic                               Book or URL                                   Book Number
IRIX Checkpoint and Restart (CPR)   IRIX Checkpoint and Restart Operation Guide
Network Queuing Environment (NQE)   NQE technical papers and illustrated
                                    overview (SGI web site)
                                    NQE User's Guide                              SG-2148 3.2
                                    NQE Administrator's Guide                     SG-2150 3.2
Load Sharing Facility (LSF)

Compilation, Development, and Execution Facilities

Array 3.0 is complemented by development tools from Silicon Graphics, Inc. and other companies to simplify creation of parallel applications using both shared-memory and distributed-memory models. This section summarizes these tools. Additional discussion of software development appears in Chapter 4, “Performance-Driven Programming in Array 3.0.”

Optimizing and Parallelizing Compilers

The MIPSpro compilers are the third-generation family of optimizing and parallelizing compilers from Silicon Graphics, offering comprehensive support for parallel application development.

Exploiting aggressive dependency analysis, the compilers perform automatic program restructuring, software pipelining, and parallelization. The compilers also provide a comprehensive set of comment directives that enable users to assist the compiler in the parallelization process.

Silicon Graphics, Inc. offers MIPSpro compilers for Fortran 77, Fortran 90, and C, as well as compilers for Ada 95, C++, assembly language, and Pascal. For detailed information about each compiler, see the sources listed in Table 1-8.

Table 1-8. Information Sources: Compilers from SGI


Topic                                        Book or URL
MIPSpro compiler features and use            MIPS Compiling and Performance Tuning Guide
C language                                   C Language Reference Manual
MIPSpro Fortran 77                           MIPSpro Fortran 77 Programmer's Guide
                                             MIPSpro Fortran 77 Language Reference Manual
MIPSpro Fortran 90                           MIPSpro Fortran 90 Programmer's Guide
Automatic parallelization of C and Fortran   MIPSpro Power Fortran 77 Programmer's Guide
                                             MIPSpro Power Fortran 90 Programmer's Guide
                                             IRIS Power C User's Guide
C++ language                                 C++ Programmer's Guide
Assembly language                            MIPSpro Assembly Language Programmer's Guide
Ada 95 (GNU Ada Translator, GNAT)            GNAT User's Guide
Pascal                                       Pascal Programming Guide


High Performance Fortran

High Performance Fortran (HPF) is an extended version of Fortran 90 that is emerging as a standard for programming of shared- and distributed-memory systems in the data-parallel style. HPF incorporates a data-mapping model and associated directives that allow a programmer to specify how data is logically distributed in an application. An HPF compiler interprets these directives to generate code that minimizes interprocessor communication in distributed systems and maximizes data reuse in all types of systems.

HPF compilers are available for Array systems from the Portland Group, Inc. and Applied Parallel Research, Inc. Table 1-9 lists information sources for these products.

Table 1-9. Information Sources: High Performance Fortran


Topic                               Book or URL
High Performance Fortran textbook   The High Performance Fortran Handbook, Koelbel, Loveman,
                                    Schreiber, Steele Jr., and Zosel; MIT Press, 1994
                                    (ISBN 0-262-61094-9)
High Performance Fortran Forum
Portland Group, Inc.
Applied Parallel Research

Numerical Libraries

The compilers are complemented by CHALLENGEcomplib, a comprehensive, optimized collection of scientific and math subroutine libraries popular in scientific computing. The library consists of two subcomponents: SGIMATH and SLATEC.

SGIMATH is hand-tuned, optimized, and parallelized, providing high-performance, portable implementations of the following popular numerical facilities:

  • Basic Linear Algebra Subprograms (BLAS), levels 1, 2, and 3

  • 1D, 2D, and 3D Fast Fourier Transforms (FFT)

  • Convolution and correlation routines

  • SCIPORT (portable version of SCILIB)

  • SOLVERS: PCG sparse solvers, direct sparse solvers, symmetric iterative solvers, and solvers for special linear systems

A source for a more detailed overview of CHALLENGEcomplib is listed in Table 1-10. Most of the functions within the library are documented in reference pages that install with the product.

Table 1-10. Information Sources: CHALLENGEcomplib


Topic                       Book or URL
CHALLENGEcomplib overview   SGI web site


IRIX 6.2 and 6.4

The primary process control services of the Array are provided by the IRIX operating system, a symmetric multiprocessing operating system based on UNIX SVR4 with compatibility for BSD.

IRIX version 6.2 is required for Array 3.0 on Challenge/Onyx systems, and IRIX 6.4 on Origin systems. These versions provide fast, flexible support for shared-memory interprocess communication, high-performance I/O, and performance-centric scheduling. Within a node, related processes are gang-scheduled to prevent one process from wasting time by spinning on locks held by blocked peers. Process placement decisions incorporate cache-affinity heuristics, which minimize multiprogramming-induced cache thrashing by tending to keep a given process on the same processor.

Real-time processing can be supported with the REACT facilities, including nondegrading priorities, deadline scheduling, and reliably bounded kernel latencies. Hooks are also provided for the optional SHARE II Fair Share scheduler and for checkpoint-restart facilities.

IRIX supports a variety of system functions to allow shared memory interprocess communication (IPC) between processes within one node. SVR4-compatible library functions for semaphores, message queues, and shared memory are supported. High-performance IRIX-unique facilities for shared memory, semaphores, and mutex locks are included. POSIX-compatible library functions for semaphores, message queues, and shared memory are integrated into IRIX 6.4 (available as a patch set for IRIX 6.2).

Overview sources on IRIX and on the REACT real-time programming extensions are listed in Table 1-11.

Table 1-11. Information Sources: IRIX and REACT


Topic                                 Book or URL
IRIX 6.2 Data Sheet                   SGI web site
IRIX 6.2 Specifications               SGI web site
REACT/pro and real-time programming
IRIX IPC facilities                   Topics In IRIX Programming


Performance and Debugging Tools

Silicon Graphics includes a powerful set of parallel debugging, profiling, and visualization tools as part of the Developer Magic application development suite. Systemic performance visualization is provided by the Performance Co-Pilot facility and its array extensions.

In addition to these, IRIX 6.2 contains the interactive debugger dbx and profiling tools pixie and prof. Information sources on developer tools are listed in Table 1-12.

Table 1-12. Information Sources: Performance and Debugging Tools


Topic                      Book or URL
Developer Magic overview   SGI web site
Developer Magic            Developer Magic: ProDev WorkShop Overview
Performance Co-Pilot       Performance Co-Pilot data sheet (SGI web site)
                           The Performance Co-Pilot User's and Administrator's Guide
                           Performance Co-Pilot for Informix-7 User's Guide
dbx, prof, pixie           dbx User's Guide
                           MIPS Compiling and Performance Tuning Guide

Message-Passing Protocols

Parallel applications using IPC facilities execute within a single node. However, you can create parallel applications that distribute across one or more nodes using a different model of parallel computation, the message-passing model.

In the message-passing model, processes communicate by exchanging “messages” of application data. The supporting library code chooses the fastest available means to pass the messages—through shared memory IPC within a node, across the HIPPI interconnect between nodes when available, or via TCP/IP.

Array 3.0 supports multiple message-passing protocols, bundled in a separate product, the Message-Passing Toolkit (MPT). This single product contains implementations of three protocols: two well-known, standardized message-passing libraries, the Message Passing Interface (MPI) and Parallel Virtual Machine (PVM), and the Cray-designed SHMEM protocol.

MPI is the favored message-passing facility under Array 3.0. The MPI library exploits low-overhead, shared-memory transfers whenever possible. Messages sent between processes residing on different nodes use the HIPPI network; but the MPI library is aware of, and uses, the proprietary HIPPI bypass in Array 3.0 to get higher bandwidth when possible.

While Array 3.0 supports MPI as its native message-passing model, it also supports PVM and SHMEM for portability. The PVM library support has been optimized to exploit shared-memory transfers within a single node, but it does not take advantage of HIPPI bypass, and thus may not achieve the inter-node bandwidth of MPI.

Table 1-13 lists information sources about parallel and distributed programming. This subject is also explored in more detail in Chapter 4, “Performance-Driven Programming in Array 3.0.”

Table 1-13. Information Sources: Parallel and Distributed Programming


Topic                                      Book or URL
Parallel Programming Models Compared       Topics In IRIX Programming
Message Passing Toolkit (MPT) in general
MPI Overview
MPI References                             Using MPI, Gropp, Lusk, and Skjellum; MIT Press, 1995
                                           MPI: The Complete Reference, Snir, Otto, Huss-Lederman,
                                           Walker, and Dongarra; MIT Press, 1995
                                           Using MPI (in IRIX Insight library)
                                           (ISBNs 0-262-69184-1 and 0-262-57104-8)
MPI Standard
PVM Overview
PVM Reference                              PVM: Parallel Virtual Machine, Geist, Beguelin, Dongarra,
                                           Weicheng Jiang, Manchek, and Sunderam; MIT Press, 1994
                                           (ISBN 0-262-57108-0)
PVM Home Page
Porting PVM to MPI                         Topics In IRIX Programming