Chapter 4. IRIX Platform

CXFS supports a client-only node running the IRIX operating system on supported SGI machines.

This chapter discusses the following:

  • “CXFS on IRIX”

  • “Preinstallation Steps for IRIX”

  • “Client Software Installation for IRIX”

  • “I/O Fencing for IRIX Nodes”

  • “Start/Stop cxfs_client for IRIX”

  • “Automatic Restart for IRIX”

  • “CXFS chkconfig Arguments for IRIX”

  • “Modifying the CXFS Software for IRIX”

  • “GRIO on IRIX”

  • “XVM Failover V2 on IRIX”

  • “Troubleshooting on IRIX”

  • “Reporting IRIX Problems”

For information about running the CXFS graphical user interface (GUI), system reset, and system tunable parameters, see CXFS Administration Guide for SGI InfiniteStorage.

CXFS on IRIX

This section contains the following information about CXFS on IRIX:

  • “Requirements for IRIX”

  • “CXFS Commands on IRIX”

  • “Log Files on IRIX”

  • “CXFS Mount Scripts on IRIX”

  • “Limitations and Considerations on IRIX”

  • “Access Control Lists and IRIX”

Requirements for IRIX

In addition to the items listed in “Requirements” in Chapter 1, using an IRIX node to support CXFS requires the following:

  • One of the following operating systems:

    • IRIX 6.5.28

    • IRIX 6.5.29

    • IRIX 6.5.30

  • SGI server hardware:

    • SGI Origin 200 server

    • SGI Origin 2000 series

    • SGI Origin 300 server

    • SGI Origin 350 server

    • SGI Origin 3000 series

    • Silicon Graphics Onyx2 system

    • Silicon Graphics Fuel visual workstation

    • Silicon Graphics Octane system

    • Silicon Graphics Octane2 system

    • Silicon Graphics Tezro

  • The following Fibre Channel HBAs:

    • LSI Logic models:

      LSI7104XP-LC
      LSI7204XP-LC

    • QLogic models:

      QLA2200
      QLA2200F
      QLA2310
      QLA2310F
      QLA2342
      QLA2344

For additional information, see the CXFS IRIX release notes.

CXFS Commands on IRIX

The following commands are shipped as part of the CXFS IRIX package:

/usr/cxfs_cluster/bin/cxfs_client
/usr/cxfs_cluster/bin/cxfs_info
/usr/cxfs_cluster/bin/cxfsdump
/usr/sbin/grioadmin
/usr/sbin/griomon
/usr/sbin/grioqos
/usr/cxfs_cluster/bin/xvm

The cxfs_client and xvm commands are needed to include a client-only node in a CXFS cluster. The cxfs_info command reports the current status of this node in the CXFS cluster.

For more information, see the man pages. For additional information about the GRIO commands, see “Guaranteed-Rate I/O (GRIO) and CXFS” in Chapter 1 and “GRIO on IRIX”.

Log Files on IRIX

The cxfs_client command creates a /var/log/cxfs_client log file. To rotate this log file, use the -z option in the /etc/config/cxfs_client.options file; see the cxfs_client man page for details.
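
For example, the following sketch appends the option to the options file and restarts the daemon so that the option takes effect. It assumes that the file simply holds cxfs_client command-line options; check the cxfs_client man page for any argument that -z takes:

irix# echo "-z" >> /etc/config/cxfs_client.options
irix# /etc/init.d/cxfs_client stop
irix# /etc/init.d/cxfs_client start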

For information about the log files created on server-capable administration nodes, see the CXFS Administration Guide for SGI InfiniteStorage.

CXFS Mount Scripts on IRIX

IRIX supports the CXFS mount scripts. See “CXFS Mount Scripts” in Chapter 1 and the CXFS Administration Guide for SGI InfiniteStorage.

Limitations and Considerations on IRIX

Note the following:

  • The inode monitor device (imon) is not supported on CXFS filesystems.

  • Do not use the IRIX fsr command; the bulkstat system call has been disabled for CXFS clients.

See also Appendix B, “Filesystem and Logical Unit Specifications”.

Access Control Lists and IRIX

All CXFS files have UNIX mode bits (read, write, and execute) and optionally an access control list (ACL). For more information, see the chmod and setfacl man pages.
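
For example, the following sketch changes and then displays the mode bits on a hypothetical CXFS file (the -D option to ls, where supported, also displays any ACL; see the ls man page):

irix# chmod 750 /mnt/cxfs/project
irix# ls -lD /mnt/cxfs/project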

Preinstallation Steps for IRIX

This section discusses the following:

  • “Adding a Private Network for IRIX”

  • “Verifying the Private and Public Networks for IRIX”

Adding a Private Network for IRIX

The following procedure provides an overview of the steps required to add a private network.


Note: A private network is required for use with CXFS.


You may skip some steps, depending upon the starting conditions at your site.

  1. Edit the /etc/hosts file so that it contains entries for every node in the cluster and their private interfaces as well.

    The /etc/hosts file has the following format, where primary_hostname can be the simple hostname or the fully qualified domain name:

    IP_address    primary_hostname    aliases

    You should be consistent when using fully qualified domain names in the /etc/hosts file. If you use fully qualified domain names on a particular node, then all of the nodes in the cluster should use the fully qualified name of that node when defining the IP/hostname information for that node in their /etc/hosts file.

    The decision to use fully qualified domain names is usually a matter of how the clients are going to resolve names for their client/server programs (such as NFS), how their default resolution is done, and so on.

    Even if you are using the domain name service (DNS) or the network information service (NIS), you must add every IP address and hostname for the nodes to /etc/hosts on all nodes. For example:

    190.0.2.1 server1-example.com server1
    190.0.2.3 stocks
    190.0.3.1 priv-server1
    190.0.2.2 server2-example.com server2
    190.0.2.4 bonds
    190.0.3.2 priv-server2

    You should then add all of these IP addresses to /etc/hosts on the other nodes in the cluster.

    For more information, see the hosts and resolv.conf man pages.


    Note: Exclusive use of NIS or DNS for IP address lookup for the nodes will reduce availability in situations where the NIS or DNS service becomes unreliable.


  2. Edit the /etc/nsswitch.conf file so that local files are accessed before either NIS or DNS. That is, the hosts line in /etc/nsswitch.conf must list files first.

    For example:

    hosts:      files nis dns

    (The order of nis and dns is not significant to CXFS, but files must be first.) A verification sketch for steps 1 and 2 appears after this procedure.

  3. Configure your private interface according to the instructions in IRIX Admin: Networking and Mail. To verify that the private interface is operational, use the ifconfig -a command. For example:

    irix# ifconfig -a
    
    eth0      Link encap:Ethernet  HWaddr 00:50:81:A4:75:6A
              inet addr:192.168.1.1  Bcast:192.168.1.255  Mask:255.255.255.0
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:13782788 errors:0 dropped:0 overruns:0 frame:0
              TX packets:60846 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:100
              RX bytes:826016878 (787.7 Mb)  TX bytes:5745933 (5.4 Mb)
              Interrupt:19 Base address:0xb880 Memory:fe0fe000-fe0fe038
    
    eth1      Link encap:Ethernet  HWaddr 00:81:8A:10:5C:34
              inet addr:10.0.0.10  Bcast:10.0.0.255  Mask:255.255.255.0
              UP BROADCAST MULTICAST  MTU:1500  Metric:1
              RX packets:0 errors:0 dropped:0 overruns:0 frame:0
              TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:100
              RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
              Interrupt:19 Base address:0xef00 Memory:febfd000-febfd038
    
    lo        Link encap:Local Loopback
              inet addr:127.0.0.1  Mask:255.0.0.0
              UP LOOPBACK RUNNING  MTU:16436  Metric:1
              RX packets:162 errors:0 dropped:0 overruns:0 frame:0
              TX packets:162 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0
              RX bytes:11692 (11.4 Kb)  TX bytes:11692 (11.4 Kb)

    This example shows that two Ethernet interfaces, eth0 and eth1, are present and running (as indicated by UP in the third line of each interface description).

    If the second network does not appear, a network interface card may need to be installed to provide a second network, or the network may not yet be initialized.

  4. (Optional) Make the modifications required to use CXFS connectivity diagnostics.
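
The following sketch spot-checks steps 1 and 2 above, assuming the example names used earlier; grep simply echoes the matching line from each file:

irix# grep '^hosts' /etc/nsswitch.conf
hosts:      files nis dns
irix# grep priv-server1 /etc/hosts
190.0.3.1 priv-server1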

Verifying the Private and Public Networks for IRIX

For each private network on each node in the pool, verify access with the ping command. Enter the following, where nodeIPaddress is the IP address of the node:

ping nodeIPaddress

For example:

irix# ping 10.0.0.1
PING 10.0.0.1 (10.0.0.1) from 128.162.240.141 : 56(84) bytes of data.
64 bytes from 10.0.0.1: icmp_seq=1 ttl=64 time=0.310 ms
64 bytes from 10.0.0.1: icmp_seq=2 ttl=64 time=0.122 ms
64 bytes from 10.0.0.1: icmp_seq=3 ttl=64 time=0.127 ms

Also execute a ping on the public networks. If ping fails, follow these steps:

  1. Verify that the network interface was configured up using ifconfig. For example:

    irix# ifconfig eth1
    eth1      Link encap:Ethernet  HWaddr 00:81:8A:10:5C:34
              inet addr:10.0.0.10  Bcast:10.0.0.255  Mask:255.255.255.0
              UP BROADCAST MULTICAST  MTU:1500  Metric:1
              RX packets:0 errors:0 dropped:0 overruns:0 frame:0
              TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:100
              RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
              Interrupt:19 Base address:0xef00 Memory:febfd000-febfd038

    In the third output line above, UP indicates that the interface was configured up.

  2. Verify that the cables are correctly seated.

Repeat this procedure on each node.
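
To check all private interfaces from one node in a single pass, a small shell loop can help. This is a sketch that assumes the private hostnames from the earlier /etc/hosts example and that each name resolves on this node:

irix# for host in priv-server1 priv-server2
> do
>   ping -c 3 $host
> done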

Client Software Installation for IRIX


Note: CXFS does not support a miniroot installation.

You cannot combine the IRIX operating system installation and the CXFS installation. You must install the operating system first.

To install the required IRIX software, do the following on each IRIX client-only node:

  1. Read the SGI InfiniteStorage Software Platform (ISSP) release notes and the CXFS general release notes in the /docs directory on the ISSP DVD, plus any late-breaking caveats on Supportfolio.

  2. Verify that you are running the correct version of IRIX or upgrade to IRIX 6.5.x according to the IRIX 6.5 Installation Instructions.

    To verify that a given node has been upgraded, use the following command to display the currently installed system:

    irix# uname -aR

  3. (For sites with a serial port server) Install the version of the serial port server driver that is appropriate to the operating system. Use the CD that accompanies the serial port server. Reboot the system after installation.

    For more information, see the documentation provided with the serial port server.

  4. Insert IRIX CD-ROM #1 into the CD drive.

  5. Start up inst and instruct it to read the CD:

    # inst
    ...
    Inst> open /CDROM/dist


    Caution: Do not install to an alternate root using the inst -r option. Some of the exit operations (exitops) do not use pathnames relative to the alternate root, which can result in problems on both the main and alternate root filesystem if you use the -r option. For more information, see the inst man page.


  6. (Optional) If you want to use Performance Co-Pilot to run XVM statistics, install the default pcp_eoe subsystems. This installs the Performance Co-Pilot PMDA (the agent to export XVM statistics) as an exit operation (exitop).

    Inst> keep *
    Inst> install pcp_eoe default
    Inst> go
    ...
    Inst> quit

  7. Using ftp, rcp, or scp, transfer the client software that was downloaded onto a server-capable administration node during its installation procedure.

    The location of the tardist on the server will be as follows:

    /usr/cluster/client-dist/CXFS_VERSION/irix/IRIX_VERSION/noarch/cxfs-client.tardist
    

    For example, for IRIX 6.5.30:

    /usr/cluster/client-dist/5.0.0.3/irix/6530/noarch/cxfs-client.tardist

  8. Read the IRIX release notes by choosing the following from the desktop Toolchest to bring up the Software Manager window:

    System -> Software Manager

    Choose Customize Installation and type the directory into which you downloaded the software into the Available Software box. A list of products available for installation will appear. If a product name is highlighted (similar to an HTML link), release notes are available; click the link to bring up the Release Notes window.

  9. Change to the directory containing the tardist and install the CXFS software. For example:

    irix# cd download_directory
    irix# inst -f .
    Inst> install *


    Caution: Do not install to an alternate root using the inst -r option. Some of the exit operations (exitops) do not use pathnames relative to the alternate root, which can result in problems on both the main and alternate root filesystem if you use the -r option. For more information, see the inst man page.

    If you do not install cxfs_client, the inst utility will not detect a conflict, but the CXFS cluster will not work. You must install the cxfs_client subsystem.

  10. (Optional) If you do not want to install GRIO:

    Inst> keep *.*.grio2*

  11. Install the chosen software:

    Inst> go 
    ...
    Inst> quit

    This installs the following packages:

    cxfs.books.CXFS_AG
    cxfs.man.relnotes
    cxfs.sw.cxfs
    cxfs.sw.grio2_cell   (Optional)
    cxfs.sw.xvm_cell
    cxfs_client.man.man
    cxfs_client.sw.base
    cxfs_util.man.man
    cxfs_util.sw.base
    eoe.sw.grio2         (Optional)
    eoe.sw.xvm
    patch_cxfs.eoe_sw.base
    patch_cxfs.eoe_sw64.lib

    The process may take a few minutes to complete.

  12. Reboot the system.
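
After the reboot, you can confirm that the client software is installed and enabled. A minimal sketch, using standard IRIX tools (the exact product names reported by versions may differ):

irix# versions | grep -i cxfs
irix# chkconfig | grep cxfs_client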

I/O Fencing for IRIX Nodes

On the IRIX platform, the cxfs_client software automatically detects the world wide port names (WWPNs) of any supported host bus adapters (HBAs) in the system that are connected to a switch that is configured in the cluster database. These HBAs are then available for fencing. If no WWPNs are detected, messages are logged to the /var/log/cxfs_client file; see the log-search sketch below.
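
The exact wording of these messages varies, so the search patterns below are assumptions; a case-insensitive scan of the client log for HBA- and WWPN-related entries might look like the following:

irix# grep -i wwpn /var/log/cxfs_client
irix# grep -i hba /var/log/cxfs_client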

To configure fencing, see the CXFS Administration Guide for SGI InfiniteStorage.

Start/Stop cxfs_client for IRIX

The /etc/init.d/cxfs_client script will be invoked automatically during normal system startup and shutdown procedures. This script starts and stops the cxfs_client daemon.

To start cxfs_client manually, enter the following:

irix# /etc/init.d/cxfs_client start

To stop cxfs_client manually, enter the following:

irix# /etc/init.d/cxfs_client stop

Automatic Restart for IRIX

If you want nodes to restart automatically when they are reset or powered on, you must set the AutoLoad boot variable to yes on each IRIX node as follows:

# nvram AutoLoad yes

This setting is recommended, but is not required for CXFS.

You can check the setting of this variable with the following command:

# nvram AutoLoad

CXFS chkconfig Arguments for IRIX

The cxfs_client argument to chkconfig controls whether or not the cxfs_client daemon should be started.
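
For example, to enable the daemon for the next boot and verify the setting (standard IRIX chkconfig usage):

irix# chkconfig cxfs_client on
irix# chkconfig | grep cxfs_client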

Modifying the CXFS Software for IRIX

You can modify the CXFS client daemon (/usr/cluster/bin/cxfs_client) by placing options in the cxfs_client.options file:

/etc/config/cxfs_client.options

The available options are documented in the cxfs_client man page.


Caution: Some of the options are intended to be used internally by SGI only for testing purposes and do not represent supported configurations. Consult your SGI service representative before making any changes.

For example, to see if cxfs_client is using the options in cxfs_client.options:

irix# ps -ef | grep cxfs_client
    root     219311     217552  0 12:03:17 pts/0   0:00 grep cxfs_client
    root        540          1  0   Feb 26 ?      77:04 /usr/cluster/bin/cxfs_client -i cxfs3-5

GRIO on IRIX

CXFS supports guaranteed-rate I/O (GRIO) version 2 on the IRIX platform. Application bandwidth reservations must be explicitly released by the application before exit. If the application terminates unexpectedly or is killed, its bandwidth reservations are not automatically released and will cause a bandwidth leak. If this happens, the lost bandwidth could be recovered by rebooting the client node.

For more information, see “Guaranteed-Rate I/O (GRIO) and CXFS” in Chapter 1 and the Guaranteed-Rate I/O Version 2 Guide.

XVM Failover V2 on IRIX

Following is an example of the /etc/failover2.conf file on an IRIX system:

/dev/dsk/20000080e5116e2a/lun0vol/c2p1 affinity=0 preferred
/dev/dsk/20000080e511ab60/lun0vol/c2p3 affinity=1

/dev/dsk/20000080e5116e2a/lun1vol/c2p1 affinity=0
/dev/dsk/20000080e511ab60/lun1vol/c2p3 affinity=1 preferred

/dev/dsk/200400a0b80f7ecf/lun0vol/c2p1 affinity=0 preferred
/dev/dsk/200500a0b80f7ecf/lun0vol/c2p1 affinity=1

/dev/dsk/200400a0b80f7ecf/lun1vol/c2p1 affinity=0
/dev/dsk/200500a0b80f7ecf/lun1vol/c2p1 affinity=1 preferred

/dev/dsk/200400a0b80f7ecf/lun2vol/c2p1 affinity=0 preferred
/dev/dsk/200500a0b80f7ecf/lun2vol/c2p1 affinity=1

/dev/dsk/200400a0b80f7ecf/lun3vol/c2p1 affinity=0
/dev/dsk/200500a0b80f7ecf/lun3vol/c2p1 affinity=1 preferred

For more information, see:

  • The comments in the /etc/failover2.conf file

  • CXFS Administration Guide for SGI InfiniteStorage

  • XVM Volume Manager Administrator's Guide

Troubleshooting on IRIX

This section discusses the following:

  • “Identify the Cluster Status”

  • “Physical Storage Tools”

  • “Disk Activity”

  • “Buffers in Use”

  • “Performance Monitoring Tools”

  • “Kernel Status Tools”

  • “No Cluster Name ID Error”

  • “System is Hung”

  • “SYSLOG credid Warnings”

Identify the Cluster Status

When you encounter a problem, identify the cluster status by answering the following questions:

  • Are the cluster daemons running?

  • Is the cluster state consistent on each node? Run the clconf_info command on each server-capable administration node and compare.

  • Which nodes are in the CXFS kernel membership? Check the cluster status and the /var/adm/SYSLOG file.

  • Which nodes are in the cluster database (fs2d) membership? See the /var/cluster/ha/log/fs2d_log files on each server-capable administration node.

  • Is the database consistent on all server-capable administration nodes? Determine this by logging in to each server-capable administration node and examining the /var/cluster/ha/log/fs2d_log file and database checksum.

  • Log on to the various CXFS client nodes, or use the GUI view area display with details showing, to answer the following:

    • Are the devices available on all nodes? Use the following:

      • The xvm command to show the physical volumes:

        xvm:cluster> show -v phys/

      • The ls command to list the contents of the /dev/cxvm directory:

        # ls /dev/cxvm

      • The hinv command to display the hardware inventory.

    • Is the client-only node in the cluster? Use the cxfs_info command.

    • Are the filesystems mounted on all nodes? Use mount and clconf_info commands.

    • Which node is the metadata server for each filesystem? Use the clconf_info command on a server-capable administration node.

  • Is the metadata server in the process of recovery? Look at the /var/adm/SYSLOG file.

    Messages such as the following indicate the recovery status:

    • In process:

      Mar 13 11:31:02 1A:p2 unix: ALERT: CXFS Recovery: Cell 1: Client Cell 0 Died, Recovering </scratch/p9/local>

    • Completed:

      Mar 13 11:31:04 5A:p2 unix: NOTICE: Signaling end of recovery cell 1

  • If filesystems are not mounting, do they appear online in XVM? You can use the following xvm command:

    xvm:cluster> show vol/*
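
On a client-only IRIX node, several of the client-side checks above can be run together. A minimal sweep using only commands already shown (the cxfs string used to filter the mount output is an assumption about how CXFS filesystems are reported):

irix# /usr/cxfs_cluster/bin/cxfs_info
irix# ls /dev/cxvm
irix# mount | grep cxfs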

Physical Storage Tools

Understand the following physical storage tools:

  • To display the hardware inventory:

    irix# /sbin/hinv

    If the output is not what you expected, do a probe for devices and perform a SCSI bus reset, using the following command:

    irix# /usr/sbin/scsiha -pr bus_number

  • To configure I/O devices on an IRIX node, use the following command:

    irix# /sbin/ioconfig -f /hw

  • To show the physical volumes, use the xvm command:

    irix# /sbin/xvm show -v phys/

    See the XVM Volume Manager Administrator's Guide.

Disk Activity

Use the sar system activity reporter to show the disks that are active. For example, the following command for IRIX shows the disks that are active, puts the disk name at the end of the line, and polls every second for 10 seconds:

irix# sar -DF 1 10

For more information, see the sar(1) man page.

Buffers in Use

Use the IRIX bufview filesystem buffer cache activity monitor to view the buffers that are in use. Within bufview, you can use the help subcommand to learn about available subcommands, such as the f subcommand to limit the display to only those buffers with a specified flag. For example, to display the in-use (busy) buffers:

# bufview
f
Buffer flags to display bsy 

For more information, see the bufview(1) man page.

Performance Monitoring Tools

Understand the following performance monitoring tools:

  • To monitor system activity:

    /usr/bin/sar

    To monitor filesystem buffer cache activity:

    /usr/sbin/bufview


    Note: Do not use bufview interactively on a busy IRIX node; run it in batch mode.


  • To monitor operating system activity data on an IRIX node:

    /usr/sbin/osview

  • To monitor the statistics for an XVM volume, use the xvm command:

    /sbin/xvm change stat on {concatname|stripename|physname}

    See the XVM Volume Manager Administrator's Guide.

  • To monitor system performance, use Performance Co-Pilot (PCP). See the PCP documentation and the pmie(1) and pmieconf(1) man pages.

  • To monitor CXFS heartbeat timeouts, use the icrash command. For example, the following command prints the CXFS kernel messaging statistics:

    irix# icrash -e "load -F cxfs; mtcp_stats"
    corefile = /dev/mem, namelist = /unix, outfile = stdout
    
    Please wait............
    Loading default Sial macros...........
    
    
    >> load cxfs
    
    >> mtcp_stats
    STATS @ 0xc000000001beebb8
    Max delays: discovery 500767 multicast 7486 hb monitor 0
    hb generation histogram:(0:0)(1:0)(2:0)(3:0)(4:0)(5:0)
    Improperly sized alive mesgs 0 small 0 big 0
    Alive mesgs with: invalid cell 0 invalid cluster 0 wrong ipaddr 2
    Alive mesgs from: unconfigured cells 100 cells that haven't discovered us 6000
    mtcp_config_cell_set 0x0000000000000007
    cell 0:starting sequence # 77 skipped 0
    hb stats init @ 15919:(0:1)(1:478301)(2:29733)(3:0)(4:0)
    cell 1:starting sequence # 0 skipped 0
    hb stats init @ 360049:(0:1)(1:483337)(2:21340)(3:0)(4:0)
    cell 2:starting sequence # 0 skipped 0

    The following fields contain information that is helpful to analyzing CXFS heartbeat timing:

    • discovery: The maximum time in HZ that the discovery thread (that is, the thread that processes incoming heartbeats) has slept. Because nodes generate heartbeats once per second, this thread should never sleep substantially longer than 100 HZ.

      A value much larger than 100 suggests either that it was not receiving heartbeats or that something on the node prevented this thread from processing the heartbeats.

    • multicast: The thread that generates heartbeats sleeps for 100 HZ after sending the last heartbeat and before starting on the next. This field contains the maximum time in HZ between the start and end of that sleep. A value substantially larger than 100 indicates a problem getting the thread scheduled; for example, when something else on the node is taking all CPU resources.

    • monitor: The maximum time in HZ for the heartbeat thread to sleep and send its heartbeat. That is, it contains the value for multicast plus the time it takes to send the heartbeat. If this value is substantially higher than 100 but multicast is not, it suggests a problem in acquiring resources to send a heartbeat, such as a memory shortage.

    • gen_hist: A histogram showing the number of heartbeats generated within each interval. There are 6 buckets tracking each of the first 5 seconds (anything over 5 seconds goes into the 6th bucket).

    • hb_stats: Histograms for heartbeats received. There is one histogram for each node in the cluster.

    • seq_stats: Number of consecutive incoming heartbeats that do not have consecutive sequence numbers. There is one field for each node. A nonzero value indicates a lost heartbeat message.

    • overdue: Time when an overdue heartbeat is noticed. There is one field per node.

    • rescues: Number of heartbeats from a node that are overdue but CXFS message traffic has been received within the timeout period.

    • alive_small: Number of times a heartbeat message arrived that was too small (that is, contained too few bytes).

    • alive_big: Number of times a heartbeat arrived that was too large.

    • invalid_cell: Number of heartbeats received from nodes that are not defined in the cluster.

    • invalid_cluster: Number of heartbeats received with the wrong cluster ID.

    • wrong_ipaddr: Number of heartbeats received with an IP address that does not match the IP address configured for the node ID.

    • not_configured: Number of heartbeats received from nodes that are not defined in the cluster.

    • unknown: Number of heartbeats from nodes that have not received the local node's heartbeat.

Kernel Status Tools


Note: You must use the sial script versions of the icrash commands.

Understand the following kernel status tools (this may require help from SGI service personnel):

  • To determine IRIX kernel status, use the icrash command:

    # /usr/bin/icrash
    >> load -F cxfs
    


    Note: Add the -v option to these commands for more verbose output.


    • cfs to list CXFS commands

    • dcvn to obtain information on a single client vnode

    • dcvnlist to obtain a list of active client vnodes

    • dsvn to obtain information on a single server vnode

    • dsvnlist to obtain a list of active server vnodes

    • mesglist to trace messages to the receiver (you can pass the displayed object address to the dsvn command to get more information about the server vnodes and pass the thread address to the mesgargs command to get more information about the stuck message). For example (line breaks shown here for readability):

      >> mesglist
       Cell:2
      TASK ADDR           MSG ID TYPE CELL MESSAGE                          Time(Secs) Object
      ================== ======= ==== ==== ================================ ========== ===========================
      0xe0000030e5ba8000      14  Snt    0                     I_dsvn_fcntl          0 N/A
      0xe0000030e5ba8000      14  Cbk    0                   I_ucopy_copyin          0 N/A
      0xa80000000bb77400    1210  Rcv    0               I_dsxvn_allocate_1       1:06 (dsvn_t*)0xa80000000a7f8900
      
      
      >> mesgargs 0xa80000000bb77400
      (dsvn_t*)0xa80000000a7f8900
          (dsxvn_allocate_1_in_t*)0xa800000001245060
              objid=0xa80000000a7f8910 (dsvn=0xa80000000a7f8900)
              offset=116655
              length=0x1
              total=1
              mode=2
              bmapi_flags=0x7
              wr_ext_count=0
              &state=0xa8000000012450b0 credid=NULLID
              lent_tokens=0xa800000 (DVN_TIMES_NUM(SWR)|DVN_SIZE_NUM(WR)|DVN_EXTENT_NUM(RD))
              reason_lent=0x24800000 (DVN_TIMES_NUM(CLIENT_INITIATED)|DVN_SIZE_NUM(CLIENT_INITIATED)|
      DVN_EXTENT_NUM(CLIENT_INITIATED))
              lender_cell_id=0
          (dsxvn_allocate_1_inout_t*)0xa800000001245110
              cxfs_flags=0x200
              cxfs_gen=4661
      
      
      
      >> dsvn 0xa80000000a7f8900
      (dsvn_t*)0xa80000000a7f8900:
          flags 0x10
          kq.next 0xc000000001764508 kq.prev 0xc000000001764508
          &tsclient 0xa80000000a7f8a30  &tserver 0xa80000000a7f8a80
          bhv 0xa80000000a7f8910 dsvfs 0xa800000026342b80
          (cfs_frlock_info_t*)0xa80000000bfee280:
              wait: none
              held: none
          vp 0xa8000000224de500 v_count 2 vrgen_flags 0x0
          dmvn 0x0000000000000000
          objid 0xa80000000a7f8910 gen 4 obj_state 0xa80000000a7f8940
          (dsxvn_t*)0xa80000000a7f8900:
              dsvn 0xa80000000a7f8900 bdp 0xa800000010b52d30
              tkclient 0xa80000000a7f8a30 tserver 0xa80000000a7f8a80
              ext gen 4661 io_users 2 exclusive_io_cell -1
              oplock 0 oplock_client -1 &dsx_oplock_lock 0xa80000000a7f8b9

    • sinfo to show clients/servers and filesystems

    • sthread | grep cmsd to determine the CXFS kernel membership state. You may see the following in the output:

      • cms_dead() indicates that the node is dead

      • cms_follower() indicates that the node is waiting for another node to create the CXFS kernel membership (the leader)

      • cms_leader() indicates that the node is leading the CXFS kernel membership creation

      • cms_declare_membership() indicates that the node is ready to declare the CXFS kernel membership but is waiting on resets

      • cms_nascent() indicates that the node has not joined the cluster since starting

      • cms_shutdown() indicates that the node is shutting down and is not in the CXFS kernel membership

      • cms_stable() indicates that the CXFS kernel membership is formed and stable

    • tcp_channels to determine the status of the connection with other nodes

    • t -a -w filename to trace for CXFS

    • t cms_thread to trace one of the above threads

  • To invoke internal kernel routines that provide useful debugging information, use the idbg command:

    # /usr/sbin/idbg

Are there any long-running (more than 20 seconds) kernel messages? Use the icrash mesglist command to examine the situation.

No Cluster Name ID Error

For example:

Mar  1 15:06:18 5A:nt-test-07 unix: NOTICE: Physvol (name cip4) has no 
CLUSTER name id: set to ""

This message means the following:

  • The disk labeled as an XVM physvol was probably labeled under IRIX 6.5.6f and the system was subsequently upgraded to a newer release that uses a new XVM label format. This does not indicate a problem.

  • The cluster name had not yet been set when XVM encountered these disks with an XVM cluster physvol label on them. This is normal output when XVM performs the initial scan of the disk inventory, before node/cluster initialization has completed on this host.

    The message indicates that XVM sees a disk with an XVM cluster physvol label, but that this node has not yet joined a CXFS membership; therefore, the cluster name is empty ("").

    When a node or cluster initializes, XVM rescans the disk inventory, searching for XVM cluster physvol labels. At that point, the cluster name should be set for this host. An empty cluster name after node/cluster initialization indicates a problem with cluster initialization.

    The first time any configuration change is made to any XVM element on this disk, the label will be updated and converted to the new label format, and these notices will go away.

    For more information about XVM, see the XVM Volume Manager Administrator's Guide.

System is Hung

The following may cause the system to hang:

  • Overrun disk drives.

  • CXFS heartbeat was lost. In this case, you will see a message that mentions withdrawal of the node.

  • As a last resort, do a non-maskable interrupt (NMI) of the system and contact SGI. (The NMI tells the kernel to panic the node so that an image of memory is saved and can be analyzed later.) For more information, see the owner's guide for the node.

    Make the following files available:

    • System log file: /var/adm/SYSLOG

    • IRIX vmcore.#.comp

    • IRIX unix.#

SYSLOG credid Warnings

Messages such as the following in the SYSLOG indicate that groups from another node are being dropped, and you may not be able to access things as expected, based on group permissions (line breaks added here for readability):

May  1 18:34:42 4A:nodeA unix: WARNING: credid_bundle_import: received cred for uid 5778 with 23 groups when \
     configured for only 16 groups. Extra groups dropped.
May  1 18:34:59 4A:nodeB unix: WARNING: credid_getcred: received cred for uid 5778 with 23 groups when \
     configured for only 16 groups. Extra groups dropped.
May  1 18:35:44 4A:nodeB unix: WARNING: credid_getcred: received cred for uid 5778 with 23 groups when  \
     configured for only 16 groups. Extra groups dropped.
May  1 18:36:29 4A:nodeA unix: WARNING: credid_bundle_import: received cred for uid 5778 with 23 groups  \
     when configured for only 16 groups. Extra groups dropped.
May  1 18:38:32 4A:nodeA unix: WARNING: credid_bundle_import: received cred for uid 5778 with 23 groups  \
     when configured for only 16 groups. Extra groups dropped.
May  1 18:38:50 4A:nodeB unix: WARNING: credid_getcred: received cred for uid 5778 with 23 groups when  \
     configured for only 16 groups. Extra groups dropped.
May  1 18:39:32 4A:nodeB unix: WARNING: credid_getcred: received cred for uid 5778 with 23 groups when  \
     configured for only 16 groups. Extra groups dropped.
May  1 18:40:13 4A:nodeB unix: WARNING: credid_getcred: received cred for uid 5778 with 23 groups when  \
     configured for only 16 groups. Extra groups dropped.
May  1 18:40:35 4A:nodeA unix: WARNING: credid_bundle_import: received cred for uid 5778 with 23 groups  \
     when configured for only 16 groups. Extra groups dropped.
May  1 19:04:52 4A:nodeA unix: WARNING: credid_bundle_import: received cred for uid 6595 with 21 groups  \
     when configured for only 16 groups. Extra groups dropped.
May  1 19:38:58 4A:nodeA unix: WARNING: credid_bundle_import: received cred for uid 6595 with 21 groups  \
     when configured for only 16 groups. Extra groups dropped.

The IRIX ngroups_max static system tunable parameter specifies the maximum number of groups to which a user may simultaneously belong. You should increase the number of groups by running the following command and then rebooting:

irix# systune ngroups_max value
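
For example, to display the current value and then raise the limit to 32 groups (32 is an illustrative value; choose one at least as large as the group counts reported in the warnings):

irix# systune ngroups_max
irix# systune ngroups_max 32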

Reporting IRIX Problems

When reporting a problem about an IRIX node to SGI, you should retain the following information:

  • If a panic has occurred on an IRIX node, retain the system core files in /var/adm/crash, including the following:

    analysis.number
    unix.number
    vmcore.number.comp

  • For any type of problem:

    • Run the /usr/cluster/bin/cxfsdump utility on an IRIX node and retain the output. You can run this utility immediately after noticing a problem. The cxfsdump utility attempts to collect information from all nodes in the cluster by using the rsh command.

    • Determine the Fibre Channel HBA worldwide name (WWN) mapping:

      scsiha -w bus#

    • Gather output from the following commands:

      /usr/bin/hinv
      /usr/sbin/topology
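
      The following sketch captures that output to files that can be attached to a problem report; /var/tmp is an arbitrary scratch location:

      irix# /usr/bin/hinv > /var/tmp/hinv.out 2>&1
      irix# /usr/sbin/topology > /var/tmp/topology.out 2>&1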