This chapter covers the following topics:
“Verify the Bonding Mode on the Rack Leader Controller (RLC)”
“cimage --push-rack Pushes Too Many (or Too Few) Expansions”
“Cannot ping the CMCs from the Rack Leader Controller (RLC)”
“Troubleshooting a Rack Leader Controller (RLC) With Misconfigured Switch Information”
“System Admin Controller (SAC) eth2 Link in the Bond is Down”
This chapter provides solutions to some common problems that users encounter when installing or upgrading an SGI ICE X system. It includes diagnosis and troubleshooting information.
The switchconfig command displays switch settings and enables you to configure switches. The command's format is as follows:
switchconfig subcommand [--help | -h] [--debug] [--log file | -l file] [--switch host[,host][,...]] [--switches system_ID] [--vlan number | -v number] [--ip address | -i address] [--netmask mask | -n mask] [-c old_password] [-n new_password]
For subcommand, specify one of the following:
Table 5-1. switchconfig Subcommands
| Group | Subcommand | Purpose |
|---|---|---|
| Informational | list | Displays current settings. |
| Configuration | set | Assigns virtual local area networks (VLANs) and other settings to one or more MAC addresses. |
| | unset | Returns ports to default settings. |
| IP Management | show_ip | Returns the IP address that is configured for the switch. |
| | set_ip | Adds an IP address for the VLAN on the switch. Used to route traffic from MIC devices. |
| | unset_ip | Removes the IP address for the VLAN on the switch. |
| OSPF Management | show_ospf | Returns the open shortest path first (OSPF) router protocol used to route the MIC traffic between the switches. |
| | set_ospf | Sets the OSPF router protocol used to route traffic between the switches. |
| | unset_ospf | Disables OSPF and all the network statements that are associated with OSPF. |
| MTU Management | show_mtu | Displays the assigned maximum transmission unit (MTU) value for all ports in the switch stack. |
| | set_mtu | Assigns the MTU value to all ports in the switch stack. |
| | unset_mtu | Removes the MTU value from all ports in the switch stack. |
| Gateway Management | show_default_gateway | Displays the default gateway assigned to the switch network. |
| | set_default_gateway | Sets the default gateway for the switch network. |
| | unset_default_gateway | Clears the default gateway that is assigned to the switch network. |
| Propagate Configuration | pull_switch_config | Copies the start-up switch's configuration file from the TFTP server to the switch(es) you specify on the command line and loads the switch. |
| | push_switch_config | Copies the switch configuration file from the switch you specify and saves it to the TFTP server. |
| Miscellaneous | save_running_config | Saves the current switch configuration to the switch's nonvolatile memory. |
| | reset_factory_defaults | Reverts to the default, factory configuration and reboots. |
| | sanity_check | Runs a sanity check on the switch configuration. For example, it checks the switch ports for trunking. It returns configuration information regarding misconfigured elements and points out anomalies. |
| | change_password | Changes the administrator password on the switch. By default, the administrator login is admin, and the password is admin. This is the only subcommand that uses the -c old_password and -n new_password parameters. |
The rest of the switchconfig command arguments are as follows:
| Argument | Specification |
|---|---|
| file | The full path to the file to which switchconfig can write log output. |
| host | The hostname or IP address of the management switch. |
| system_ID | The system ID of the switch stack you want to reference. Specify the switch's IP address or hostname for system_ID. If you have more than one switch, specify a comma-separated list of system_ID values. For example, mgmtsw0 is the hostname and system_ID for the first management switch. |
| number | The VLAN number. |
| address | The IP address needed for the command. This is typically the IP address of a switch that you want the command to operate on, the IP address you want to assign, and so on. |
| mask | The netmask you want to use. |
| old_password | The old password that you want to change. By default, this password is the same as the SAC system password. |
| new_password | The new password that you want to use on the switches. |
Example 1. The following switchconfig command returns help text for the entire switchconfig command:
# switchconfig --help
Example 2. The following switchconfig command returns help text for only the set_ip subcommand:
# switchconfig set_ip --help
The preceding command shows how to display help output for set_ip, which is only one of the switchconfig subcommands. You can display output for any of the other switchconfig subcommands in the same way.
Example 3. The following switchconfig command sets the IP address to 100.100.100.100 for VLAN 101 on switch mgmtsw0:
# switchconfig set_ip --switch mgmtsw0 --vlan 101 --ip 100.100.100.100
Example 4. The following switchconfig command adds an IP address for the VLAN on the switches that route traffic for the MIC devices on the compute nodes:
# switchconfig set_ip --switch mgmtsw0 --vlan 101 --ip 10.159.2.54 --netmask 255.255.255.0
The following command, which uses the short option forms, is equivalent:
# switchconfig set_ip -s mgmtsw0 -v 101 -i 10.159.2.54 -n 255.255.255.0
Example 5. The following two switchconfig commands return the switch IP addresses used for routing.
Command one:
# switchconfig show_ip -s mgmtsw0 -v 1
VLAN 1 is Administrative Up - Link Up
Address is B4-0E-DC-39-C4-83
Index: 1001, MTU: 1500
Address Mode is DHCP
IP Address: 172.23.0.254 Mask: 255.255.0.0
Proxy ARP is disabled
Command two:
# switchconfig show_ip -s mgmtsw0 -v 101
VLAN 101 is Administrative Up - Link Up
Address is B4-0E-DC-39-C4-83
Index: 1101, MTU: 1500
Address Mode is User specified
IP Address: 10.159.1.254 Mask: 255.255.255.0
IP Address: 10.157.1.254 Mask: 255.255.255.0 Secondary
IP Address: 10.158.1.254 Mask: 255.255.255.0 Secondary
IP Address: 10.160.1.254 Mask: 255.255.255.0 Secondary
Proxy ARP is disabled
Example 6. All of the switches for an SGI ICE X system must have the same password. The following command changes the switch password for a system with two switches:
# switchconfig change_password -c admin -n mynewpassword --switches mgmtsw0,mgmtsw1
The following procedure explains a way to troubleshoot a broken DCM enablement.
Procedure 5-1. To troubleshoot a DCM enablement
On the SAC, type the lk_verify command to ensure that the SGI ICE X license and the SMC Power Opt license are installed correctly.
Open file /opt/sgi/sgimc/etc/dcm.properties and add the following line to enable debugging:
dcm.debug.log = 1
Open file /opt/sgi/sgimc/etc/system-clustermanager.profile and add the following line to enable debugging:
system.logging:com.lnxi.debug logging.level:DEBUG logging.subkeys:DCM
Type the following command to restart the SMC daemon:
# service mgr restart
Use a text editor to open file /opt/sgi/smc/log/SGIMC-server.log and examine the file contents.
If the service started successfully, the file contains the following entries:
Starting communication plugin com.sgi.clusterman.comms.dcm.client.DcmClient...
Starting service plugin com.sgi.clusterman.server.dcm.DcmStateService...
[...]
DCM: Creating SEI proxy with timeout: 30000
com.sgi.clusterman.server.dcm.DcmStateService registered as ServiceProvider DcmStateService
DCM: Event Listener setup complete.
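The log check in the last step can also be scripted. The following is a minimal sketch, not part of the SGI tooling: the function name dcm_started is hypothetical, and the sketch assumes that the final expected entry (DCM: Event Listener setup complete.) is a sufficient sign that the service started.

```shell
# Return 0 if the log shows that the DCM service finished starting,
# nonzero otherwise (including when the file cannot be read).
# The marker string is the last of the expected log entries.
dcm_started() {
    log_file=$1
    grep -q 'DCM: Event Listener setup complete' "$log_file"
}

# Usage, with the log path named in the procedure:
#   dcm_started /opt/sgi/smc/log/SGIMC-server.log && echo "DCM service started"
```

If the function reports failure, examine the log by hand, because the other expected entries may show where startup stopped.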
If compute nodes are taking a long time to boot, perform the following:
See “Verify the Bonding Mode on the Rack Leader Controller (RLC)” to verify that the compute nodes have the proper bonding setup.
Verify that the rack leader controller (RLC) has a MegaRAID controller. For example, 144 nodes will not boot well with 106x controllers. You can check the controller type with the lspci command.
To verify that the MegaRAID battery is working and charged, type the following command:
# /opt/MegaRAID/MegaCli/MegaCli64 -ShowSummary -a0
You should see 'Status : Healthy' under 'BBU' (BBU = Battery Backup Unit).
Note: If this is the first time the node has booted up, it takes several hours for the BBU to be charged.
Verify that the cache is set to write-back, as follows:
# /opt/MegaRAID/MegaCli/MegaCli64 -LDGetProp -Cache -LALL -a0
Note: Never force write-back on when the BBU is bad (-CachedBadBBU), because data loss happens with an orderly shutdown that includes a power off.
When you see the output Cache Policy:WriteBack, write-back is enabled.
To enable the write-back policy, perform the following:
# /opt/MegaRAID/MegaCli/MegaCli64 -LDSetProp -NoCachedBadBBU -LALL -a0
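The write-back check can be automated by filtering the -LDGetProp output. The following sketch is illustrative only: the function name all_write_back is hypothetical, and it assumes that each logical drive is reported on a line that contains the text Cache Policy:WriteBack when write-back is enabled, as described above.

```shell
# Read MegaCli64 -LDGetProp -Cache output on stdin.
# Return 0 only if at least one "Cache Policy:" line is present and
# every such line reports WriteBack.
all_write_back() {
    awk '
        /Cache Policy:/ {
            seen = 1
            if ($0 !~ /Cache Policy:WriteBack/) bad = 1
        }
        END {
            if (seen && !bad) exit 0
            exit 1
        }
    '
}

# Usage on an RLC:
#   /opt/MegaRAID/MegaCli/MegaCli64 -LDGetProp -Cache -LALL -a0 | all_write_back \
#       || echo "write-back is not enabled on every logical drive"
```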
The redundant management network (RMN) is the default configuration for SGI ICE X systems. To verify the bonding mode, perform the following from the RLC:
r1lead:~ # cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2+3 (2)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
802.3ad info
LACP rate: slow
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
Aggregator ID: 1
Number of ports: 2
Actor Key: 17
Partner Key: 4
Partner Mac Address: b4:0e:dc:37:4f:a7
Slave Interface: eth0
MII Status: up
Link Failure Count: 1
Permanent HW addr: 00:25:90:38:e5:22
Aggregator ID: 1
Slave Interface: eth1
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:25:90:38:e5:23
Aggregator ID: 1
If you see Bonding Mode: IEEE 802.3ad Dynamic link aggregation, RMN is on.
If you see Bonding Mode: fault-tolerance (active-backup), link aggregation is not in use, and redundant management networking is potentially disabled.
Use the configure-cluster GUI Configure Redundant Management Network option to turn on the redundant management network (RMN) for nodes that are discovered in the future.
Set the redundant management networking mode on, as follows:
# cadmin --enable-redundant-mgmt-network --node r1lead
Set the bonding mode per node, as follows:
# cadmin --set-mgmt-bonding --node r1lead 802.3ad
You must then reboot the system.
The /proc/net/bonding/bond0 file should then show the bonding mode with link aggregation configured, as follows:
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
The number of ports should be the following:
Number of ports: 2
2 is the correct value for an RMN configuration. If the number is 1, it means the trunk has not formed. The likely causes are as follows:
The Ethernet cable is not connected to the top level switch. From the RLC, run /sbin/ethtool on eth0 and eth1 to verify that the link is present, as follows:
r1lead:~ # /sbin/ethtool eth0
Settings for eth0:
Supported ports: [ TP ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Supports auto-negotiation: Yes
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Advertised auto-negotiation: Yes
Speed: 1000Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
Supports Wake-on: umbg
Wake-on: g
Current message level: 0x00000003 (3)
Link detected: yes
The Ethernet cable is connected, but the link is wrong. When the /sbin/ethtool command output shows the link speed as 100 Mb/s, for example because of a bad cable, the trunk leg is rejected.
The top level Ethernet switch is misconfigured. Perhaps the switchconfig tool did not configure this port properly. You can either log in to the switch to diagnose the problem or try the following procedure:
Find the MAC address of the r1lead bond interface, as follows:
r1lead:~ # ifconfig bond0
bond0 Link encap:Ethernet HWaddr 00:25:90:38:E5:22
inet addr:172.23.0.7 Bcast:172.23.255.255 Mask:255.255.0.0
inet6 addr: fe80::225:90ff:fe38:e522/64 Scope:Link
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
RX packets:286749167 errors:0 dropped:0 overruns:0 frame:0
TX packets:328574062 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:38868281915 (37067.6 Mb) TX bytes:153036792319 (145947.2 Mb)
From the system admin controller (SAC), run the switchconfig list --switches mgmtsw0 command to list the MAC addresses and trunks that are configured on the switches, as follows:
sys-admin:~ # switchconfig list --switches mgmtsw0
Current MAC/port configuration:
Switch Identifier: mgmtsw0 IP Address: 172.23.0.6
MAC                Port  Trunk  default-VLAN  allowed-VLANs
-------------------------------------------------------------
00-25-90-3F-16-C4  1/6          1             1(u)
00-30-48-F7-84-65  1/48         1             1(u)
00-25-90-38-E5-22  1/5   1      1             1(u), 101(t)
00-25-90-38-E5-23  1/5   1      1             1(u), 101(t)
00-25-90-38-E5-22  1/5   1      101           1(u), 101(t)
00-25-90-38-85-BC  1/7   2      1             1(u)
00-25-90-38-85-BD  1/7   2      1             1(u)
...
If the row for the RLC r1lead bond interface MAC address has a Port value but no Trunk value, the switch is not configured correctly.
To properly configure the switch, run a command similar to the following from the SAC:
# switchconfig set -s mgmtsw0 -v num=1 -v num=101,tag=tagged -b lacp -d 1 -m 00:25:90:38:E5:30
Replace 101 with the proper VLAN number: 101 for rack group 1, 102 for rack group 2, and so on.
Use ssh to log in to r1lead and verify that the RLC shows Number of ports: 2.
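The two values that matter in /proc/net/bonding/bond0 (the bonding mode and the number of ports) can be extracted with a short script. The following is a sketch under the assumption that the file uses the "Bonding Mode:" and "Number of ports:" labels shown in the sample output in this section. The function names bond_summary and rmn_trunk_ok are hypothetical.

```shell
# Print the bonding mode and the aggregated port count from bonding status
# text. Reads the named files, or stdin when no file is given.
bond_summary() {
    awk -F': ' '
        /^[[:space:]]*Bonding Mode:/    { mode = $2;  sub(/[[:space:]]+$/, "", mode) }
        /^[[:space:]]*Number of ports:/ { ports = $2; sub(/[[:space:]]+$/, "", ports) }
        END { printf "mode=%s ports=%s\n", mode, ports }
    ' "$@"
}

# Return 0 only when the status text describes a fully formed RMN trunk:
# 802.3ad mode with two ports in the active aggregator.
rmn_trunk_ok() {
    summary=$(bond_summary "$@")
    case $summary in
        *802.3ad*"ports=2") return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on an RLC:
#   rmn_trunk_ok /proc/net/bonding/bond0 || bond_summary /proc/net/bonding/bond0
```

A failure with ports=1 points to the cabling and switch causes listed above. A failure with an active-backup mode points to the cadmin settings.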
When you perform cimage --push-rack (or when blademond calls discovery-rack), it creates read/write expansions for each compute node.
Use the configure-cluster GUI Configure Default Max Rack IRU Setting option to set the default number of individual rack units (IRUs) supported by a rack leader controller (RLC). Set this value to the number of CMCs that each RLC serves. The default is 8. A change affects only nodes that are discovered in the future.
You can change the setting per-node with the cadmin command, as follows:
sys-admin:~ # cadmin --set-max-rack-irus --node r1lead 8
If this is an RLC with a brand new, never-before-discovered top level switch (or set of switches), the cmcdetectd daemon will see CMCs asking for IP addresses on the HEAD network. It configures the top level switch(es) so that the CMCs are on the appropriate rack VLAN. Make sure that cmcdetectd is running, and restart it if needed.
You can diagnose this further by running the tcpdump command and looking for DHCP requests. The requests should be seen on the RLC and not the system admin controller (SAC). For example, type the following command from r1lead:
# /usr/sbin/tcpdump -i bond0 -s600 -nn -vv -e -t -l -p broadcast and src port 68 and dst port 67
tcpdump: listening on bond0, link-type EN10MB (Ethernet), capture size 600 bytes
00:25:90:3f:16:c4 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 590: (tos 0x0, ttl 64, id 0, offset 0, \
flags [none], proto UDP (17), length 576) 0.0.0.0.68 > 255.255.255.255.67:
[udp sum ok] BOOTP/DHCP, Request from 00:25:90:3f:16:c4, length 548, xid 0x8b8d332a, Flags [none] (0x0000)
Client-IP 172.24.0.2
Client-Ethernet-Address 00:25:90:3f:16:c4
Vendor-rfc1048 Extensions
Magic Cookie 0x63825363
DHCP-Message Option 53, length 1: Request
...
If the switch was previously discovered but you are reinstalling the system or discovering a new root slot, cmcdetectd will not detect any CMC DHCP requests on HEAD. In this case, be sure to run configure-cluster and set Configure Switch Management Network to yes. Note that changing configure-cluster only takes effect for nodes discovered in the future. If you have an existing RLC already discovered, you will need to run a command like the following:
# cadmin --enable-switch-mgmt-network --node r1lead
After rebooting the RLC, make sure that the ifconfig command shows vlan101 as an interface and not vlan1 or vlan2 interfaces, as follows:
r1lead:~ # ifconfig
...
vlan101 Link encap:Ethernet HWaddr 00:25:90:38:E5:22
inet addr:192.168.160.1 Bcast:192.168.160.255 Mask:255.255.255.0
inet6 addr: fe80::225:90ff:fe38:e522/64 Scope:Link
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
RX packets:290550897 errors:0 dropped:0 overruns:0 frame:0
TX packets:268387414 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:30869741447 (29439.6 Mb) TX bytes:120262245830 (114691.0 Mb)
Confirm that dhcpd is running on the RLC. If dhcpd is not running, CMCs do not get their IP addresses. Check for errors starting dhcpd. If blademond failed to create the ice.conf dhcpd configuration file (in /etc/dhcpd.conf.d), see “Restarting the blademond Daemon”.
Verify proper CMC configuration. The CMC is configured for its rack number and slot number. If they are not configured correctly, multiple CMCs can be configured the same way, which causes problems. This can also corrupt the ice.conf dhcpd configuration file. You may need a USB serial cable to fix the CMCs if this is the case.
One troubleshooting approach is to run tcpdump on the RLC, as follows:
# /usr/sbin/tcpdump -i bond0 -s600 -nn -vv -e -t -l -p broadcast and src port 68 and dst port 67
Watch the DHCP requests over several minutes. If you see the same Client Identifier being requested by more than one MAC address, you are in a situation where the CMCs are not configured correctly.
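Spotting a repeated Client Identifier by eye is error-prone when many CMCs are requesting addresses. The following sketch shows one way to do the comparison. It does not parse tcpdump output directly. It assumes that you have already reduced the capture to one "client_id mac" pair per line, which is a hypothetical intermediate format, and the function name duplicate_client_ids is also hypothetical.

```shell
# Read "client_id mac" pairs on stdin, one pair per line, and print each
# client identifier that was requested by more than one distinct MAC address.
duplicate_client_ids() {
    # sort -u collapses repeated requests from the same MAC for the same ID,
    # so the awk counter sees each (ID, MAC) combination exactly once.
    sort -u | awk '
        { count[$1]++ }
        END { for (id in count) if (count[id] > 1) print id }
    ' | sort
}

# Usage, where pairs.txt is the hypothetical reduced capture:
#   duplicate_client_ids < pairs.txt
```

Any identifier that the function prints corresponds to CMCs that share a rack and slot configuration. Write MAC addresses in a consistent case in the input, because the comparison is textual.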
Verify that the RLC is properly configured in the switch (see “Troubleshooting a Rack Leader Controller (RLC) With Misconfigured Switch Information”).
Confirm the wiring rules. See “Switch Wiring Rules”.
If you moved some CMCs from one RLC to another and you already adjusted the rack and slot number in the CMC, the switch likely does not know about the changes. The CMCs are likely "lost" in the wrong VLAN, potentially a VLAN that is no longer in use. For example, this situation could arise if the CMCs were served by the r3lead RLC, but you decommissioned r3lead and moved the CMCs to r1lead. In this case, you must reconfigure the switch. Use the switchconfig command to configure the ports connected to those CMCs for the HEAD network. The system admin controller (SAC) cmcdetectd daemon then moves them to the correct final location.
You need to know the MAC addresses of the CMC embedded Linux for this, so consider recording them when you change the slot/rack number in the CMC. Hint: dbdump may still have the information, depending on how you removed the RLC.
An example command is as follows:
# switchconfig set -v num=1 -b manual -d 1 -m 08:00:69:16:51:49 --switches mgmtsw0
If you have more than one management switch, specify them in a comma-separated list for --switches.
In a non-redundant-management configuration (switches not stacked), if the dhcpd daemon shows DHCP requests from the CMC but the CMC remains unpingable, it could be that both CMC-0 and CMC-1 are connected and linked. This breaks the wiring rules.
When the system is not wired for redundant management networking (when switches are not stacked), connect only CMC-0. Do not connect CMC-1.
From the rack leader controller (RLC), perform the following steps:
Stop the daemon:
r1lead:~ # service blademond stop
Remove /etc/dhcpd.conf.d/ice.conf or /etc/dhcp/dhcpd.conf.d/ice.conf, whichever exists on your system:
rm ice.conf dhcpd.conf
Remove slot_map:
r1lead:~ # rm /var/opt/sgi/lib/blademond/slot_map
Start the daemon:
r1lead:~ # service blademond start
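The restart steps can be collected into one script for repeated use. The following is a sketch only, not an SGI-supplied tool. The run and reset_blademond function names and the DRY_RUN variable are hypothetical. DRY_RUN defaults to 1, so the script prints the commands without running them. The sketch removes ice.conf from whichever of the two documented locations exists.

```shell
# Print the command when DRY_RUN is 1 (the default); run it when DRY_RUN is 0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Stop blademond, remove its generated state, and start it again.
reset_blademond() {
    run service blademond stop
    # The ice.conf location differs between systems; remove whichever exists.
    for conf in /etc/dhcpd.conf.d/ice.conf /etc/dhcp/dhcpd.conf.d/ice.conf; do
        if [ -e "$conf" ]; then
            run rm "$conf"
        fi
    done
    run rm -f /var/opt/sgi/lib/blademond/slot_map
    run service blademond start
}

# Usage on the RLC, as root:
#   reset_blademond              # preview only
#   DRY_RUN=0; reset_blademond   # actually perform the steps
```

Review the preview output before you set DRY_RUN to 0, because the removal of ice.conf and slot_map cannot be undone.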
All of the log files reside in the /var/log directory. In addition to the messages log file and in some cases dhcpd file on the rack leader controller (RLC), here are some interesting /var/log directory log files:
/var/log/discover-rack
On the system admin controller (SAC), the discover-rack call is facilitated by blademond when new nodes are found. This log will often show problems with discovering nodes.
/var/log/blademond
On the RLCs, this shows the blademond daemon actions. This includes showing when blade changes are found and it also shows its call to discover-rack, and so on. If there are CMC communication issues, they will often be noticed in this log.
/var/log/cmcdetectd.log
On the SAC, cmcdetectd logs its actions as it configures the switches for CMCs in the system. Watch for progress or errors here.
/var/log/switchconfig.log
On the SAC, the switchconfig command line tool is used largely by the discover command as nodes are discovered. Its actions are logged to this file. If RLC VLANs are not functioning properly, check the switchconfig log file.
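A quick first pass over these log files is to search them for error lines. The following sketch assumes that problems are reported on lines that contain the word "error" in any letter case, which is an assumption about the log contents and not a documented format. The scan_logs function and the SAC_LOGS and RLC_LOGS variables are hypothetical names.

```shell
# Log files named in this section, grouped by the node on which they reside.
SAC_LOGS="/var/log/discover-rack /var/log/cmcdetectd.log /var/log/switchconfig.log"
RLC_LOGS="/var/log/blademond"

# Print file:line:text for every line that contains "error" (any case) in
# each readable file given. Unreadable or missing files are skipped.
scan_logs() {
    for log in "$@"; do
        [ -r "$log" ] || continue
        # The extra /dev/null argument makes grep print the file name.
        grep -i -n 'error' "$log" /dev/null
    done
    return 0
}

# Usage on the SAC (the variable is intentionally unquoted so that it splits):
#   scan_logs $SAC_LOGS
# Usage on an RLC:
#   scan_logs $RLC_LOGS
```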
This section describes what to do when the blademond daemon cannot find a system blade, as follows:
Can you ping the CMCs? See “Cannot ping the CMCs from the Rack Leader Controller (RLC)”.
If the CMCs are pingable, verify that they have a valid slot map. If the slot map returned by the CMC is missing entries, then blademond cannot function properly. It operates on information passed to it by the CMC. Some commands to run from the rack leader controller (RLC) are, as follows:
Dump the slot map from each CMC to your screen:
r1lead:~ # /opt/sgi/lib/dump-cmc-slot-tables
Query an individual slot map:
r1lead:~ # echo STATUS | netcat r1i0c 4502
Note: In some software distributions, netcat is nc.
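Because the netcat binary name differs between distributions, a script that queries slot maps can select the name at run time. The following is a minimal sketch. The pick_netcat and query_slot_map function names are hypothetical, and the STATUS request and port 4502 are taken from the example above.

```shell
# Print the name of the first available netcat command, or fail if neither
# netcat nor nc can be found.
pick_netcat() {
    for candidate in netcat nc; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Query the slot map of one CMC, for example: query_slot_map r1i0c
query_slot_map() {
    nc_cmd=$(pick_netcat) || { echo "no netcat binary found" >&2; return 1; }
    echo STATUS | "$nc_cmd" "$1" 4502
}
```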
If the CMCs are pingable and the CMCs have valid slot maps, then you can focus on how blademond is functioning.
You can turn on debug mode in the blademond daemon by sending it a SIGUSR1 signal from the RLC, as follows:
# kill -USR1 pid
The blademond daemon maintains the slot map at /var/opt/sgi/lib/blademond/slot_map on the RLCs. This appears as /var/opt/sgi/lib/blademond/slot_map.rack_number on the system admin controller (SAC).
For a blademond --help statement, ssh onto the r1lead RLC, as follows:
[root@admin ~]# ssh r1lead
Last login: Tue Jan 17 13:21:34 2012 from admin
[root@r1lead ~]#
[root@r1lead ~]# /opt/sgi/lib/blademond --help
Usage: blademond [OPTION] ...
Discover CMCs and blades managed by CMCs.
Note: This daemon normally takes no arguments.
--help Print this usage and exit.
--debug Enable debug mode (also can be enabled by setting CM_DEBUG)
--fakecmc Development only: Discover fake CMCs instead of real ones
--scan-once Initialize, scan for blades, set blades up. Do not daemonize.
Do not keep looping - do one pass and exit.
If there are ssh(1) key failures or if the compute node hosts seem to be BMCs, the CMC slot map might be corrupted.
The CMC maintains a cache file that records which MACs are BMC MACs and which are host MACs. It uses this information, combined with switch port location information in the embedded Broadcom switch, to generate the slot map used by the blademond daemon.
Certain situations, such as a CMC reflash, can remove the cache file but leave CMC power active. In this situation, the CMC does not know which MACs on a given embedded switch port are host MACs and which are BMC MACs, and it can get the order wrong. It then caches the incorrect order. To fix this, for each CMC, turn the power off with pfctl, zero out the MAC cache file, and reset the CMC. Then have blademond start over from scratch (see “Restarting the blademond Daemon”). Perform the following steps:
Use ssh to log in as root to the rack leader controller (RLC), as follows:
sys-admin:~ # ssh r1lead
Last login: Thu Jan 26 13:57:53 2012 from admin
r1lead:~ #
Disable the blademond daemon, as follows:
r1lead:~ # service blademond off
Turn off IRU power for each CMC using the pfctl command, as follows:
# PDSH_SSH_ARGS_APPEND="-F /root/.ssh/cmc_config" pdsh -g cmc pfctl off
Zero out the MAC cache file, as follows:
# PDSH_SSH_ARGS_APPEND="-F /root/.ssh/cmc_config" pdsh -g cmc cp /dev/null /work/net/broadcom_mac_addr_cache
Reboot each CMC, as follows:
# PDSH_SSH_ARGS_APPEND="-F /root/.ssh/cmc_config" pdsh -g cmc reboot
Restart blademond from scratch. See “Restarting the blademond Daemon”.
If you boot a compute node with tmpfs, part of the process transfers a root tarball using multicast. This tarball is then expanded. If you see hundreds of "file X has a time in the future" messages, it likely means your hardware clock is not set to system time properly (see “Ensuring Hardware Clock Has the Correct Time”).
Some software distributions do not synchronize the system time to the hardware clock as expected. At shutdown, the system time should be copied to the hardware clock, but sometimes this does not happen.
To set all the compute node hardware clocks properly, perform the following:
Make sure the system admin controller (SAC) and rack leader controller (RLC) have the correct time.
Make sure the SAC and RLCs are synchronized with NTP. A SAC can show a message like the following:
ntpd[20489]: synchronized to 128.162.244.1, stratum 2
An RLC might show a message like the following:
20 Jan 22:54:14 ntpd[16831]: synchronized to 172.23.0.1, stratum 3
Make sure the compute nodes have the correct time. They use NTP broadcast packets, but they still display a message like the following:
20 Jan 23:05:16 ntpd[4925]: synchronized to 192.168.159.1, stratum 4
You can also use a command like the following and view the output:
sys-admin:~ # pdsh -g leader pdsh -g compute date
Type the following command to set the hardware clock to the system clock on the compute nodes:
sys-admin:~ # pdsh -g leader pdsh -g compute hwclock --systohc
You can run the hwclock command without options to confirm the current hardware clock time, as follows:
sys-admin:~ # hwclock
Thu 26 Jan 2012 10:57:27 PM CST -0.750431 seconds
Normally, as you discover RLCs, switchconfig is called automatically and the switch ports associated with the RLC are configured in the special way needed for RLCs, as follows:
Default VLAN 1
Accept rack VLAN packets tagged (rack 1 vlan is vlan101)
Link Aggregation is the bonding mode between the two ports associated with the RLC
If an RLC is moved in the switch or if switchconfig failed during discovery for some reason, you can run switchconfig by hand to configure the switch, as follows:
Follow the switch wiring rules when you configure the switch. See “Switch Wiring Rules”.
Make sure all management switches are reachable from the system admin controller (SAC).
Find the MAC addresses associated with the RLC interfaces. You can do this by running the following command on the RLC in question:
r1lead:~ # cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2+3 (2)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
802.3ad info
LACP rate: slow
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
Aggregator ID: 1
Number of ports: 2
Actor Key: 17
Partner Key: 4
Partner Mac Address: b4:0e:dc:37:4f:a7
Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:25:90:38:e5:22
Aggregator ID: 1
Slave Interface: eth1
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:25:90:38:e5:23
Aggregator ID: 1
Caution: Because bonded interfaces are in play, you cannot get both MAC addresses by using the ifconfig command. The ifconfig command shows the same MAC address for eth0 and eth1 if redundant management networking is enabled.
Determine which management switches are present, as follows:
r1lead:~ # cnodes -mgmtsw
mgmtsw0
When you have the list of management switches and the MAC addresses of the RLCs, run a command similar to the following:
# switchconfig set --vlan num=1 --vlan num=101,tag=tagged --bonding=802.3ad --default-vlan 1 \
    --macs 00:e0:ed:0a:f2:0d,00:e0:ed:0a:f2:0e --switches mgmtsw0,mgmtsw
Replace the MACs and management switches with the proper ones, and replace 101 with the VLAN for the rack. The VLAN is normally 100 + rack number, so rack 1 is 101, rack 2 is 102, and so on.
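The "100 + rack number" convention can be computed from the RLC hostname when you script the switchconfig call. The following sketch assumes RLC hostnames of the form rNlead, as in the r1lead examples in this chapter. The function name rack_vlan is hypothetical.

```shell
# Print the rack VLAN number for an RLC hostname such as r1lead,
# using the "100 + rack number" convention.
rack_vlan() {
    rack=${1#r}          # strip the leading "r"
    rack=${rack%lead}    # strip the trailing "lead"
    case $rack in
        ''|*[!0-9]*)
            echo "unexpected RLC hostname: $1" >&2
            return 1
            ;;
    esac
    echo $((100 + rack))
}

# Usage:
#   vlan=$(rack_vlan r2lead)    # sets vlan to 102
```

Confirm the result against your site configuration before you use it, because the convention is described as the normal case and not as a guarantee.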
This section is mainly of interest to SGI ICE X system configurations that have a redundant management network setup (stacked pairs of switches) or larger systems that have switch stacks cascaded from the top level switch.
When discovering cascaded switches, it is impossible to know the connected switch ports of all trunks in advance. So when discovering cascaded switches, you can only start with one cable for discovering, then add the second one later on.
When trunks are configured, it is often hard to find the MAC address of both legs of the trunk. This is because the trunked connection just uses one MAC for the connection. Therefore, you need to rely on rules that infer the second port's connection based on the first port.
Some simple wiring rules are, as follows:
In a redundant management network (RMN) configuration, when connecting system admin controllers (SACs), rack leader controllers (RLCs), service nodes, and CMCs, you must always use the same port number for the same node in both switches in the stack. In other words:
If you connect r1lead eth0 to switch A, port 43, then you must connect r1lead eth1 to switch B, port 43.
Likewise, if you connect CMC r1i0c CMC-0 port to switch A, port 2, then r1i0c CMC-1 port must go to switch B port 2.
When adding cascaded switch stacks, all switch stacks must cascade from the primary switch stack. In other words, there is always only, at most, one switch hop.
When discovering cascaded switch pairs in an RMN setup, observe the following:
If you are connecting switch stack 1, switch A, port 48 to switch stack 2, then you must connect the second trunked connection to stack 2, switch B, port 48.
Until the cascaded switch stack is discovered, you must leave one trunk leg unplugged temporarily to prevent looping.
The discover command tells you when it is safe to plug in the second leg of the trunk. This avoids network loops.
A problem occasionally occurs, especially in SGI XE270 SACs, where the active-backup or 802.3ad bonded bond0 interface contains an Ethernet eth2 interface that is down/not linked. To verify this, perform the following:
Check the Ethernet port of the add-in card and confirm that it is lit.
Confirm that the add-in card connection to the management switches is using port 0 with port 1 not connected (so not miswired).
If you look at /proc/net/bonding/bond0 file, you can confirm that eth2 is the link that is down.
Use the /sbin/ethtool eth2 command and confirm that the Link detected: is no.
Run the ifconfig eth3 up command, and then run the /sbin/ethtool eth3 command to determine whether Link detected: is yes.
In this scenario, it is likely that the eth2 and eth3 interfaces have been swapped. Another clue is that eth2 has a MAC address that is larger than the MAC address of eth3. To find the eth2 MAC address, look at /proc/net/bonding/bond0, because the bond enforces the same MAC address for all bonded members. To find the eth3 MAC address, use ifconfig eth3.
To correct this situation, edit the /etc/udev/rules.d/70-persistent-net.rules file and swap the MACs associated with eth2 and eth3 in the file.
When you reboot the system, the SAC comes back up with eth2 and eth3 properly ordered.
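The MAC address comparison that hints at swapped interfaces can be done with a small helper instead of by eye. The following sketch compares two MAC addresses numerically. The mac_key and mac_greater function names are hypothetical, and the addresses in the usage comment are placeholders, not values from a real SAC.

```shell
# Normalize a MAC address to 12 lowercase hex digits with no separators.
mac_key() {
    printf '%s\n' "$1" | tr 'A-F' 'a-f' | tr -d ':-'
}

# Return 0 if the first MAC address is numerically larger than the second.
# Fixed-width lowercase hex strings sort in numeric order in the C locale.
mac_greater() {
    a=$(mac_key "$1")
    b=$(mac_key "$2")
    [ "$a" != "$b" ] &&
        [ "$(printf '%s\n%s\n' "$a" "$b" | LC_ALL=C sort | tail -n 1)" = "$a" ]
}

# Usage, with the eth2 permanent address from /proc/net/bonding/bond0 and
# the eth3 address from ifconfig eth3 (placeholder values shown):
#   mac_greater 00:25:90:38:e5:23 00:25:90:38:e5:22 && echo "eth2 and eth3 may be swapped"
```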
After you upgrade to or install SGI Tempo 2.9.0 on any slot, the boot manager changes to GRUB version 2. At this point, you can no longer install SGI Tempo versions earlier than 2.9.0 on any slot without first reverting the boot manager. The procedure in this topic explains how to install an earlier SGI Tempo version. For information about booting an SGI Tempo 2.9.0 system, see the following:
“Disk Partition Layout and Boot Order Information” in Chapter 3
The following procedure explains how to install a version of SGI Tempo that is earlier than 2.9.0. After you run this procedure, the boot system is again GRUB version 1. Subsequently, if you install SGI Tempo 2.9.0, the boot system changes again to GRUB version 2.
Procedure 5-2. To install an SGI Tempo version that is earlier than SGI Tempo 2.9.0
Log in as the root user to the slot on which you installed SGI Tempo 2.9.0.
Type the following command to revert to the GRUB version 1 boot manager on the SAC:
# /opt/sgi/lib/revert-admin-to-legacy-grub
Use the SGI DVD to install the older version of SGI Tempo that you want to use.