When you install the CXFS software, you must modify certain system files. The network configuration is critical. Each node in the cluster must be able to communicate with every other node in the cluster by both logical name and IP address without going through any other network routing; proper name resolution is key. SGI recommends static routing.
This section provides an overview of the steps that you should perform on your nodes prior to installing the CXFS software.
Caution: It is critical that you understand these rules before attempting to configure a CXFS cluster.
Use the following hostname resolution rules and recommendations when defining a node:
The first node you define in the pool must be an administration node.
Hostnames cannot begin with an underscore (_) or include any white-space characters.
The private network IP addresses on a running node in the cluster cannot be changed while CXFS services are active.
You must be able to communicate directly between every node in the cluster (including client-only nodes) using IP addresses and logical names, without routing.
You must dedicate a private network for control messages, CXFS metadata, CXFS kernel heartbeat messages, and cluster database heartbeat messages. No other load is supported on this network.
The private network must be connected to all nodes, and all nodes must be configured to use the same subnet for that network.
Because CXFS kernel heartbeat and cluster database heartbeat are done using IP multicast, the private network must be multicast-capable. This means that all of the interfaces must have multicast enabled (which is the default) and all of the external networking hardware (such as switches) must support IP multicast.
If you change hostname resolution settings in the /etc/nsswitch.conf file after you have defined the first administration node (which creates the cluster database), you must re-create the cluster database.
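The two hostname syntax rules above can be checked mechanically before a node is defined. The following is a minimal sketch, not part of CXFS: `check_hostname` is a hypothetical helper name, and treating an empty name as invalid is an assumption. It tests only the underscore and white-space rules and does not verify name resolution.

```shell
#!/bin/bash
# check_hostname NAME
# Hypothetical helper: returns 0 if NAME satisfies the two syntax rules
# listed above, and 1 otherwise.
#   - the name must not begin with an underscore (_)
#   - the name must not contain white-space characters
check_hostname() {
    name=$1
    case $name in
        '')            return 1 ;;  # empty name: treated as invalid (assumption)
        _*)            return 1 ;;  # begins with an underscore
        *[[:space:]]*) return 1 ;;  # contains white space
    esac
    return 0
}

# Example use:
#   check_hostname "$(hostname)" || echo "hostname violates CXFS naming rules"
```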
The following procedure provides an overview of the steps required to add a private network.
Note: A private network is required for use with CXFS.
You may skip some steps, depending upon the starting conditions at your site.
Ensure that all hosts in the cluster have a consistent mapping of hostnames to IP addresses, using one of the following methods:
Edit the /etc/hosts file on every node in the cluster so that it contains consistent entries for all nodes in the cluster and their private interfaces (preferred)
Use a reliable domain name service (DNS) implementation that is configured to quickly rotate to a secondary server, using IP addresses in an HA cluster
See “Ensure Quick Communication Among Cluster Nodes” in Chapter 2.
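When the /etc/hosts method is used, one quick sanity check is to look for a hostname that is mapped to more than one IP address within a single file. The following is a minimal sketch under that assumption; `find_host_conflicts` is a hypothetical helper name. It does not compare files across nodes, and on systems that list localhost for both 127.0.0.1 and ::1 it will report localhost.

```shell
#!/bin/bash
# find_host_conflicts
# Hypothetical helper: reads hosts-format text on standard input and prints
# each hostname or alias that appears with more than one IP address.
find_host_conflicts() {
    awk '
        { sub(/#.*/, "") }      # strip comments; awk re-splits the fields
        NF < 2 { next }         # skip blank or incomplete lines
        {
            for (i = 2; i <= NF; i++) {
                if (($i in ip) && ip[$i] != $1 && !($i in reported)) {
                    print $i    # same name, different address
                    reported[$i] = 1
                }
                if (!($i in ip)) ip[$i] = $1
            }
        }
    '
}

# Example use:
#   find_host_conflicts < /etc/hosts
```

Running the same check on every node and comparing the files themselves (for example, by checksum) would be a reasonable complement, because the procedure requires the entries to be consistent across all nodes.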
Configure your private interface according to the instructions in the network configuration section of your Linux distribution manual. To verify that the private interface is operational, use the ifconfig -a command.
For example:
server-admin# ifconfig -a
eth0 Link encap:Ethernet HWaddr 00:50:81:A4:75:6A
inet addr:192.168.1.1 Bcast:192.168.1.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:13782788 errors:0 dropped:0 overruns:0 frame:0
TX packets:60846 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:100
RX bytes:826016878 (787.7 Mb) TX bytes:5745933 (5.4 Mb)
Interrupt:19 Base address:0xb880 Memory:fe0fe000-fe0fe038
eth1 Link encap:Ethernet HWaddr 00:81:8A:10:5C:34
inet addr:10.0.0.10 Bcast:10.0.0.255 Mask:255.255.255.0
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:100
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
Interrupt:19 Base address:0xef00 Memory:febfd000-febfd038
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:162 errors:0 dropped:0 overruns:0 frame:0
TX packets:162 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:11692 (11.4 Kb) TX bytes:11692 (11.4 Kb)
This example shows that two Ethernet interfaces, eth0 and eth1, are present and configured up (as indicated by UP in the third line of each interface description). In this output, eth0 also shows the RUNNING flag, whereas eth1 does not.
If the second network does not appear, either a network interface card must be installed to provide a second network or the network is not yet initialized.
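Reading the flags line by eye becomes tedious on nodes with many interfaces. The following is a minimal sketch that extracts the flags from output in the layout shown above; `iface_flags` is a hypothetical helper name. It assumes the classic ifconfig layout of this example. Newer net-tools releases and the ip command print flags in a different layout, which this sketch does not handle.

```shell
#!/bin/bash
# iface_flags
# Hypothetical helper: reads classic "ifconfig" output on standard input and
# prints one line per interface that is configured up, in the form
#   NAME FLAG FLAG ...
# Interfaces whose flags line does not begin with UP are omitted.
iface_flags() {
    awk '
        /Link encap:/ { iface = $1; next }   # first line of an interface block
        $1 == "UP" {                          # flags line of an up interface
            line = iface
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^MTU:/) break       # flags end where MTU: begins
                line = line " " $i
            }
            print line
        }
    '
}

# Example use:
#   ifconfig -a | iface_flags
#   ifconfig -a | iface_flags | grep -v MULTICAST   # up, but not multicast-capable
```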
(Optional) Make the modifications required to use CXFS connectivity diagnostics. See “Modifications for CXFS Connectivity Diagnostics” in Chapter 7.
For each private network on each server-capable administration node in the pool, verify access with the ping command:
Execute a ping using the private network. Enter the following, where nodeIPaddress is the IP address of the node:
ping nodeIPaddress
For example:
server-admin# ping 10.0.0.1
PING 10.0.0.1 (10.0.0.1) from 128.162.240.141 : 56(84) bytes of data.
64 bytes from 10.0.0.1: icmp_seq=1 ttl=64 time=0.310 ms
64 bytes from 10.0.0.1: icmp_seq=2 ttl=64 time=0.122 ms
64 bytes from 10.0.0.1: icmp_seq=3 ttl=64 time=0.127 ms
Execute a ping using the public network.
If ping fails, repeat the following procedure on each node:
Verify that the network interface was configured up using ifconfig. For example:
server-admin# ifconfig eth1
eth1 Link encap:Ethernet HWaddr 00:81:8A:10:5C:34
inet addr:10.0.0.10 Bcast:10.0.0.255 Mask:255.255.255.0
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:100
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
Interrupt:19 Base address:0xef00 Memory:febfd000-febfd038
In the third output line above, UP indicates that the interface was configured up.
Verify that the cables are correctly seated.
Repeat this procedure on each node.
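Because the ping check must be run against every node on both the private and public networks, it can help to script it. The following is a minimal sketch; `ping_nodes` is a hypothetical helper name, and `PING_CMD` is a hypothetical override variable (defaulting to ping) that is included only so that the loop can be exercised without network access. The addresses in the usage comment are placeholders.

```shell
#!/bin/bash
# ping_nodes ADDR...
# Hypothetical helper: sends three echo requests to each address and prints
# "ADDR: ok" or "ADDR: FAILED". Returns nonzero if any address failed.
ping_nodes() {
    failed=0
    for addr in "$@"; do
        # -c 3 limits ping to three requests so that the loop terminates
        if ${PING_CMD:-ping} -c 3 "$addr" >/dev/null 2>&1; then
            echo "$addr: ok"
        else
            echo "$addr: FAILED"
            failed=1
        fi
    done
    return "$failed"
}

# Example use (placeholder private-network addresses):
#   ping_nodes 10.0.0.1 10.0.0.2 10.0.0.10
```

For any address reported as FAILED, continue with the ifconfig and cabling checks described above.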
To test node connectivity by using the GUI, the root user on the node running the CXFS diagnostics must be able to access a remote shell using the rsh command (as root) on all other nodes in the cluster. (This test is not required when using cxfs_admin because it verifies the connectivity of each node as it is added to the cluster.)
There are several ways of accomplishing this, depending on the existing settings in the pluggable authentication modules (PAMs) and other security configuration files.
The following method works with default settings. Do the following on all nodes in the cluster:
After you have completed running the connectivity tests, you may wish to disable rsh on all cluster nodes.
For more information, see the Linux operating system documentation about PAM and the hosts.equiv man page.