This chapter explains how to configure, start, and halt the OPS system. It contains these major sections:
The OPS software is installed and configured during installation of the OPS system. This section explains how you can check that the configuration files created during this process contain the correct information.
The OPS configuration files that are edited or created during OPS system setup are
/usr/opscm/conf/opsconf on each server
/usr/opscm/conf/sidconf on each server
/usr/opsnc/conf/ncconf on the Indy or O2 workstation
Checking each file is explained separately in a subsection.
To check the contents of the opsconf configuration file, follow these steps:
Open /usr/opscm/conf/opsconf on each server.
Check to see that the line
CLUSTER 1 test 2 ichostname
has the correct hostname of the workstation used for IRISconsole.
Define the Distributed Lock Manager (DLM) domain(s) and DLM instances, as explained in the section “OPS Instances and Domains” in Chapter 1. Check that the lines under
#NODE dom inst ndname ndaddress cmsvc apsvc wt
have accurate information on your OPS servers. For example:
NODE 0 0 host1 150.166.42.37 opscm opsdlm 1
NODE 0 1 host2 150.166.42.38 opscm opsdlm 1
Save and close /usr/opscm/conf/opsconf.
Edit the file /etc/services. Add three services:
opscm newnumber1/tcp
opsdlm newnumber2/tcp
opsnc newnumber3/tcp
Each newnumbern is a port number not used elsewhere at this site. For example:
opscm 7018/tcp
opsdlm 7019/tcp
opsnc 7020/tcp
Save and exit the file.
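The opsconf and /etc/services edits above can be spot-checked from a shell. The following is a sketch, not part of the OPS distribution; the function name is hypothetical, and the service names (opscm, opsdlm, opsnc) are those added in the steps above.

```shell
# Sketch (hypothetical helper): verify that the three OPS services are
# registered in a services file and that no port/protocol pair is
# claimed twice. Pass the file to check, e.g. /etc/services.
check_ops_services() {
    file=$1
    # each service added in the step above must be present
    for svc in opscm opsdlm opsnc; do
        grep -q "^$svc[[:space:]]" "$file" || echo "missing: $svc"
    done
    # a duplicated port/protocol pair means the chosen number is
    # already in use by another service at this site
    awk '!/^#/ && NF >= 2 { print $2 }' "$file" | sort | uniq -d |
        sed 's/^/duplicate port: /'
}
```

Running check_ops_services /etc/services on each server should print nothing if the edits are correct.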
The sidconf file maps each Oracle instance (sid, or Oracle system ID) to a DLM domain-instance pair. To check the sidconf configuration file, follow these steps:
Determine the sid of the Oracle database for each instance.
In /usr/opscm/conf/sidconf, check that the lines
MAP sid0 domainnumber instancenumber
MAP sid1 domainnumber instancenumber
contain accurate information for the servers. For example:
MAP finance1 0 0
MAP finance2 0 1
Save and close /usr/opscm/conf/sidconf.
If necessary, change permissions on this file so that it can be read by all.
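A quick consistency check of the sidconf file can be scripted. This is a sketch with a hypothetical function name, assuming the MAP line format shown above.

```shell
# Sketch (hypothetical helper): flag malformed MAP lines in a sidconf
# file. A valid line is "MAP sid domain instance", where domain and
# instance are numeric.
check_sidconf() {
    awk '$1 == "MAP" {
        if (NF != 4 || $3 !~ /^[0-9]+$/ || $4 !~ /^[0-9]+$/)
            print "bad entry: " $0
    }' "$1"
}
```

Running check_sidconf /usr/opscm/conf/sidconf should print nothing if every MAP line is well formed.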
To check the ncconf configuration file, follow these steps:
Note the ports on the multiplexer to which the Remote System Control server ports are cabled.
In an IRIX shell window on the workstation, open the /usr/opsnc/conf/ncconf file.
Check that the entries under
#nodename ttyname system controller type
contain accurate information for the servers. For example:
ops1 /dev/ttyd031 SYSCTLR|MMSC|MSC
ops2 /dev/ttyd033 SYSCTLR
The last digit in each tty entry should be the multiplexer port into which the Remote System Control for each server is cabled. The numbers 1 through 16 on the multiplexer correspond to 0 through f in the tty entries. In the example above, the Remote System Control ports are cabled to ports 2 and 4 on the multiplexer. SYSCTLR is used if the system is a CHALLENGE server, MMSC is used if the system is a rackmount Origin2000 server, and MSC is used if the system is a deskside Origin2000.
Save and close /usr/opsnc/conf/ncconf.
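The port-to-tty correspondence described above (multiplexer ports 1 through 16 map to hex digits 0 through f) can be sketched as a small helper. The /dev/ttyd03 prefix is taken from the example entries and is an assumption; it may differ for other multiplexer configurations.

```shell
# Sketch: convert a multiplexer port number (1-16) to the tty device
# name used in ncconf. Port n maps to hex digit n-1; the ttyd03
# prefix follows the example entries above and is an assumption.
port_to_tty() {
    printf '/dev/ttyd03%x\n' $(( $1 - 1 ))
}
```

For instance, port_to_tty 2 prints /dev/ttyd031 and port_to_tty 4 prints /dev/ttyd033, matching the ops1 and ops2 entries in the example.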
If the OPS servers are Origin servers and one or both of them have system controller passwords, the passwords must be stored in the /usr/opsnc/conf/.sysctlrpw-* files on the Indy or O2 workstation. This is required for the Node Controller software to work correctly.
For each OPS server that has a system controller password, create a .sysctlrpw file by entering this command as root on the Indy or O2 workstation:
/usr/opsnc/bin/spng -n OPS_hostname -d sys_ctlr_type -w password
OPS_hostname is the hostname of an OPS server. sys_ctlr_type is the OPS server's system controller type, either MSC or MMSC. password is the unencrypted system controller password.
To start the Node Controller software (opsnc) on the Indy or O2 workstation, enter this command as root:
/usr/opsnc/bin/opsnc
You must restart the Node Controller software using this command each time the workstation is rebooted.
This section describes how to start the OPS software manually on the servers. You may want to do this while testing the software or before you set up the servers to run the software automatically.
The OPS software consists of the Connection Manager (CM) software (opscm) and Oracle Distributed Lock Manager (DLM) software (dlmmon and dlmd). Follow these steps to start them manually:
Follow the procedure in the section “Halting the OPS Software Manually” in this chapter to ensure that the CM and DLM software aren't already running.
For each server, create a startup script containing the following lines:
#!/sbin/sh
ORACLE_HOME=/usr/people/oracle
ORACLE_SID=sidname
LKDOM=0
LKINST=0
USER=oracle
GROUP=dba
export ORACLE_HOME ORACLE_SID LKDOM LKINST
/usr/opscm/bin/opscm
$ORACLE_HOME/bin/lkmgrd -u $USER -g $GROUP
In each script, make sure that the values for LKDOM= and LKINST= are accurate for the domain and instance on that server. These values must match those in /usr/opscm/conf/sidconf, as explained in the section “Checking the sidconf Configuration File” in this chapter. lkmgrd is the Distributed Lock Manager daemon; its options specify the user ID and group of the lock database owner.
As root, run the startup script you created in step 2 on each server to bring up the CM and DLM software on each server.
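Because the LKDOM and LKINST values in each startup script must match sidconf, one way to keep them from drifting apart is to derive them from sidconf at startup. The following is a sketch with a hypothetical function name, assuming the MAP line format shown earlier.

```shell
# Sketch (hypothetical helper): print LKDOM/LKINST assignments for a
# given sid by reading its MAP line from a sidconf file.
# Usage: sid_to_lock_vars /usr/opscm/conf/sidconf finance1
sid_to_lock_vars() {
    awk -v sid="$2" '$1 == "MAP" && $2 == sid {
        print "LKDOM=" $3
        print "LKINST=" $4
    }' "$1"
}
```

A startup script could then set the variables with eval "$(sid_to_lock_vars /usr/opscm/conf/sidconf $ORACLE_SID)" instead of hard-coding them.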
To enable the OPS software (Connection Manager and Oracle Distributed Lock Manager) to start automatically at boot time and to start it, follow these steps on each server:
Edit the /etc/init.d/opscmgr script. This script is similar to the startup script created in step 2 in the section “Starting the OPS Software Manually on the Servers” in this chapter.
Run this command as root:
chkconfig -f opscm on
Reboot the server. For example:
reboot
Occasionally, you may need to run the Connection Manager software with just one server operating, for example, while one server is turned off. To start a server for single-host operation, run the opscm command with the -F option:
/usr/opscm/bin/opscm -F
Note: Do not use the -F option for normal OPS dual-host operation.
To halt the Connection Manager and Oracle Distributed Lock Manager Software on both servers, follow these steps:
Back up Oracle database information as needed.
On each OPS server, halt Oracle database operation.
Enter the following command to terminate the Connection Manager gracefully:
killall -TERM opscm
If you are not able to gracefully halt the Connection Manager and Oracle Distributed Lock Manager Software on both servers as described in the previous section, use this procedure to halt the processes and clean up. Follow these steps on each server:
Check for the presence of the CM lock file in /tmp. This filename has the suffix .nn, where the two digits stand for the DLM domain and instance, for example, .00. If this file exists, delete it.
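The lock-file check and removal can be scripted. This is a sketch with a hypothetical function name; the directory is parameterized so the logic can be exercised outside /tmp, and the pattern matches any file name carrying the .nn suffix, whatever prefix the CM gave it.

```shell
# Sketch (hypothetical helper): remove any CM lock file with the given
# domain/instance suffix from a directory (on an OPS server, /tmp).
# Usage: clean_cm_lockfile /tmp 00
clean_cm_lockfile() {
    dir=$1 suffix=$2
    for f in "$dir"/*."$suffix"; do
        # the glob stays literal when nothing matches, so test first
        if [ -e "$f" ]; then
            rm -f "$f" && echo "removed $f"
        fi
    done
}
```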
Check to see if the CM is already running by entering
ps -ef | grep opscm
If the CM is running, enter the following to kill it:
killall -TERM opscm
Check to see if the DLM is already running by entering
ps -ef | grep dlm
If the DLM is running, enter the following to kill its processes:
killall -TERM dlmmon
killall -TERM dlmd
Run ipcs to determine the shared memory and semaphores used on the server.
ipcs
The following is an example output:
IPC status from /dev/kmem as of Thu May 18 14:31:22 1995
T      ID   KEY        MODE        OWNER   GROUP
Message Queues:
Shared Memory:
m       0   0x53637444 --rw-r--r-- root    sys
m     301   0x000022bb --rw-rw----  oracle  dba
m    2202   0x0c33b7c9 --rw-r-----  oracle  dba
Semaphores:
s    2200   0x00000000 --ra-r-----  oracle  dba
If Oracle or DLM is using any shared memory segments or semaphores, save them to another location if you need them for debugging a DLM or Oracle crash; otherwise, delete them with ipcrm. For the example output above, you would use
ipcrm -m 301 -m 2202 -s 2200
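Assembling the ipcrm command by hand from ipcs output is error-prone, so it can be semi-automated. The following is a sketch, not part of the OPS tools; it assumes the column layout shown in the example output (type, ID, key, mode, owner, group).

```shell
# Sketch (hypothetical helper): read ipcs output on stdin and print an
# ipcrm command that removes every shared memory segment (m) and
# semaphore (s) owned by the given user. Review the printed command
# before running it.
build_ipcrm() {
    awk -v u="$1" '
        ($1 == "m" || $1 == "s") && $5 == u { args = args " -" $1 " " $2 }
        END { if (args != "") print "ipcrm" args }
    '
}
```

For the example output above, ipcs | build_ipcrm oracle prints the same ipcrm command shown in the previous step.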