Chapter 2. Configuring, Starting, and Halting the OPS Software

This chapter explains how to configure, start, and halt the OPS system. It contains these major sections:

  • Checking OPS Software Configuration

  • Creating System Controller Password Files

  • Starting the Node Controller Software on the Indy or O2 Workstation

  • Starting the OPS Software Manually on the Servers

  • Starting the OPS Software Automatically on the Servers

  • Starting a Server for Single-Host Operation

  • Halting the OPS Software on the Servers

  • Halting the OPS Software Manually

  • Checking Log Files

Checking OPS Software Configuration

The OPS software is installed and configured during installation of the OPS system. This section explains how you can check that the configuration files created during this process contain the correct information.

The OPS configuration files that are edited or created during OPS system setup are

  • /usr/opscm/conf/opsconf on each server

  • /usr/opscm/conf/sidconf on each server

  • /usr/opsnc/conf/ncconf on the Indy or O2 workstation

Checking each file is explained in a separate subsection below.

Checking the opsconf Configuration File

To check the contents of the opsconf configuration file, follow these steps:

  1. Open /usr/opscm/conf/opsconf on each server.

  2. Check to see that the line

    CLUSTER   1    test     2      ichostname    
    

    has the correct hostname of the workstation used for IRISconsole.

  3. Check the Distributed Lock Manager (DLM) domain and instance definitions, which are explained in the section “OPS Instances and Domains” in Chapter 1. Verify that the lines under

    #NODE dom inst ndname ndaddress      cmsvc   apsvc   wt
    

    have accurate information on your OPS servers. For example:

    NODE  0   0    host1  150.166.42.37  opscm   opsdlm  1
    NODE  0   1    host2  150.166.42.38  opscm   opsdlm  1
    

  4. Save and close /usr/opscm/conf/opsconf.

  5. Edit the file /etc/services. Add three services:

    opscm    newnumber1/tcp
    opsdlm   newnumber2/tcp
    opsnc    newnumber3/tcp
    

    Each newnumbern is a port number not already in use at this site. For example:

    opscm    7018/tcp
    opsdlm   7019/tcp
    opsnc    7020/tcp
    

  6. Save and exit the file.
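The /etc/services check in step 5 can be scripted. The following is a minimal sketch (not part of the OPS distribution): it verifies that each of the three OPS services appears exactly once and that the three port numbers are distinct. The function name is an assumption for illustration.

```shell
# Verify the three OPS entries in a services file: each service must
# appear exactly once, and the three port numbers must be distinct.
check_ops_services() {
    svcfile="${1:-/etc/services}"
    for svc in opscm opsdlm opsnc; do
        count=$(grep -c "^${svc}[[:space:]]" "$svcfile")
        [ "$count" -eq 1 ] || { echo "BAD: $svc appears $count times"; return 1; }
    done
    # Count the distinct port/protocol fields for the three services.
    distinct=$(awk '$1=="opscm"||$1=="opsdlm"||$1=="opsnc"{print $2}' "$svcfile" | sort -u | wc -l)
    [ "$distinct" -eq 3 ] || { echo "BAD: port numbers are not unique"; return 1; }
    echo "OK"
}
```

Run it on each server after editing /etc/services; it prints OK when the entries are consistent.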

Checking the sidconf Configuration File

The sidconf file maps each Oracle instance (sid, or Oracle system ID) to a DLM domain-instance pair. To check the sidconf configuration file, follow these steps:

  1. Determine the sid of the Oracle database for each instance.

  2. In /usr/opscm/conf/sidconf, check that the lines

    MAP sid0 domainnumber instancenumber 
    MAP sid1 domainnumber instancenumber 
    

    contain accurate information for the servers. For example:

    MAP finance1 0 0
    MAP finance2 0 1
    

  3. Save and close /usr/opscm/conf/sidconf.

  4. If necessary, change permissions on this file so that it can be read by all.
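The sid-to-DLM mapping in the sidconf file can be queried with a small sketch like the following, assuming the MAP line format shown above. The function name is hypothetical.

```shell
# Look up the DLM domain and instance for a given Oracle sid in a
# sidconf-style file; prints "domain instance" and fails if not found.
sid_to_dlm() {
    # $1 = path to sidconf file, $2 = Oracle sid
    awk -v sid="$2" '$1 == "MAP" && $2 == sid { print $3, $4; found = 1 }
                     END { exit found ? 0 : 1 }' "$1"
}
```

For example, with the sample file above, `sid_to_dlm /usr/opscm/conf/sidconf finance2` would print `0 1`.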

Checking the ncconf Configuration File

To check the ncconf configuration file, follow these steps:

  1. Note the ports on the multiplexer to which the Remote System Control server ports are cabled.

  2. In an IRIX shell window on the workstation, open the /usr/opsnc/conf/ncconf file.

  3. Check that the entries under

    #nodename    ttyname     system controller type
    

    contain accurate information for the servers. For example:

    ops1     /dev/ttyd031     SYSCTLR|MMSC|MSC
    ops2     /dev/ttyd033     SYSCTLR
    

    The last digit in each tty entry should be the multiplexer port into which the Remote System Control for each server is cabled. The numbers 1 through 16 on the multiplexer correspond to 0 through f in the tty entries. In the example above, the Remote System Control ports are cabled to ports 2 and 4 on the multiplexer. SYSCTLR is used if the system is a CHALLENGE server, MMSC is used if the system is a rackmount Origin2000 server, and MSC is used if the system is a deskside Origin2000.

  4. Save and close /usr/opsnc/conf/ncconf.
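The port-to-digit correspondence described in step 3 can be sketched as a tiny helper: multiplexer port N (1 through 16) corresponds to hex digit N-1 (0 through f) at the end of the ttyd device name. The function name is illustrative only.

```shell
# Convert a multiplexer port number (1-16) to the final hex digit
# (0-f) of the corresponding ttyd device name.
port_to_tty_digit() {
    printf '%x\n' $(( $1 - 1 ))
}
```

For instance, port 4 yields digit 3, matching /dev/ttyd033 in the example above.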

Creating System Controller Password Files

If the OPS servers are Origin servers and one or both of them have system controller passwords, the passwords must be stored in the /usr/opsnc/conf/.sysctlrpw-* files on the Indy or O2 workstation. This is required for the Node Controller software to work correctly.

For each OPS server that has a system controller password, create a .sysctlrpw file by entering this command as root on the Indy or O2 workstation:

/usr/opsnc/bin/spng -n OPS_hostname -d sys_ctlr_type -w password 

OPS_hostname is the hostname of an OPS server. sys_ctlr_type is the OPS server's system controller type, either MSC or MMSC. password is the unencrypted system controller password.
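For example, a command for a deskside Origin2000 server might look like the following (the hostname and password here are illustrative only; substitute your own values):

```shell
# Store the system controller password for the hypothetical server
# "ops1", a deskside Origin2000 (MSC); "secret99" is a placeholder.
/usr/opsnc/bin/spng -n ops1 -d MSC -w secret99
```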

Starting the Node Controller Software on the Indy or O2 Workstation

To start the Node Controller software (opsnc) on the Indy or O2 workstation, enter this command as root:

/usr/opsnc/bin/opsnc 

You must restart the Node Controller software using this command each time the workstation is rebooted.

Starting the OPS Software Manually on the Servers

This section describes how to start the OPS software manually on the servers. You may want to do this while testing the software or before you set up the servers to run the software automatically.

The OPS software consists of the Connection Manager (CM) software (opscm) and Oracle Distributed Lock Manager (DLM) software (dlmmon and dlmd). Follow these steps to start them manually:

  1. Follow the procedure in the section “Halting the OPS Software Manually” in this chapter to ensure that the CM and DLM software aren't already running.

  2. For each server, create a startup script containing the following lines:

    #!/sbin/sh
     
    ORACLE_HOME=/usr/people/oracle
    ORACLE_SID=sidname 
    LKDOM=0 
    LKINST=0 
    USER=oracle 
    GROUP=dba
     
    export ORACLE_HOME ORACLE_SID LKDOM LKINST
     
    /usr/opscm/bin/opscm 
    $ORACLE_HOME/bin/lkmgrd -u $USER -g $GROUP 
    

    In each script, make sure that the values for LKDOM= and LKINST= are accurate for the domain and instance on that server. These values must match those in /usr/opscm/conf/sidconf, as explained in the section “Checking the sidconf Configuration File” in this chapter. lkmgrd is the Distributed Lock Manager daemon. Its options specify the user ID and group of the lock database owner.

  3. As root, run the startup script you created in step 2 on each server to bring up the CM and DLM software on each server.

Starting the OPS Software Automatically on the Servers

To enable the OPS software (Connection Manager and Oracle Distributed Lock Manager) to start automatically at boot time, and to start it now, follow these steps on each server:

  1. Edit the /etc/init.d/opscmgr script. This script is similar to the startup script created in step 2 in the section “Starting the OPS Software Manually on the Servers” in this chapter.

  2. Run this command as root:

    chkconfig -f opscm on 
    

  3. Reboot the server. For example:

    reboot 
    

Starting a Server for Single-Host Operation

Occasionally, you may need to run the Connection Manager software with just one server operating, for example, while one server is turned off. To start a server for single-host operation, run the opscm command with the -F option:

/usr/opscm/bin/opscm -F 


Note: Do not use the -F option for normal OPS dual-host operation.


Halting the OPS Software on the Servers

To halt the Connection Manager and Oracle Distributed Lock Manager Software on both servers, follow these steps:

  1. Back up Oracle database information as needed.

  2. On each OPS server, halt Oracle database operation.

  3. Enter the following command to terminate the Connection Manager gracefully:

    killall -TERM opscm 
    

Halting the OPS Software Manually

If you are not able to gracefully halt the Connection Manager and Oracle Distributed Lock Manager Software on both servers as described in the previous section, use this procedure to halt the processes and clean up. Follow these steps on each server:

  1. Check for the presence of the CM lock file in /tmp. This filename has the suffix .nn, where the two digits are the DLM domain and instance numbers, for example, .00. If this file exists, delete it.

  2. Check to see if the CM is already running by entering

    ps -ef | grep opscm 
    

  3. If the CM is running, enter the following to kill it:

    killall -TERM opscm 
    

  4. Check to see if the DLM is already running by entering

    ps -ef | grep dlm 
    

  5. If the DLM is running, enter the following to kill its processes:

    killall -TERM dlmmon 
    killall -TERM dlmd 
    

  6. Run ipcs to determine the shared memory segments and semaphores in use on the server:

    ipcs 
    

    The following is an example output:

    IPC status from /dev/kmem as of Thu May 18 14:31:22 1995
    T     ID     KEY        MODE       OWNER    GROUP
    Message Queues:
    Shared Memory:
    m      0 0x53637444 --rw-r--r--     root      sys
    m    301 0x000022bb --rw-rw----   oracle      dba
    m   2202 0x0c33b7c9 --rw-r-----   oracle      dba
    Semaphores:
    s   2200 0x00000000 --ra-r-----   oracle      dba
    

  7. If Oracle or DLM is using any shared memory segments or semaphores, save them to another location if you need them for debugging a DLM or Oracle crash; otherwise, delete them with ipcrm. In the example in step 6, you would use

    ipcrm -m 301 -m 2202 -s 2200 
    
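The lookup in steps 6 and 7 can be scripted. The following sketch, which assumes ipcs output formatted as in the example above (type in field 1, ID in field 2, owner in field 5), prints ipcrm arguments for every shared memory segment and semaphore owned by user oracle. The function name is an assumption.

```shell
# Build ipcrm arguments from ipcs output piped on stdin. Review the
# resulting list before removing anything you may want to keep for
# debugging a DLM or Oracle crash.
oracle_ipcrm_args() {
    awk '$1 == "m" && $5 == "oracle" { args = args " -m " $2 }
         $1 == "s" && $5 == "oracle" { args = args " -s " $2 }
         END { print args }'
}
```

Typical use: `ipcs | oracle_ipcrm_args` to preview, then `ipcrm $(ipcs | oracle_ipcrm_args)` to remove.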

Checking Log Files

You can examine log and control files to learn about failures. The log and control files that are relevant for OPS failures are

/var/adm/SYSLOG
    Node Controller and Connection Manager log information

/var/tmp/dlm
    Distributed Lock Manager log