Chapter 1. Features of the Oracle Parallel Server

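This chapter introduces the features and capabilities of the Oracle Parallel Server (OPS) system. It contains these major sections:

  • OPS Hardware and Software Requirements

  • OPS Configuration

  • OPS Instances and Domains

  • OPS Architectural Features

  • OPS Components and How They Work Together

  • Optional Silicon Graphics Software and OPS Performance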

OPS Hardware and Software Requirements

The Silicon Graphics® OPS system consists of the following hardware components:

  • two same-model CHALLENGE servers (except CHALLENGE S) running IRIX 6.2 or two Origin2000 servers running IRIX 6.4

  • one IRISconsole: either an Indy® workstation running IRIX 6.2 or an O2 workstation running IRIX 6.3, along with an IRISconsole serial port multiplexer, cables, and software

  • one CHALLENGE RAID deskside storage system with two storage-control processors (SPs) configured as RAID level 5 (for the databases)

  • two plexed CHALLENGE Vaults or one CHALLENGE RAID deskside storage system configured as RAID level 1 (for Oracle REDO logs)

  • required hardware upgrades and cables

The required software for an OPS system consists of the following components:

  • IRIX 6.2 with patches on CHALLENGE systems or IRIX 6.4 with patches on Origin2000 systems

  • Release 1.2 of OPS software for Silicon Graphics systems on the CHALLENGE or Origin systems

  • IRISconsole software on the Indy or O2 workstation

  • software for the component systems, such as the CHALLENGE RAID storage system

  • Release 7.3.2 or later of the Oracle RDBMS (obtained from Oracle Corporation)

  • the Parallel Server Option (obtained from Oracle Corporation)

Optional software includes the following:

  • Performance Co-Pilot (PCP)

  • Database Accelerator (DBA)

  • IRIX NetWorker

OPS Configuration

An OPS system is a collection of Oracle instances running on separate CHALLENGE or Origin servers, providing simultaneous access to the same physical database. The physical database is the same as that for an ordinary non-OPS (nonparallel) Oracle RDBMS, except that it has separate redo logs and rollback segments for each instance. The redo log file is a compressed record of changes that a transaction has made.

For Silicon Graphics systems, the OPS system is available in a dual-host configuration; each server can access the same shared disk storage. The servers can be two CHALLENGE DM, CHALLENGE L, CHALLENGE XL, or Origin2000 systems. A third system, an Indy or O2 workstation running the IRISconsole software, functions as the OPS Node Controller and as a single point of administration for the OPS system. Figure 1-1 diagrams Silicon Graphics' OPS hardware configuration.

Figure 1-1. OPS Hardware Configuration

Besides the IRISconsole software, the IRISconsole workstation also runs the OPS node control software, opsnc. The OPS Node Controller command opsnc implements a fail-stop mechanism: in the event of a private network partition (a private network failure that results in the OPS instances being isolated from each other), only one OPS instance is permitted to continue providing service. The other instance is forced to crash and must be restarted by the system administrator.
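The fail-stop rule described above can be illustrated with a minimal Python sketch. It is not the opsnc implementation: the `Instance` class, the `resolve_partition` function, and the instance names are hypothetical, and because this chapter does not say how the surviving instance is selected, the sketch simply takes the survivor as an argument.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """Hypothetical stand-in for one OPS instance."""
    name: str
    running: bool = True


def resolve_partition(instances, survivor_name):
    """Apply a fail-stop rule: exactly one instance keeps running.

    How the survivor is chosen is an assumption left to the caller;
    the source text does not describe the selection policy.
    """
    if survivor_name not in {inst.name for inst in instances}:
        raise ValueError(f"unknown instance: {survivor_name}")
    halted = []
    for inst in instances:
        if inst.name != survivor_name and inst.running:
            inst.running = False  # forced down; restart is a manual step
            halted.append(inst.name)
    return halted


instances = [Instance("ops-a"), Instance("ops-b")]
halted = resolve_partition(instances, survivor_name="ops-a")
print(halted)
```

The point of the sketch is the invariant: after a partition is resolved, no two isolated instances continue to serve the same database.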

Each OPS instance consists of the following software components:

  • Oracle RDBMS processes, including PMON, SMON, DBWR, LGWR

  • OPS Distributed Lock Manager (DLM) processes: dlmmon and dlmd

    An OPS system allows multiple servers access to the same shared physical database. The Distributed Lock Manager provides the locking functionality that makes this sharing possible.

  • OPS Connection Manager (CM) process: opscm

    The OPS Connection Manager opscm implements a heartbeat protocol across both servers to detect server and private network failures, monitors local DLM processes to detect lock manager failure, and provides a sync service to coordinate recovery for server failure and reintegration.
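As an illustration of the heartbeat idea mentioned for opscm, the following Python sketch shows a generic timeout-based failure detector. It is an assumption-laden sketch, not the opscm protocol: the `HeartbeatMonitor` class, the peer name, and the 3-second timeout are all hypothetical.

```python
class HeartbeatMonitor:
    """Generic timeout-based failure detector (hypothetical sketch)."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen = {}  # peer name -> time of last heartbeat

    def record(self, peer, now):
        """Note that a heartbeat from `peer` arrived at time `now`."""
        self.last_seen[peer] = now

    def failed_peers(self, now):
        """Return peers whose last heartbeat is older than the timeout."""
        return sorted(
            peer for peer, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )


monitor = HeartbeatMonitor(timeout_s=3.0)
monitor.record("server-b", now=0.0)
monitor.record("server-b", now=1.0)
early_check = monitor.failed_peers(now=2.0)  # 1.0 s since last heartbeat
late_check = monitor.failed_peers(now=5.0)   # 4.0 s since last heartbeat
print(early_check, late_check)
```

A missed heartbeat by itself does not tell the detector whether the peer server or the private network failed, which is one reason a separate arbitration step is useful.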

Figure 1-2 diagrams OPS software configuration.

Figure 1-2. OPS Software Configuration

OPS Instances and Domains

An Oracle instance consists of a system global area (SGA) and a set of server processes that access the physical database located on disk. The SGA is a section of shared memory accessed by each of the OPS server processes in an instance. In an OPS system, multiple Oracle instances constitute one domain. Each instance has its own SGA, server processes, redo log files, rollback segments, Distributed Lock Manager (DLM) processes, and Connection Manager (CM) process. All instances in the domain access the same physical database.

DLM domains are numbered starting with 0; DLM instances are numbered 0 and 1. Figure 1-3 diagrams an OPS configuration with a single domain.

Figure 1-3. One Domain on an OPS Configuration

In the configuration shown in Figure 1-3, the instances in the DLM domain are

  • 0,0 (DLM domain 0, domain instance 0)

  • 0,1 (DLM domain 0, domain instance 1)

Figure 1-4 diagrams an OPS configuration with two domains. In this configuration, the instances in the two DLM domains are

  • 0,0 (DLM domain 0, domain instance 0)

  • 0,1 (DLM domain 0, domain instance 1)

  • 1,0 (DLM domain 1, domain instance 0)

  • 1,1 (DLM domain 1, domain instance 1)

In any of these cases, all Oracle instances that use the same database must use the same domain.

Figure 1-4. OPS Configuration With Two Domains
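The numbering scheme for DLM domains and instances can be reproduced with a short Python sketch. The function names are hypothetical; the only facts taken from this chapter are that domains are numbered from 0 and that each domain has instances 0 and 1.

```python
def dlm_instance_ids(num_domains, instances_per_domain=2):
    """Enumerate (domain, instance) pairs; domains start at 0."""
    return [
        (domain, instance)
        for domain in range(num_domains)
        for instance in range(instances_per_domain)
    ]


def label(domain, instance):
    """Format a pair the way this chapter writes it, for example '0,1'."""
    return f"{domain},{instance}"


one_domain = [label(d, i) for d, i in dlm_instance_ids(1)]
two_domains = [label(d, i) for d, i in dlm_instance_ids(2)]
print(one_domain)
print(two_domains)
```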

OPS Architectural Features

The major architectural features of the OPS system are

  • high availability

    High availability is provided at multiple levels:

    • If a server fails, the database is still accessible from the surviving server.

    • If the CHALLENGE RAID storage system is used, RAID-5 provides tolerance to any single point of failure within the RAID.

    • Each redo log file can be mirrored, so that an instance can survive failure of a log file.

  • consolidation of database administration, using the workstation as a node controller

  • high performance

    An OPS system takes full advantage of the memory and high-speed system bus of CHALLENGE and Origin systems. Operating system enhancements include changes to virtual memory for more efficient multiprocessing, raw I/O, multiprocess networking, and process scheduling. In addition, IRIX already supports real-time scheduling, CPU affinity, and, for the 64-bit operating system, CPU partitioning (the ability to steer interrupts to specific CPUs), all of which are critical for DBMS performance.

  • distributed locks

    Row-level locking, the finest level of locking granularity, minimizes the amount of data contention between transactions and maximizes concurrency. Oracle Parallel Server extends this feature by allowing multiple transactions on different servers to lock and update different rows of any table in the database.

    Row-level locking is independent of the parallel cache manager's use of distributed locks, which keep the SGAs consistent with each other. Row-level locking is achieved by the internal concurrency control architecture of the Oracle RDBMS. For distributed locking, the parallel cache manager uses a special background process, the LCK0 process, which requests locks from the Distributed Lock Manager (DLM). Because the DLM is not used for row-level locking, its use is minimized and performance is enhanced.

For more information on Oracle RDBMS and OPS operation, consult Oracle documentation.

OPS Components and How They Work Together

The OPS system allows Oracle7 instances running on the two servers to access a common Oracle database. This design allows users on multiple systems seamless access to common data, so that more computing resources are available to all applications that access the same database.

The OPS system is designed to allow either server in the cluster to be brought down, voluntarily or involuntarily, without interrupting access to the database from the other server. If an Oracle instance or a server fails, users of the failed server can migrate to the surviving server and reconnect to the database.

Figure 1-5 diagrams an example OPS installation with storage systems.

Figure 1-5. Example OPS Site With Two Vaults and One CHALLENGE RAID Storage System

The rest of this section describes specific components of the OPS system:

  • IRISconsole

  • CHALLENGE RAID storage system

  • XFS filesystem

  • Database Accelerator (DBA)

Role of IRISconsole

For the OPS system, IRISconsole is made up of the following:

  • IRISconsole software, including a graphical user interface, running under IRIX 6.2 (Indy) or IRIX 6.3 (O2)

  • an IRISconsole multiplexer, including cabling connecting the workstation to the multiplexer

  • a pair of serial cables included in the IRISconsole package, plus one additional pair, for connecting the two OPS servers to the multiplexer

The IRISconsole software monitors each OPS server through the server's Remote System Control and console ports using a serial connection to the serial port server (multiplexer). If a server fails, IRISconsole can automatically start procedures defined by the OPS system administrator in addition to the failover procedures provided for in the OPS software.

Note: For full OPS system and IRISconsole functionality, the Remote System Control ports on the Origin2000 MMSC or MSC system controller and the System Console ports on the CHALLENGE DM, L, or XL servers must be cabled to ports on the multiplexer.

The IRISconsole software enables the administrator to

  • display, view, or take control of the console of an OPS server (or other attached system)

  • view console activity logs and other system reports

  • view real-time graphs of hardware operating statistics of an OPS server, such as voltage, operating temperature, and blower speeds; save the graphs as files and display them (for CHALLENGE systems only)

  • set a threshold for operating statistics so that an alarm is activated when the threshold is reached and various activities can be triggered (for CHALLENGE systems only)

For complete information on the IRISconsole, see the documentation:

  • IRISconsole Administrator's Guide (007-2872-nnn)

  • IRISconsole Multiplexer Installation Guide (007-2839-nnn)

Role of the CHALLENGE RAID Storage System

The CHALLENGE RAID (Redundant Array of Independent Disks) storage system provides a compact, high-capacity, high-availability source of disk storage for the OPS system in the form of multiple disk drive modules that you can replace when the storage system is powered on (hot-replaceable modules). Each CHALLENGE RAID storage system supports from five to twenty disk modules.

The CHALLENGE RAID storage system used for databases supports RAID level 5: a group of disk modules is bound together into a logical unit (LUN). A RAID-5 group maintains parity data that lets the disk group survive a disk module failure without losing data. In addition, in a CHALLENGE RAID storage system configured for an OPS system, the RAID-5 group can survive a single SCSI–2 internal bus failure, because each disk module in the group is bound on an independent SCSI–2 internal bus.
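The way parity lets a RAID-5 group survive the loss of one disk module can be shown with a small Python sketch. It models a single stripe of four data blocks plus one parity block using byte-wise XOR, which is the standard RAID-5 parity technique; the block contents are made up, and the sketch ignores parity rotation across modules and everything specific to the CHALLENGE RAID firmware.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equal-length blocks."""
    return bytes(
        reduce(lambda a, b: a ^ b, column) for column in zip(*blocks)
    )


# One stripe: four data blocks (illustrative values) and their parity.
data = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0", b"\xaa\x55"]
parity = xor_blocks(data)

# Simulate losing the third module: rebuild its block from the
# surviving data blocks and the parity block.
survivors = data[:2] + data[3:] + [parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data[2])
```

Because XOR is its own inverse, combining the parity block with the surviving data blocks yields exactly the missing block. With two blocks missing, the same calculation can no longer recover either one, which is why the group tolerates only a single module or bus failure.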

Through the storage-control processors (SPs), the SCSI–2 bus is split into five internal fast/narrow SCSI buses—A, B, C, D, and E—that connect the slots for the disk modules. For example, internal bus A connects the modules in slots A0, A1, A2, and A3, in that order. Figure 1-6 diagrams this configuration.

For an OPS system, the CHALLENGE RAID storage system must have two SPs. Each LUN is controlled by one of the SPs; the second SP provides an alternate path to the disk modules as part of the failover strategy of the OPS system (see Figure 1-6). If the controlling SP fails, the other SP takes over its LUNs.

In addition, both SPs are required for storage system caching to work: each processor temporarily stores modified data in its memory and writes the data to disk at the most expedient time.
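The LUN ownership and takeover behavior described above can be sketched in Python as follows. This is only a model of the idea, not the storage system's firmware logic: the `RaidStorage` class, the SP names "SP-A" and "SP-B", and the LUN numbers are hypothetical.

```python
class RaidStorage:
    """Toy model of two SPs sharing ownership of LUNs (hypothetical)."""

    def __init__(self, lun_owner):
        self.lun_owner = dict(lun_owner)  # LUN number -> controlling SP
        self.sp_ok = {"SP-A": True, "SP-B": True}

    def fail_sp(self, sp):
        """Mark `sp` as failed and move its LUNs to the other SP."""
        if sp not in self.sp_ok:
            raise ValueError(f"unknown SP: {sp}")
        self.sp_ok[sp] = False
        peer = "SP-B" if sp == "SP-A" else "SP-A"
        if not self.sp_ok[peer]:
            raise RuntimeError("both SPs have failed; LUNs are unavailable")
        moved = []
        for lun, owner in list(self.lun_owner.items()):
            if owner == sp:
                self.lun_owner[lun] = peer  # surviving SP takes over
                moved.append(lun)
        return moved


storage = RaidStorage({0: "SP-A", 1: "SP-B"})
moved = storage.fail_sp("SP-A")
print(moved, storage.lun_owner)
```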

Figure 1-6. SCSI–2 Bus and Internal Buses (Front View)

For complete information on the CHALLENGE RAID storage system, see the CHALLENGE RAID Owner's Guide (007-2532-nnn).

Role of the XFS Filesystem

XFS is a journaled filesystem that allows for extremely fast recovery time of filesystem structures during reboot. Recovery of XFS filesystems is independent of filesystem size. For this reason, XFS is particularly useful for OPS operation.

On a traditional UNIX® filesystem, a full filesystem check takes an amount of time proportional to the size of the filesystem. On XFS, the recovery time is in seconds, because it is dependent upon system activity level, rather than filesystem size. Using XFS reduces the time required to bring a failed server back online.

For complete information on the XFS filesystem, see the guide IRIX Admin: Disks and Filesystems (007-2825-nnn). This document is viewable in IRIS InSight.

Role of DBA (Database Accelerator)

The Database Accelerator (DBA) consists of kernel enhancements designed to boost performance specifically for Oracle. These kernel enhancements can help double the performance of write-intensive workloads, such as the TPC-A and TPC-B benchmarks or the building of very large indexes in real-life applications. The kernel enhancements are as follows:

  • Postwait driver, a kernel software driver, provides a very fast multithreaded synchronization mechanism for Oracle processes. It replaces the standard SVR4 semaphore mechanism, which is too slow for high TPS rates.

  • Kernel list I/O, an IRIX enhancement, allows the Oracle database writer to flush modified buffers to disk efficiently: a single Oracle database writer can flush at least 2000 buffers per second to disk drives. With only one system call, the database writer can initiate multiple writes to all disk drives in the system. Without this functionality, the Oracle database writer would have to use shadow processes, incurring the overhead of process synchronization, and would be limited by the single-threaded nature of making one system call per disk write.

Optional Silicon Graphics Software and OPS Performance

This section briefly explains how the following optional Silicon Graphics software products can enhance OPS performance:

  • PCP (Performance Co-Pilot)

  • IRIX NetWorker

Performance Co-Pilot (PCP)

Performance Co-Pilot (PCP) provides a suite of tools for performance monitoring and performance management services across the spectrum of performance domains—hardware platforms, the operating systems, the DBMS, and the applications.

PCP runs in a client/server configuration: PCP agents (clients) monitor domains and send information to the PCP server, which graphically displays the information on the workstation. PCP can be used to monitor Oracle and system activity on both servers in the OPS system.

IRIX NetWorker

IRIX NetWorker reliably protects files against loss across an entire network of systems. NetWorker saves valuable administrator time by speeding and simplifying daily backup operations. As NetWorker backs up data, it creates a database of the saved data, making it easy to locate a file for recovery. Furthermore, as the network and number of files expand, NetWorker has the capacity and performance to handle the load.

IRIX NetWorker includes extended support for autochangers (jukeboxes and tape libraries), and archiving and retrieval capability. Its ability to back up raw files makes it particularly suitable for use with an OPS system, because all Oracle files are XLV raw devices.

For complete information on NetWorker, see the documentation:

  • IRIX NetWorker Administrator's Guide (007-1458-nnn)

  • IRIX NetWorker User's Guide (007-2304-nnn)

These documents are viewable in IRIS InSight.