This chapter provides an overview of the real-time support in IRIX and REACT/Pro.
Some of the features mentioned here are discussed in more detail in the following chapters of this guide. For details on other features, see the reference pages or other manuals. The main topics surveyed are:
“Kernel Facilities for Real-Time Programs,” including special scheduling disciplines, isolated CPUs, and locked memory pages
“REACT/Pro Frame Scheduler,” which takes care of the details of scheduling multiple processes on multiple CPUs at guaranteed rates
“Interprocess Communication,” reviewing the ways that a concurrent, multiprocess program can coordinate its work
“Timers and Clocks,” reviewing your options for time-stamping and interval timing
“Interchassis Communication,” reviewing two ways of connecting multiple chassis
The IRIX kernel has been optimized for performance in a multiprocessor environment. Some of the optimizations are as follows:
Instruction paths to system calls and traps are optimized, including some hand coding, to maximize cache utilization.
In the real-time dispatch class (described further in “Using Priorities and Scheduling Queues”), the run queue is kept in priority-sorted order for fast dispatching.
Floating point registers are saved only if the next process needs them, and restored only if saved.
The kernel tries to redispatch a process on the same CPU where it most recently ran, in hopes of finding some of its data remaining in cache (see “Understanding Affinity Scheduling”).
The default IRIX scheduling algorithm is designed to ensure fairness among time-shared users. Called an “earnings-based” scheduler, the kernel credits each process group with a certain number of microseconds on each dispatch cycle. The process with the fattest “bank account” is dispatched first. If a process exhausts its “bank account,” it is preempted.
While effective for its purpose, the earnings-based scheduler is not suitable for a real-time process, which cannot tolerate being preempted for any reason. The kernel supports a range of priorities that are higher than the time-sharing priorities, and which are not subject to “earnings” controls.
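As a minimal sketch of requesting a fixed priority above the time-sharing range, the following assumes a system that provides the POSIX 1003.1b scheduling calls `sched_setscheduler()` and `sched_get_priority_min()`/`sched_get_priority_max()`. The helper names `clamp_fifo_priority` and `request_fifo_scheduling` are hypothetical, and the IRIX-specific interface is the one described under “Using Priorities and Scheduling Queues.”

```c
#include <assert.h>
#include <sched.h>

/* Clamp a requested priority into the range the system allows for
 * SCHED_FIFO. Returns -1 if the range cannot be queried. */
int clamp_fifo_priority(int requested)
{
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);

    if (lo == -1 || hi == -1)
        return -1;
    assert(lo <= hi);
    if (requested < lo)
        return lo;
    if (requested > hi)
        return hi;
    return requested;
}

/* Ask for fixed-priority (non-time-shared) scheduling for the caller.
 * Returns 0 on success, -1 on failure (often insufficient privilege). */
int request_fifo_scheduling(int requested)
{
    struct sched_param param;
    int prio = clamp_fifo_priority(requested);

    if (prio == -1)
        return -1;
    param.sched_priority = prio;
    return sched_setscheduler(0, SCHED_FIFO, &param);
}
```

A program would normally call `request_fifo_scheduling()` once during initialization and treat a failure as a configuration error, since the request usually needs special privilege.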
When your program is structured as a process group, you can request that all the processes of the group be scheduled as a “gang.” The kernel runs all the members of the gang concurrently, provided there are enough CPUs available to do so. This helps to ensure that, when members of the process group coordinate through the use of locks, a lock will usually be released in a timely manner. Without gang scheduling, the process that holds a lock might not be scheduled in the same interval as another process that is waiting on that lock.
For more information, see “Using Gang Scheduling”.
Locking memory pages allows you to protect a process from the unpredictable delays caused by paging. Of course, the locked memory is not available for the address spaces of other processes. The system must have enough physical memory to hold the real-time address space plus space for a minimum of other activities.
The system calls used to lock memory are discussed in detail in the manual Topics in IRIX Programming (see “Other Useful Books”).
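The idea can be sketched with the POSIX `mlockall()` call, assuming the system provides it. The wrapper names `lock_all_memory` and `unlock_all_memory` are hypothetical, and the calls this guide actually recommends are the ones documented in Topics in IRIX Programming.

```c
#include <assert.h>
#include <errno.h>
#include <sys/mman.h>

/* Lock every page the process has now and any it maps later.
 * Returns 0 on success, or the errno value describing the failure
 * (for example EPERM without privilege, ENOMEM if a limit is hit). */
int lock_all_memory(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) == 0)
        return 0;
    assert(errno != 0);
    return errno;
}

/* Release the lock, for example when the real-time phase is over.
 * Returns 0 on success, or the errno value on failure. */
int unlock_all_memory(void)
{
    return munlockall() == 0 ? 0 : errno;
}
```

Locking is best done after the program has allocated and touched its working set, so that a failure shows up during initialization rather than in the middle of a frame.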
Normally IRIX tries to keep all CPUs busy, dispatching the next ready process to the next available CPU. (This simple picture is complicated by the needs of affinity scheduling and gang scheduling.) Since the number of ready processes changes all the time, dispatching is a random process. A normal process cannot predict how often or when it will next be able to run. For normal programs this does not matter, as long as each process continues to run at a satisfactory average rate.
Real-time processes cannot tolerate this unpredictability. To reduce it, you can dedicate one or more CPUs to real-time work. There are two steps:
Restrict one or more CPUs from normal scheduling, so that they can run only the processes that are specifically assigned to them.
Assign one or more processes to run on the restricted CPUs.
A process on a dedicated CPU runs when it needs to run, delayed only by interrupt service and by kernel scheduling cycles (if scheduling is enabled on that CPU). For details, see “Assigning Work to a Restricted CPU”. The REACT/Pro Frame Scheduler takes care of both steps automatically; see “REACT/Pro Frame Scheduler”.
I/O interrupts from devices attached to, or near, that CPU.
A scheduling clock causes an interrupt to every CPU every time-slice interval of 10 milliseconds.
Whenever interval timers are in use (“Timers and Clocks”), a CPU handling timers receives frequent timer interrupts.
When the map of virtual to physical memory changes, a TLB interrupt is broadcast to all CPUs.
These interrupts can make the execution time of a process unpredictable. However, you can designate one or more CPUs for real-time use, and keep interrupts of these kinds away from those CPUs. The system calls for interrupt control are discussed at greater length under “Minimizing Overhead Work”. The REACT/Pro Frame Scheduler also takes care of interrupt isolation.
The REACT/Pro Frame Scheduler is a process execution manager that schedules processes on one or more CPUs in a predefined, cyclic order. The scheduling interval is determined by a repetitive time base, usually a hardware interrupt.
Many real-time programs must sustain a fixed frame rate. In such programs your central design problem is that the program must complete certain activities during every frame interval. When there is more to do in a frame than one CPU can do, some activities must run concurrently on multiple CPUs.
Besides designing the activities themselves, you must design a way to schedule and initiate activities in sequence, once per frame, on multiple CPUs. This is what the REACT/Pro Frame Scheduler does: executes the multiple processes of your real-time program, in sequence, on one or more CPUs.
a specific interval in microseconds
the Vsync (vertical retrace) interrupt from the graphics subsystem
an external interrupt (see “External Interrupts”)
a device interrupt from a specially-modified device driver
a software call (normally used for debugging)
The interrupts from the time base define minor frames. You choose the fixed number of minor frames that make a major frame, as shown in Figure 2-1.
The Frame Scheduler keeps a queue of processes for each minor frame. It dispatches each process once in its scheduled turn. The process runs until it finishes its work; then it yields.
In the simplest case, you have a single frame rate, such as 60 Hz, and every activity your program does must be done once per frame. In this case, the major and minor frame rates are the same.
In other cases, you have some activities that must be done in every minor frame, but you also have activities that are done less often, in every other minor frame or in every third one. In these cases you define the major frame so that its rate is the rate of the least-frequent activity. The major frame contains as many minor frames as necessary to schedule activities at their relative rates.
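The relationship between activity rates and frame counts can be worked out with a little arithmetic. The sketch below is not part of the Frame Scheduler API. It assumes each activity is described by a period in minor frames, and it uses the least common multiple of the periods as the length of the major frame, which is the smallest count in which the whole pattern repeats. The function names are hypothetical.

```c
#include <assert.h>

static unsigned gcd(unsigned a, unsigned b)
{
    while (b != 0) {
        unsigned t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* periods[i] means "activity i runs once every periods[i] minor frames".
 * Returns the number of minor frames needed in one major frame. */
unsigned minor_frames_per_major(const unsigned *periods, unsigned count)
{
    unsigned frames = 1;

    for (unsigned i = 0; i < count; i++) {
        assert(periods[i] >= 1);
        /* lcm(frames, p) = frames / gcd(frames, p) * p */
        frames = frames / gcd(frames, periods[i]) * periods[i];
    }
    return frames;
}

/* Nonzero if an activity with the given period is due in minor frame
 * number `frame`, counted from 0 within the major frame. */
int due_in_minor_frame(unsigned period, unsigned frame)
{
    assert(period >= 1);
    return frame % period == 0;
}
```

With activities that run every minor frame, every second one, and every fourth one, the major frame contains four minor frames. If one activity runs every second frame and another every third, six minor frames are needed before the schedule repeats.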
Sometimes what is here called a “major frame” is called a “process cycle.”
The Frame Scheduler makes it easy for you to organize a real-time program as a set of independent, cooperating processes. The Frame Scheduler manages the housekeeping details of reserving and isolating CPUs. You concentrate on designing the activities and implementing them as processes in a clean, structured way. It is relatively easy to change the number of activities, or their sequence, or the number of CPUs, even late in the project.
Partition the program into activities, where each activity is an independent piece of work that can be done without interruption.
For example, in a simple vehicle simulator, activities might include “poll the joystick,” “update the positions of moving objects,” “cull the set of visible objects,” and so forth.
Decide the relationships among the activities:
Some must be done once per minor frame, others less frequently.
Some must be done before or after others.
Some may be conditional. For example, an activity could poll a semaphore and do nothing unless an event had completed.
Estimate the worst-case time required to execute each activity. Some activities may need more than one minor frame interval (the Frame Scheduler allows for this).
Schedule the activities: If all are executed sequentially, will they complete in one major frame? If not, choose activities that can execute concurrently on two or more CPUs, and estimate again. You may have to change the design in order to get greater concurrency.
When the design is complete, implement each activity as an independent process that communicates with the others using shared memory, semaphores, and locks (see “Interprocess Communication”).
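The estimating step in the design process above can be sketched as a budget calculation. This is only an illustration under stated assumptions: each activity has one worst-case time and a period in minor frames, the periods divide the major frame evenly, and ordering constraints are ignored, so the result is a lower bound on the number of CPUs. The structure and function names are hypothetical.

```c
#include <assert.h>

struct activity {
    unsigned long worst_case_us; /* estimated worst-case run time */
    unsigned period;             /* runs once every `period` minor frames */
};

/* Total worst-case CPU time demanded in one major frame, in microseconds. */
unsigned long demand_per_major_frame(const struct activity *acts,
                                     unsigned count, unsigned minor_frames)
{
    unsigned long total = 0;

    for (unsigned i = 0; i < count; i++) {
        assert(acts[i].period >= 1 && minor_frames % acts[i].period == 0);
        total += acts[i].worst_case_us * (minor_frames / acts[i].period);
    }
    return total;
}

/* Smallest number of CPUs whose combined time covers the demand.
 * A lower bound only: sequencing between activities may require more. */
unsigned min_cpus_needed(unsigned long demand_us, unsigned long minor_us,
                         unsigned minor_frames)
{
    unsigned long capacity = minor_us * minor_frames;

    assert(capacity > 0);
    return (unsigned)((demand_us + capacity - 1) / capacity);
}
```

For example, at 60 Hz a minor frame is about 16,666 microseconds. Two activities of 4,000 and 9,000 microseconds in every minor frame, plus one of 12,000 microseconds in every second frame, demand 38,000 microseconds per two-frame major frame. One CPU offers only 33,332 microseconds, so at least two CPUs are needed.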
When the real-time activities can be handled in a single CPU, the master process that initiates the program contains these steps:
Open, create, and initialize all the shared files and memory resources.
Initiate a Frame Scheduler (a single library call).
Initiate each activity as a process using sproc() or fork().
Each process initializes itself and then waits at a barrier.
Enqueue each activity process to the Frame Scheduler that will dispatch it (another library call).
The master process specifies the process ID and the minor frame or frames in which the process should run, and a scheduling discipline.
Join the barrier where the activity processes are waiting.
When all processes are ready to proceed, all are released.
Start the Frame Scheduler going (a library call).
Wait for a signal indicating it is time to shut down.
Terminate the Frame Scheduler.
A Frame Scheduler seizes its assigned CPU, isolates it, and takes over process scheduling on it. It waits for all enqueued processes to initialize themselves and to execute a library call to “join” the scheduler. Then it begins dispatching the processes in the specified sequence during each frame interval. It monitors errors, such as a process that fails to complete its work within its frame, and takes a specified action when an error occurs. Typically the error action is to send a signal to the master process. The master process can interrogate the Frame Scheduler, and stop it or restart it.
In a program organized as multiple, cooperating processes, the processes need to share data and coordinate their actions in well-defined ways. IRIX with REACT provides the following mechanisms, which are surveyed in the topics that follow:
Shared memory allows a single segment of memory to appear in the address spaces of multiple processes. The Silicon Graphics implementation is also the basis for implementing interprocess semaphores, locks, and barriers.
Semaphores are used to coordinate access from multiple processes to resources that they share.
Locks provide a low-overhead, high-speed method of mutual exclusion.
Barriers make it easy for multiple processes to synchronize the start of a common activity.
Signals provide asynchronous notification of special events or errors. IRIX supports signal semantics from all major UNIX heritages, but POSIX-standard signals are recommended for real-time programs.
IRIX allows you to map a segment of memory into the address spaces of two or more processes at once. The block of shared memory can be read concurrently, and possibly written, by all the processes that share it. IRIX supports the POSIX and the SVR4 models of shared memory, as well as a system of shared arenas unique to IRIX. These facilities are covered in detail in the manual Topics in IRIX Programming (see “Other Useful Books”).
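To make the idea concrete, here is a minimal sketch in which a parent and a child process share one integer. It does not use any of the three models named above. Instead it assumes the common UNIX idiom of a `MAP_SHARED` mapping of `/dev/zero`, chosen because it needs only `open()`, `mmap()`, and `fork()`. The function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Map one shared int, let a child process store `value` in it, and
 * return what the parent reads afterwards. Returns -1 on any failure. */
int share_int_with_child(int value)
{
    int fd = open("/dev/zero", O_RDWR);
    if (fd < 0)
        return -1;

    int *shared = mmap(NULL, sizeof *shared, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
    close(fd);                     /* the mapping outlives the descriptor */
    if (shared == MAP_FAILED)
        return -1;
    assert(*shared == 0);          /* pages from /dev/zero start zero-filled */

    pid_t child = fork();
    if (child == 0) {              /* child: write to the shared page, exit */
        *shared = value;
        _exit(0);
    }

    int result = -1;
    if (child > 0 && waitpid(child, NULL, 0) == child)
        result = *shared;          /* parent sees the child's store */
    munmap(shared, sizeof *shared);
    return result;
}
```

Here `waitpid()` is what orders the child's write before the parent's read. Processes that run concurrently need a semaphore, lock, or barrier for that purpose.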
A semaphore is a memory object that represents the state of a shared resource. The content of a semaphore is an integer count, representing the number of resource units now available. Typically the count is 1, and the semaphore represents the availability of a single object such as a table or file.
A process that needs to use the resource executes a “P” operation on the semaphore. This operation tests and decrements the count in the semaphore. If the count is greater than zero before the operation, at least one resource unit is available. The count is reduced by 1 and the process continues executing. When the count is not greater than zero, the process is blocked until a resource unit is available; then it continues. In either case, following a P operation, the process knows that it has exclusive use of a resource unit.
When it finishes its work, the process releases the resource by executing a “V” operation on the semaphore. This operation increments the count. It also unblocks any process that might be blocked in a P operation, waiting for the resource. If more than one process is waiting, the one that has waited longest is released first (FIFO order).
|Tip: Useful mnemonics for P and V: P depletes the resource. V revives it.|
IRIX supports three forms of semaphore: POSIX-compliant, SVR4-compatible, and Silicon Graphics. All three forms are discussed in the manual Topics in IRIX Programming (see “Other Useful Books”).
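The P and V operations map directly onto the POSIX semaphore calls. The following sketch assumes the POSIX form (`sem_init()`, `sem_trywait()`, `sem_post()`) and uses the non-blocking `sem_trywait()` in place of a blocking P so that the example cannot hang. The function name `count_available_units` is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>

/* Initialize a semaphore with `units` resource units, then perform P
 * operations until one would block. Returns how many P operations
 * succeeded, or -1 on error. */
int count_available_units(unsigned units)
{
    sem_t sem;
    int acquired = 0;

    assert(units >= 1);
    if (sem_init(&sem, 0, units) != 0)
        return -1;

    /* P: test and decrement. A blocking sem_wait() would suspend the
     * caller at count zero; sem_trywait() reports EAGAIN instead. */
    while (sem_trywait(&sem) == 0)
        acquired++;
    if (errno != EAGAIN) {
        sem_destroy(&sem);
        return -1;
    }

    /* V: release every unit, restoring the count to `units`. */
    for (int i = 0; i < acquired; i++)
        sem_post(&sem);

    sem_destroy(&sem);
    return acquired;
}
```

With a count of 1 the semaphore behaves as a simple mutual-exclusion gate: exactly one P succeeds before a V is required.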
A lock is a memory object that represents the exclusive right to use a shared resource. A process that wants to use the resource sets the lock. The process releases the lock when it is finished with the resource.
A lock is functionally the same as a semaphore with a count of 1. The set-lock operation on a lock and the P operation on a semaphore with a count of 1 both acquire exclusive use of a resource. In a multiprocessor, the important difference between a lock and semaphore is that, when the resource is not immediately available, a semaphore always suspends the process, while a lock does not.
A lock, in a multiprocessor system, is set by “spinning.” The program enters a tight loop using the test-and-set machine instruction to test the lock's value and to set it as soon as the lock is clear. In practice the lock is often already available, and the first execution of test-and-set acquires the lock. In this case, setting the lock takes a trivial amount of time.
When the lock is already set, the process spins on the test a certain number of times. If the process that holds the lock is executing concurrently in another CPU, and if it releases the lock during this time, the spinning process acquires the lock instantly. There is zero latency between release and acquisition, and no overhead from entering the kernel for a system call.
For more information on locks, refer to the manual Topics in IRIX Programming (see “Other Useful Books”), and to the usnewlock(3), ussetlock(3) and usunsetlock(3) reference pages.
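The spinning behavior described above can be sketched with the C11 `atomic_flag` type, which provides a portable test-and-set. This is an illustration of the technique, not the implementation behind `ussetlock()`. The type `spin_lock_t`, the function names, and the decision to return after a bounded number of spins are all assumptions of the sketch.

```c
#include <assert.h>
#include <stdatomic.h>

typedef struct {
    atomic_flag held;
} spin_lock_t;

void spin_init(spin_lock_t *lock)
{
    atomic_flag_clear(&lock->held);
}

/* Try to set the lock, spinning at most max_spins times.
 * Returns 1 if the lock was acquired, 0 if it is still held. */
int spin_set(spin_lock_t *lock, unsigned max_spins)
{
    assert(max_spins >= 1);
    for (unsigned i = 0; i < max_spins; i++) {
        /* test-and-set returns the previous state: false means the
         * lock was clear and now belongs to the caller */
        if (!atomic_flag_test_and_set_explicit(&lock->held,
                                               memory_order_acquire))
            return 1;
    }
    return 0;
}

void spin_unset(spin_lock_t *lock)
{
    atomic_flag_clear_explicit(&lock->held, memory_order_release);
}
```

When the lock is free, the first test-and-set succeeds and the cost is a single atomic instruction. A caller that gets 0 back can decide whether to spin again or to yield the CPU.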
IRIX supports library functions that perform atomic (uninterruptible) sample-and-set operations on words of memory. For example, test_and_set() copies the value of a word and stores a new value into the word in a single operation, while test_then_add() samples a word and then replaces it with the sum of the sampled value and a new value.
These primitive operations can be used as the basis of mutual-exclusion protocols using words of shared memory. For details, see the test_and_set(3p) reference page.
The test_and_set() and related functions are based on the MIPS R4000 instructions Load Linked and Store Conditional. Load Linked retrieves a word from memory and tags the processor data cache “line” from which it comes. The following Store Conditional tests the cache line. If any other processor or device has modified that cache line since the Load Linked was executed, the store is not done. The implementation of test_then_add() is comparable to the following assembly-language loop:
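1:  ll   retreg, offset(targreg)    # load the word and tag its cache line
    add  tmpreg, retreg, valreg     # compute the new value
    sc   tmpreg, offset(targreg)    # store if line unmodified; tmpreg = 0 on failure
    beq  tmpreg, 0, 1b              # store failed: branch back to label 1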
The loop continues trying to load, augment, and store the target word until it succeeds. Then it returns the value retrieved. For more details on the R4000 machine language, see one of the books listed in “Other Useful Books”.
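The same retry pattern can be written portably with the C11 compare-and-exchange operation. This is a sketch of the technique, not the source of test_then_add(), and the name `fetch_then_add` is hypothetical. On processors that offer Load Linked and Store Conditional, compilers typically implement the weak compare-and-exchange with those instructions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Atomically add `value` to *target and return the value that was
 * in the word before the addition. */
unsigned long fetch_then_add(_Atomic unsigned long *target,
                             unsigned long value)
{
    assert(target != NULL);

    unsigned long seen = atomic_load(target);   /* sample the word */

    /* Store seen + value only if the word still holds `seen`. On
     * failure the call refreshes `seen`, so the loop retries with the
     * current contents, as the assembly loop does after a failed sc. */
    while (!atomic_compare_exchange_weak(target, &seen, seen + value))
        ;
    return seen;
}
```

In practice the single call `atomic_fetch_add()` does the same job. The explicit loop is shown only because it mirrors the load, augment, and store steps described above.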
The Load Linked and Store Conditional instructions only operate on memory locations that can be cached. Uncached pages (for example, pages implemented as reflective shared memory, see “Reflective Shared Memory”) cannot be set by the test_and_set() functions.
A signal is an urgent notification of an event, sent asynchronously to a process. Some signals originate from the kernel: for example, the SIGFPE signal that notifies of an arithmetic overflow; or SIGALRM that notifies of the expiration of a timer interval (for the complete list, see the signal(5) reference page). The Frame Scheduler issues signals to notify your program of errors or termination. Other signals can originate within your own program.
The time that elapses from the moment a signal is generated until your signal handler begins to execute is the signal latency. Signal latency can be long (as real-time programs measure time) and signal latency has a high variability. (Some of the factors are discussed under “Signal Delivery and Latency”.) In general, you should use signals to deliver infrequent messages of high priority. You should not use the exchange of signals as the basis for scheduling in a real-time program.
|Note: Signals are delivered at particular times when using the Frame Scheduler. See “Using Signals Under the Frame Scheduler”.|
In order to receive a signal, a process must establish a signal handler, a function that will be entered when the signal arrives.
There are three UNIX traditions for signals, and IRIX supports all three. They differ in the library calls used, in the range of signals allowed, and in the details of signal delivery (see Table 2-1). Your real-time program should use the POSIX interface for signals.
Table 2-1 (column heading: BSD 4.2 Calls) compares the interfaces for these functions: set and query signal handler; send a signal; temporarily block specified signals; query pending signals; wait for a signal.
Signal number ranges: Same as BSD; Reserved by IRIX kernel; Reserved by the POSIX standard for system use; Reserved by POSIX for real-time programming.
Signals with smaller numbers have priority for delivery. The low-numbered BSD-compatible signals, which include all kernel-produced signals, are delivered ahead of real-time signals; and signal 49 takes precedence over signal 64. (The BSD-compatible interface supports only signals 1-31. This set includes two user-defined signals.)
IRIX supports POSIX signal handling as specified in IEEE 1003.1b-1993. This includes FIFO queueing of new signals when a signal type is held, up to a system maximum of queued signals. (The maximum can be adjusted using systune; see the systune(1) reference page.)
For more information on the POSIX interface to signal handling, refer to Topics in IRIX Programming and to the signal(5), sigaction(2), and sigqueue(2) reference pages.
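FIFO queueing of a held signal can be observed with a short experiment. The sketch below assumes the POSIX 1003.1b calls `sigqueue()` and `sigtimedwait()` and a single-threaded process. It holds one real-time signal, queues several instances that each carry a sequence number, and then collects them. The function name `queue_and_drain` and the one-second timeout are choices made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <time.h>
#include <unistd.h>

/* Queue `count` instances of one real-time signal to this process,
 * then collect them. Returns how many arrived in the order sent,
 * or -1 if the signal could not be held. */
int queue_and_drain(int count)
{
    const int signo = SIGRTMIN;
    const struct timespec limit = { 1, 0 };  /* wait at most 1 second */
    sigset_t set, old;
    union sigval payload;
    siginfo_t info;
    int in_order = 0;

    assert(count >= 1);
    sigemptyset(&set);
    sigaddset(&set, signo);

    /* Hold the signal so that instances queue instead of being delivered. */
    if (sigprocmask(SIG_BLOCK, &set, &old) != 0)
        return -1;

    for (int i = 0; i < count; i++) {
        payload.sival_int = i;               /* sequence number */
        if (sigqueue(getpid(), signo, payload) != 0)
            break;                           /* queue limit reached */
    }

    for (int i = 0; i < count; i++) {
        if (sigtimedwait(&set, &info, &limit) != signo)
            break;
        if (info.si_value.sival_int == i)
            in_order++;
    }

    sigprocmask(SIG_SETMASK, &old, NULL);
    return in_order;
}
```

If the system limit on queued signals is lower than `count`, `sigqueue()` fails and the function returns a smaller number, which is one way to notice that the limit needs adjusting.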
Some real-time programs need a source of timer interrupts, and some need a way to create high-precision timestamps. IRIX provides both. IRIX supports the POSIX clock and timer facilities as specified in IEEE 1003.1b-1993, as well as the BSD itimer facility. The timer facilities are covered in Topics in IRIX Programming (see “Other Useful Books”).
The hardware cycle counter is a high-precision hardware counter that is updated continuously. The precision of the cycle counter depends on the machine in use, but in most systems it is a 64-bit counter.
You sample the cycle counter by calling the POSIX function clock_gettime() specifying the CLOCK_SGI_CYCLE clock type.
The frequency with which the cycle counter is incremented also depends on the hardware system. You can learn the resolution of the clock by calling the POSIX function clock_getres().
|Note: The cycle counter is synchronized only to the CPU crystal and is not intended as a perfect time standard. If you use it to measure intervals between events, be aware that it can drift by as much as 100 microseconds per second, depending on the hardware system in use.|
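Interval timing with these calls can be sketched as follows. The code selects CLOCK_SGI_CYCLE when the system headers define it as a macro and otherwise falls back to the POSIX CLOCK_MONOTONIC clock. That fallback is an assumption made so the sketch also builds on other systems, and the helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Prefer the IRIX cycle-counter clock; elsewhere use the monotonic clock. */
#ifdef CLOCK_SGI_CYCLE
#define INTERVAL_CLOCK CLOCK_SGI_CYCLE
#else
#define INTERVAL_CLOCK CLOCK_MONOTONIC
#endif

/* Resolution of the interval clock in nanoseconds, or -1 on error. */
long long clock_resolution_ns(void)
{
    struct timespec res;

    if (clock_getres(INTERVAL_CLOCK, &res) != 0)
        return -1;
    return (long long)res.tv_sec * 1000000000LL + res.tv_nsec;
}

/* Sample the interval clock. Returns 0 on success, -1 on error. */
int sample_clock(struct timespec *out)
{
    assert(out != NULL);
    return clock_gettime(INTERVAL_CLOCK, out);
}

/* Nanoseconds from `start` to `end`. */
long long elapsed_ns(const struct timespec *start, const struct timespec *end)
{
    return (long long)(end->tv_sec - start->tv_sec) * 1000000000LL
         + (end->tv_nsec - start->tv_nsec);
}
```

To time a piece of work, call `sample_clock()` before and after it and pass the two samples to `elapsed_ns()`. Comparing the result with `clock_resolution_ns()` shows whether the interval is long enough to be measured meaningfully.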
Standard network interfaces let you send packets or streams of data over a local network or the Internet.
Reflective shared memory (provided by third-party manufacturers) lets you share segments of memory between computers, so that programs running on different chassis can access the same variables.
External interrupts let one Challenge/Onyx signal another.
One standard, portable way to connect processes in different computers is to use the BSD-compatible socket I/O interface. You can use sockets to communicate within the same machine, between machines on a local area network, or between machines on different continents.
For more information about socket programming, refer to one of the networking books listed in “Other Useful Books”.
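As a minimal sketch of the socket interface used within one machine, the following sends a short message through a connected pair of local stream sockets created with `socketpair()`. The function name is hypothetical. Between machines the same `write()` and `read()` calls apply, but the endpoints would be created with `socket()` and connected over the network instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send `msg` through a connected pair of local stream sockets and read
 * it back into `out` (capacity `cap`). Returns the number of bytes
 * received, or -1 on error. */
long echo_through_sockets(const char *msg, char *out, size_t cap)
{
    int fds[2];
    size_t len = strlen(msg);
    ssize_t got = -1;

    assert(len > 0 && len <= cap);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        return -1;

    /* One end writes, the other end reads. */
    if (write(fds[0], msg, len) == (ssize_t)len)
        got = read(fds[1], out, cap);

    close(fds[0]);
    close(fds[1]);
    return (long)got;
}
```

A stream socket does not preserve message boundaries, so a program that sends larger or repeated messages has to loop on `read()` until it has the number of bytes it expects.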
The Message-Passing Interface (MPI) is a standard architecture and programming interface for designing distributed applications. Silicon Graphics, Inc. supports MPI in the POWERChallenge Array product. For details on MPI in Silicon Graphics systems, see the World-Wide Web page http://www.sgi.com/Products/PowerChallengeArray/TechInfo/MPI/. For the MPI standard, see http://www.mcs.anl.gov/mpi/index.html.
The performance of both sockets and MPI depends on the speed of the underlying network. The network that connects nodes (systems) in an Array product has a very high bandwidth.
Reflective shared memory consists of hardware that makes a segment of memory appear to be accessible from two or more computer chassis. Actually the Challenge/Onyx implementation consists of VME bus devices in each computer, connected by a very high-speed, point-to-point network.
The VME bus address space of the memory card is mapped into process address space. Firmware on the card handles communication across the network, so as to keep the memory contents of all connected cards consistent. Reflective shared memory is slower than real main memory but faster than socket I/O. Its performance is essentially that of programmed I/O to the VME bus, which is discussed under “PIO Access”.
Reflective shared memory systems are available for Silicon Graphics equipment from several third-party vendors. The details of the software interface differ with each vendor. However, in most cases you use mmap() to map the shared segment into your process's address space (see Chapter 4, “Managing Virtual Memory in a Real–Time Program” as well as the usrvme(7) reference page).
The Origin200, Origin2000, and Challenge/Onyx systems support external interrupt lines for both incoming and outgoing external interrupts. Software support for these lines is described in the IRIX Device Driver Programmer's Guide (see “Other Useful Books”) and the ei(7) reference page. You can use the external interrupt as the time base for the Frame Scheduler. In that case, the Frame Scheduler manages the external interrupts for you. (See “Selecting a Time Base”.)