Chapter 3. Monitoring Commands

This chapter includes the following topics:

  • About the Operating System Monitoring Commands

  • Operating System Monitoring Commands

About the Operating System Monitoring Commands

You can use operating system commands to understand the usage and limits of your system. These commands allow you to observe both overall system performance and the execution characteristics of individual processes.

The topics in this chapter describe the commands that are included in your SGI system's operating system. The following are additional commands and utilities that are available:

  • SGI Foundation Software utilities. SFS is included by default on your SGI system. For information about these utilities, see the following:

    SGI Foundation Software (SFS) User Guide

  • Performance Co-Pilot. For documentation about this open source toolset, see the following website:

    http://www.pcp.io/documentation.html

Operating System Monitoring Commands

The following topics show several operating system commands you can use to determine user load, system usage, and active processes:

Using the w(1) Command

To obtain a high-level view of system usage that includes information about who is logged into the system, use the w(1) command, as follows:

uv44-sys:~ # w
 15:47:48 up  2:49,  5 users,  load average: 0.04, 0.27, 0.42
USER     TTY        LOGIN@   IDLE   JCPU   PCPU WHAT
root     pts/0     13:10    1:41m  0.07s  0.07s -bash
root     pts/2     13:31    0.00s  0.14s  0.02s w
boetcher pts/4     14:30    2:13   0.73s  0.73s -csh
root     pts/5     14:32    1:14m  0.04s  0.04s -bash
root     pts/6     15:09   31:25   0.08s  0.08s -bash

The output of the w command consists of two parts:

  • The first output line shows the current time, the length of time the system has been up, the number of users on the system, and the average number of jobs in the run queue over the last 1, 5, and 15 minutes.

  • The rest of the output shows who is logged into the system, the duration of each user session, processor usage by user, and each user's current process command line.
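If you want to track the values in the first output line from a script, you can extract them with a short parser. The following Python sketch is an illustration only: the function name parse_w_header is hypothetical, and the pattern assumes the header layout shown in the example above, with a period as the decimal separator.

```python
import re

def parse_w_header(line):
    """Extract the user count and load averages from the first line of w(1) output."""
    match = re.search(
        r"(\d+) users?,\s+load average: ([\d.]+), ([\d.]+), ([\d.]+)", line
    )
    if match is None:
        raise ValueError("unrecognized w header: " + line)
    return {
        "users": int(match.group(1)),
        "load_1min": float(match.group(2)),
        "load_5min": float(match.group(3)),
        "load_15min": float(match.group(4)),
    }

# Header line copied from the w example in this section.
header = " 15:47:48 up  2:49,  5 users,  load average: 0.04, 0.27, 0.42"
info = parse_w_header(header)
print(info)
```

A 1-minute value lower than the 15-minute value, as in this header, suggests that the load has been falling.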

Using the ps(1) Command

To determine active processes, use the ps(1) command, which displays a snapshot of the process table.

The following example uses ps -A r. The -A option selects all processes, and the r option restricts the output to processes that are currently running:

[user@profit user]# ps -A r
   PID TTY      STAT   TIME COMMAND
211116 pts/0    R+     4:08 /usr/diags/bin/olconft RUNTIME=5
211117 pts/0    R+     4:08 /usr/diags/bin/olconft RUNTIME=5
211118 pts/0    R+     4:08 /usr/diags/bin/olconft RUNTIME=5
211119 pts/0    R+     4:08 /usr/diags/bin/olconft RUNTIME=5
211120 pts/0    R+     4:08 /usr/diags/bin/olconft RUNTIME=5
211121 pts/0    R+     4:08 /usr/diags/bin/olconft RUNTIME=5
211122 pts/0    R+     4:08 /usr/diags/bin/olconft RUNTIME=5
211123 pts/0    R+     4:08 /usr/diags/bin/olconft RUNTIME=5
211124 pts/0    R+     4:08 /usr/diags/bin/olconft RUNTIME=5
211125 pts/0    R+     4:08 /usr/diags/bin/olconft RUNTIME=5
211126 pts/0    R+     4:08 /usr/diags/bin/olconft RUNTIME=5
211127 pts/0    R+     4:08 /usr/diags/bin/olconft RUNTIME=5
211128 pts/0    R+     4:08 /usr/diags/bin/olconft RUNTIME=5
211129 pts/0    R+     4:08 /usr/diags/bin/olconft RUNTIME=5
211130 pts/0    R+     4:08 /usr/diags/bin/olconft RUNTIME=5
211131 pts/0    R+     4:08 /usr/diags/bin/olconft RUNTIME=5
...
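When the list of running processes is long, a summary by command can be easier to read. The following Python sketch counts the processes for each command line. The function name count_running is hypothetical, and the sketch assumes the five-column layout (PID, TTY, STAT, TIME, COMMAND) shown in the example above.

```python
from collections import Counter

def count_running(ps_output):
    """Count processes per command line in `ps -A r` output."""
    counts = Counter()
    # Skip the header line, then split each row into at most five fields
    # so that a command line containing spaces stays in one piece.
    for line in ps_output.strip().splitlines()[1:]:
        fields = line.split(None, 4)  # PID, TTY, STAT, TIME, COMMAND
        if len(fields) == 5:
            counts[fields[4]] += 1
    return counts

# First rows of the ps example in this section.
sample = """\
   PID TTY      STAT   TIME COMMAND
211116 pts/0    R+     4:08 /usr/diags/bin/olconft RUNTIME=5
211117 pts/0    R+     4:08 /usr/diags/bin/olconft RUNTIME=5
211118 pts/0    R+     4:08 /usr/diags/bin/olconft RUNTIME=5
"""
counts = count_running(sample)
for command, number in counts.most_common():
    print(number, command)
```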

Using the top(1) Command

To monitor running processes, use the top(1) command. This command displays a continuously updated list of processes, sorted by default so that the processes with the highest CPU utilization appear first.

Using the vmstat(8) Command

The vmstat(8) command reports virtual memory statistics. It reports information about processes, memory, paging, block I/O, traps, and CPU activity. For more information, see the vmstat(8) man page.

In the following vmstat(8) command, the 10 specifies a 10-second delay between updates:

uv44-sys:~ # vmstat 10
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 235984032 418748 8649568    0    0     0     0    0    0  0  0 100  0  0
 1  0      0 236054400 418748 8645216    0    0     0  4809 256729 3401  0  0 100  0  0
 1  0      0 236188016 418748 8649904    0    0     0   448 256200  631  0  0 100  0  0
 2  0      0 236202976 418748 8645104    0    0     0   341 256201 1117  0  0 100  0  0
 1  0      0 236088720 418748 8592616    0    0     0   847 257104 6152  0  0 100  0  0
 1  0      0 235990944 418748 8648460    0    0     0   240 257085 5960  0  0 100  0  0
 1  0      0 236049568 418748 8645100    0    0     0  4849 256749 3604  0  0 100  0  0

The first report gives averages since the last reboot. When you specify a delay, which is 10 in this example, each additional report covers a sampling period of that length. Without a delay, vmstat prints only the first report. The process and memory reports are instantaneous in either case.
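Because the first report gives averages since the last reboot, a script that summarizes vmstat output usually needs to treat that report separately. The following Python sketch shows one way to do this. The function name parse_vmstat is hypothetical, and the sketch assumes the two header lines and integer columns shown in the example above.

```python
def parse_vmstat(output):
    """Turn vmstat text into a list of dicts keyed by the column names."""
    lines = output.strip().splitlines()
    columns = lines[1].split()  # second header line: r b swpd free ...
    samples = []
    for line in lines[2:]:
        values = line.split()
        if len(values) == len(columns):
            samples.append(dict(zip(columns, map(int, values))))
    return samples

# Header lines and first three reports from the vmstat example in this section.
sample = """\
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 235984032 418748 8649568    0    0     0     0    0    0  0  0 100  0  0
 1  0      0 236054400 418748 8645216    0    0     0  4809 256729 3401  0  0 100  0  0
 1  0      0 236188016 418748 8649904    0    0     0   448 256200  631  0  0 100  0  0
"""
samples = parse_vmstat(sample)

# Drop the first report (averages since reboot) and keep the interval reports.
interval_samples = samples[1:]
mean_bo = sum(s["bo"] for s in interval_samples) / len(interval_samples)
print("mean blocks written out per second:", mean_bo)
```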

Using the iostat(1) Command

The iostat(1) command monitors system input/output device loading by observing the time the devices are active, relative to their average transfer rates. You can use information from the iostat command to adjust the system configuration so that the input/output load is better balanced between physical disks. For more information, see the iostat(1) man page.

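In the following iostat(1) command, the 10 specifies a 10-second interval between reports. The first report shows statistics since the system was booted; each later report covers the preceding interval: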

# iostat 10
Linux 2.6.32-430.el6.x86_64 (harp34-sys)      02/21/2014    _x86_64_  (256 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          46.24    0.01    0.67    0.01    0.00   53.08

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              53.66     23711.65     23791.93 21795308343 21869098736
sdb               0.01         0.02         0.00      17795          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          99.96    0.00    0.04    0.00    0.00    0.00

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda             321.20    149312.00    150423.20    1493120    1504232
sdb               0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          99.95    0.00    0.05    0.00    0.00    0.00

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda             305.19    146746.05    148453.95    1468928    1486024
sdb               0.00         0.00         0.00          0          0
...
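To compare the load on each disk across reports, you can collect the per-device lines into a data structure. The following Python sketch is an illustration only. The function name parse_iostat_devices is hypothetical, and the sketch assumes the column layout shown in the example above (a Device: header followed by six-field device lines); other sysstat versions can print different columns.

```python
def parse_iostat_devices(output):
    """Return one dict per iostat report, mapping device name to its rates."""
    reports = []
    in_devices = False
    for line in output.splitlines():
        fields = line.split()
        if fields and fields[0] == "Device:":
            reports.append({})      # a new device table starts here
            in_devices = True
        elif in_devices and len(fields) == 6:
            reports[-1][fields[0]] = {
                "tps": float(fields[1]),
                "blk_read_s": float(fields[2]),
                "blk_wrtn_s": float(fields[3]),
            }
        else:
            in_devices = False      # blank line or CPU section ends the table
    return reports

# First two reports from the iostat example in this section.
sample = """\
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          46.24    0.01    0.67    0.01    0.00   53.08

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              53.66     23711.65     23791.93 21795308343 21869098736
sdb               0.01         0.02         0.00      17795          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          99.96    0.00    0.04    0.00    0.00    0.00

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda             321.20    149312.00    150423.20    1493120    1504232
sdb               0.00         0.00         0.00          0          0
"""
reports = parse_iostat_devices(sample)

# Find the device with the most transfers per second in the second report.
latest = reports[1]
busiest = max(latest, key=lambda name: latest[name]["tps"])
print("busiest device:", busiest, latest[busiest])
```

In this sample, nearly all of the transfers go to sda while sdb is idle, which is the kind of imbalance that the iostat output can help you identify.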

Using the sar(1) Command

The sar(1) command returns the content of selected, cumulative activity counters in the operating system. When you specify the interval and count parameters, the command writes count reports, spaced interval seconds apart. For more information, see the sar(1) man page.

The following example shows the sar(1) command with a request for information about CPU 1 (-P 1), an interval of 10 seconds, and a count of 10:

uv44-sys:~ # sar -P 1 10 10
Linux 2.6.32-416.el6.x86_64 (harp34-sys)      09/19/2013   _x86_64_   (256 CPU)

11:24:54 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
11:25:04 AM       1      0.20      0.00      0.10      0.00      0.00     99.70
11:25:14 AM       1     10.10      0.00      0.30      0.00      0.00     89.60
11:25:24 AM       1     99.70      0.00      0.30      0.00      0.00      0.00
11:25:34 AM       1     99.70      0.00      0.30      0.00      0.00      0.00
11:25:44 AM       1      8.99      0.00      0.60      0.00      0.00     90.41
11:25:54 AM       1      0.10      0.00      0.20      0.00      0.00     99.70
11:26:04 AM       1     38.70      0.00      0.10      0.00      0.00     61.20
11:26:14 AM       1     99.80      0.00      0.10      0.00      0.00      0.10
11:26:24 AM       1     80.42      0.00      0.70      0.00      0.00     18.88
11:26:34 AM       1      0.10      0.00      0.20      0.00      0.00     99.70
Average:          1     43.78      0.00      0.29      0.00      0.00     55.93
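The Average: line summarizes the interval reports above it. The following Python sketch recomputes the %user and %idle averages from the individual reports and counts the intervals in which the CPU was almost fully busy. The function name parse_sar_cpu and the 1 percent idle threshold are assumptions made for this illustration, and the sketch assumes the 12-hour timestamp layout shown in the example above.

```python
def parse_sar_cpu(output):
    """Return (interval rows, Average row) from sar -P CPU output.

    Each row is a list of six floats: %user %nice %system %iowait %steal %idle.
    """
    rows, average = [], None
    for line in output.splitlines():
        fields = line.split()
        if not fields or "%user" in fields or fields[0] == "Linux":
            continue  # skip blank lines, the column header, and the banner
        values = [float(v) for v in fields[-6:]]
        if fields[0] == "Average:":
            average = values
        else:
            rows.append(values)
    return rows, average

# Reports copied from the sar example in this section.
sample = """\
11:24:54 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
11:25:04 AM       1      0.20      0.00      0.10      0.00      0.00     99.70
11:25:14 AM       1     10.10      0.00      0.30      0.00      0.00     89.60
11:25:24 AM       1     99.70      0.00      0.30      0.00      0.00      0.00
11:25:34 AM       1     99.70      0.00      0.30      0.00      0.00      0.00
11:25:44 AM       1      8.99      0.00      0.60      0.00      0.00     90.41
11:25:54 AM       1      0.10      0.00      0.20      0.00      0.00     99.70
11:26:04 AM       1     38.70      0.00      0.10      0.00      0.00     61.20
11:26:14 AM       1     99.80      0.00      0.10      0.00      0.00      0.10
11:26:24 AM       1     80.42      0.00      0.70      0.00      0.00     18.88
11:26:34 AM       1      0.10      0.00      0.20      0.00      0.00     99.70
Average:          1     43.78      0.00      0.29      0.00      0.00     55.93
"""
rows, average = parse_sar_cpu(sample)

mean_user = sum(row[0] for row in rows) / len(rows)
mean_idle = sum(row[5] for row in rows) / len(rows)
busy_intervals = sum(1 for row in rows if row[5] < 1.0)  # under 1% idle

print("mean %user:", round(mean_user, 2), "reported:", average[0])
print("mean %idle:", round(mean_idle, 2), "reported:", average[5])
print("intervals with under 1% idle:", busy_intervals)
```

The recomputed means agree with the Average: line to two decimal places, but the average alone hides the fact that CPU 1 alternated between nearly idle and fully busy intervals.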