Chapter 12. Troubleshooting


This chapter discusses the following:

Diagnostic Tools

You can use the following diagnostic tools:

Use the cat(1) command to view the /proc/interrupts file and determine which CPUs are servicing your interrupts:
[user@linux user]% cat /proc/interrupts
For an example, see Appendix A, “libreact API Example”.
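
To compare how interrupts are spread across CPUs, the per-CPU columns of /proc/interrupts can be totalled. The following is a minimal sketch, assuming the usual layout (a header row of CPU names followed by one row per interrupt source); the function name irq_totals is hypothetical and not part of any shipped tool.

```shell
# irq_totals [FILE]: print the total interrupt count seen by each CPU.
# Assumption: FILE (default /proc/interrupts) starts with a header row of
# CPU names, and each later row is "LABEL: count count ... description".
irq_totals() {
    awk '
        NR == 1 {
            # Remember the column names (CPU0, CPU1, ...).
            ncpu = NF
            for (i = 1; i <= NF; i++) name[i] = $i
            next
        }
        # Skip rows that carry a single global counter (for example ERR).
        NF < ncpu + 1 { next }
        {
            # Field 1 is the IRQ label; fields 2..ncpu+1 are per-CPU counts.
            for (i = 2; i <= ncpu + 1; i++)
                if ($i ~ /^[0-9]+$/) total[i - 1] += $i
        }
        END {
            for (c = 1; c <= ncpu; c++) printf "%s %.0f\n", name[c], total[c]
        }
    ' "${1:-/proc/interrupts}"
}
```

Calling irq_totals with no argument reads the live /proc/interrupts file. Running it twice a few seconds apart and comparing the totals shows which CPUs are currently taking interrupts.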
Use the profile.pl(1) Perl script to do procedure-level profiling of a program and discover latencies. For more information, see the profile.pl(1) man page.
Use the following ps(1) command to see where your threads are running:
[user@linux user]% ps -FC processname
For an example, see Appendix A, “libreact API Example”.

To see the scheduling policy, real-time priority, and current processor of all threads on the system, use the following command:
[user@linux user]% ps -eLo pid,tid,class,rtprio,psr,cmd
For more information, see the ps(1) man page.
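
On a busy system the full thread listing is long. As a rough sketch, the output can be filtered down to threads in a real-time scheduling class; ps reports these as FF (SCHED_FIFO) and RR (SCHED_RR) in the class column. The function name rt_threads is hypothetical.

```shell
# rt_threads: read "ps -eLo pid,tid,class,rtprio,psr,cmd" output on stdin
# and keep the header line plus every thread whose scheduling class
# (third column) is FF or RR.
rt_threads() {
    awk 'NR == 1 || $3 == "FF" || $3 == "RR"'
}

# Typical use:
#   ps -eLo pid,tid,class,rtprio,psr,cmd | rt_threads
```

The psr column of the remaining lines shows the processor on which each real-time thread last ran.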
Use the top(1) command to display the most resource-intensive processes on the system (by default, top sorts by CPU usage). For more information, see the top(1) man page.

Use the strace(1) command to determine where an application spends most of its time and where large latencies occur. The strace command is a flexible tool for tracing the system calls that an application makes. Following are several simple examples:

To see a per-system-call summary of the time used by a program named hello_world, use the -c option:

[root@linux root]# strace -c hello_world
execve("./hello_world", ["hello_world"], [/* 80 vars */]) = 0
Hello World
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 27.69    0.000139          28         5         3 open
 20.92    0.000105          15         7           mmap
 10.76    0.000054          54         1           write
  7.57    0.000038          13         3           fstat
  6.57    0.000033          17         2         1 stat
  5.98    0.000030          15         2           munmap
  4.58    0.000023          12         2           close
  4.38    0.000022          22         1           mprotect
  4.18    0.000021          21         1           madvise
  2.99    0.000015          15         1           read
  2.39    0.000012          12         1           brk
  1.99    0.000010          10         1           uname
------ ----------- ----------- --------- --------- ----------------
100.00    0.000502                    27         4 total

You can record the actual chronological progression through a program with the following command (line breaks added for readability):

[root@linux root]# strace -ttT hello_world
14:21:03.974181 execve("./hello_world", ["hello_world"], [/* 80 vars */]) = 0
..
14:21:03.976992 mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
   = 0x2000000000040000 <0.000007>
14:21:03.977053 write(1, "Hello World\n", 12Hello World
) = 12 <0.000008>
14:21:03.977109 munmap(0x2000000000040000, 65536) = 0 <0.000009>
14:21:03.977158 exit_group(0)           = ?

The time stamps are displayed in the following format:

hour:minute:second.microsecond

The execution time of each system call is displayed in seconds, enclosed in angle brackets at the end of the line, in the following format:

<second>

Note: You can use the -p option to attach to a process that is already running.

For more information, see the strace(1) man page.
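
When a trace is long, it can help to show only the system calls that took longer than some threshold. The following is a minimal sketch that filters strace -T (or -ttT) output on the trailing <seconds> value; the function name slow_calls and the file name trace.out are hypothetical.

```shell
# slow_calls THRESHOLD: read "strace -T" output on stdin and print only
# the lines whose trailing <seconds> value is at least THRESHOLD.
# Lines without a trailing <seconds> field (such as exit_group) are dropped.
slow_calls() {
    awk -v min="$1" '
        match($0, /<[0-9]+\.[0-9]+>$/) {
            # Strip the angle brackets to get the elapsed time in seconds.
            secs = substr($0, RSTART + 1, RLENGTH - 2)
            if (secs + 0 >= min + 0) print
        }
    '
}

# Typical use (the -o option writes the trace to a file, not to stderr):
#   strace -ttT -o trace.out ./hello_world
#   slow_calls 0.000008 < trace.out
```

Writing the trace to a file with -o also keeps the program's own output from being interleaved with the trace lines.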

Use Linux Trace Toolkit Next Generation (LTTng) commands. See Chapter 11, “SLES LTTng”.

To find the CPU-to-core numbering scheme, examine the following fields in the /proc/cpuinfo file:

processor

physical id

core id

For example, the following output for a third-party x86-64 system shows that logical CPU 0 (processor 0) and CPU 2 (processor 2) are cores that share the same socket (physical id 0):

processor       : 0
...
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2


processor       : 2
...
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2

The following output shows two logical processors, CPU 0 (processor 0) and CPU 8 (processor 8):

processor       : 0
..
physical id     : 0
siblings        : 16
core id         : 0
cpu cores       : 8

processor       : 8
..
physical id     : 1
siblings        : 16
core id         : 0
cpu cores       : 8

Note the following:

CPU 0 is housed in the first socket on the system (physical id 0). This socket has 8 CPU cores. Each of those cores will have two logical CPUs if hyperthreading is enabled.
CPU 8 is housed in the second socket (physical id 1). This socket has 8 CPU cores. Each of those cores will have two logical CPUs if hyperthreading is enabled.

Each logical CPU is in the first core on its respective socket (core id 0).
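
Reading these three fields for every logical CPU by hand is tedious on large systems. As a sketch, they can be collected into one line per logical CPU. This assumes the x86-64 /proc/cpuinfo layout shown above, in which the physical id line precedes the core id line within each processor entry; the function name cpu_topology is hypothetical.

```shell
# cpu_topology [FILE]: print one "processor physical_id core_id" line per
# logical CPU found in FILE (default /proc/cpuinfo).
cpu_topology() {
    awk -F: '
        {
            # Trim the padding around the field name and its value.
            key = $1; val = $2
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
        }
        key == "processor"   { cpu = val }
        key == "physical id" { pkg = val }
        # core id follows physical id, so the entry is complete here.
        key == "core id"     { printf "%s %s %s\n", cpu, pkg, val }
    ' "${1:-/proc/cpuinfo}"
}
```

Two lines that share both the second and third columns describe logical CPUs on the same physical core.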

Problem Removing /rtcpus

Stop real-time processes before you use the --disable option. If any are still running, the script attempts to move them off the real-time CPUs and, if it cannot move them, displays the following failure message:

 "*** Problem removing /rtcpus/rtcpu3. cpuset***
  Try again.  If that doesn't work check /dev/cpuset/rtcpus/rtcpu3/tasks
  for potential problem PIDS;
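
To act on this message, it helps to see which commands the listed IDs belong to. The following is a minimal sketch that pairs each entry in a cpuset tasks file with its command name. The function name cpuset_tasks is hypothetical, and the default path is simply the one named in the failure message above. If an entry has already exited, or is a thread ID that ps cannot look up with -p, the name is shown as a question mark.

```shell
# cpuset_tasks [TASKS_FILE]: list each ID in a cpuset "tasks" file together
# with its command name, so leftover real-time processes can be identified.
cpuset_tasks() {
    tasks=${1:-/dev/cpuset/rtcpus/rtcpu3/tasks}
    while read -r pid; do
        # Ignore blank lines.
        [ -n "$pid" ] || continue
        # Look up the command name; fall back to "?" if ps finds nothing.
        name=$(ps -o comm= -p "$pid" 2>/dev/null)
        [ -n "$name" ] || name='?'
        printf '%s %s\n' "$pid" "$name"
    done < "$tasks"
}
```

After the listed processes are stopped, the --disable operation can be tried again.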
