2.2. Linux Performance Tools: CPU
This section begins our discussion of the performance tools that enable you to extract the information described previously.
2.2.1. vmstat (Virtual Memory Statistics)
vmstat stands for virtual memory statistics, which indicates that it will give you information about the virtual memory system performance of your system. Fortunately, it actually does much more than that. vmstat is a great command to get a rough idea of how your system performs as a whole. It tells you:

- How many processes are running
- How the CPU is being used
- How many interrupts the CPU receives
- How many context switches the scheduler performs
It is an excellent tool to use to get a rough idea of how the system performs.
2.2.1.1 CPU Performance-Related Options
vmstat can be invoked with the following command line:

vmstat [-n] [-s] [delay [count]]

vmstat can be run in two modes: sample mode and average mode. If no parameters are specified, vmstat runs in average mode, where it displays the average value for each statistic since system boot. However, if a delay is specified, the first sample is still the average since system boot, but after that vmstat samples the system every delay seconds and prints out the statistics. Table 2-1 describes the options that vmstat accepts.
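The two modes can be sketched in a few lines of code. This is a minimal illustration, not vmstat's actual implementation: it assumes cumulative CPU tick counters of the kind the kernel keeps, and the tick values and function names are invented for the example.

```python
# Sketch of vmstat's two modes: average mode divides each cumulative tick
# counter by the total since boot; sample mode subtracts the previous
# sample's counters and reports per-interval percentages.
def cpu_percentages(ticks):
    """Percentage breakdown of a set of cumulative CPU tick counters."""
    total = sum(ticks.values())
    return {k: round(100 * v / total) for k, v in ticks.items()}

def interval_stats(prev, curr):
    """Per-interval percentages between two cumulative samples."""
    return cpu_percentages({k: curr[k] - prev[k] for k in curr})

sample1 = {"us": 50, "sy": 10, "id": 930, "wa": 10}    # ticks at time t
sample2 = {"us": 70, "sy": 14, "id": 1126, "wa": 10}   # ticks at t + delay

print(cpu_percentages(sample2))          # what average mode would report
print(interval_stats(sample1, sample2))  # what one sampled line would report
```

Note how an idle interval between two samples can report very different percentages than the since-boot average, which is why the first line of sampled vmstat output (the boot average) often looks unlike the lines that follow.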
2.2.1.2 Example Usage
As shown in Listing 2.2, if vmstat is run with no command-line parameters, it displays the average values of the statistics it has recorded since the system booted. This example shows that the system was nearly idle since boot, as indicated by the CPU usage columns us, sy, id, and wa. The CPU spent 5 percent of its time since boot running user application code, 1 percent running system code, and the rest, 94 percent, sitting idle.
Listing 2.2.
Although vmstat's statistics since system boot can be useful for determining how heavily loaded the machine has been, vmstat is most useful when it runs in sampling mode, as shown in Listing 2.3. In sampling mode, vmstat prints the system's statistics every delay seconds, repeating count times. The first line of statistics in Listing 2.3 contains the system averages since boot, as before, but the periodic samples continue after that. This example shows that there is very little activity on the system. We can see that no processes were blocked during the run by looking at the 0 in the b column. We can also see, by looking at the r column, that no more than one process was running when vmstat sampled its data.
[ezolt@scrffy tmp]$ vmstat
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
1 0 181024 26284 35292 503048 0 0 3 2 6 1 5 1 94 0
Listing 2.3.
vmstat is an excellent way to record how a system behaves under a load or during a test condition. You can use vmstat to display how the system is behaving and, at the same time, save the results to a file by using the Linux tee command. (Chapter 8, "Utility Tools: Performance Tool Helpers," describes the tee command in more detail.) If you pass in only the delay parameter, vmstat samples indefinitely; just start it before the test, and interrupt it after the test has completed. The output file can be imported into a spreadsheet and used to see how the system reacts to the load or to various system events. Listing 2.4 shows the output of this technique. In this example, we can look at the interrupts and context switches that the system is generating. The totals appear in the in and cs columns, respectively. The number of context switches looks good compared to the number of interrupts: the scheduler is switching processes less often than the timer interrupt fires. This is most likely because the system is nearly idle, and most of the time when the timer interrupt fires, the scheduler has no work to do, so it does not switch away from the idle process. (Note: There is a bug in the version of vmstat that generated the following output. It causes the system average line to display incorrect values. This bug has been reported to the maintainer of vmstat and will hopefully be fixed soon.)
[ezolt@scrffy tmp]$ vmstat 2 5
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
1 0 181024 26276 35316 502960 0 0 3 2 6 1 5 1 94 0
1 0 181024 26084 35316 502960 0 0 0 0 1318 772 1 0 98 0
0 0 181024 26148 35316 502960 0 0 0 24 1314 734 1 0 98 0
0 0 181024 26020 35316 502960 0 0 0 0 1315 764 2 0 98 0
0 0 181024 25956 35316 502960 0 0 0 0 1310 764 2 0 98 0
Listing 2.4.
More recent versions of vmstat can extract even more detailed information about a grab bag of different system statistics, as shown in Listing 2.5. The next chapter discusses the memory statistics, but we look at the CPU statistics now. The first group of statistics, or "CPU ticks," shows how the CPU has spent its time since system boot, where a "tick" is a unit of time. Although the condensed vmstat output showed only four CPU states (us, sy, id, and wa), this shows how all the CPU ticks are distributed. In addition, we can see the total number of interrupts and context switches. One new addition is forks, which is basically the number of new processes that have been created since system boot.
[ezolt@scrffy ~/edid]$ vmstat 1 | tee /tmp/output
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 1 201060 35832 26532 324112 0 0 3 2 6 2 5 1 94 0
0 0 201060 35888 26532 324112 0 0 16 0 1138 358 0 0 99 0
0 0 201060 35888 26540 324104 0 0 0 88 1163 371 0 0 100 0
0 0 201060 35888 26540 324104 0 0 0 0 1133 345 0 0 100 0
0 0 201060 35888 26540 324104 0 0 0 60 1174 351 0 0 100 0
0 0 201060 35920 26540 324104 0 0 0 0 1150 408 0 0 100 0
[Ctrl-C]
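The capture saved through tee is plain columnar text, so converting it for a spreadsheet takes only a few lines. The sketch below uses an excerpt of the listing above as its input; a real script would read the saved capture file instead of a hard-coded string.

```python
import csv
import io

# Turn a `vmstat 1 | tee /tmp/output` capture into CSV for a spreadsheet.
capture = """\
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0  1 201060  35832  26532 324112    0    0     3     2    6     2  5  1 94  0
 0  0 201060  35888  26532 324112    0    0    16     0 1138   358  0  0 99  0
"""

lines = capture.splitlines()
header = lines[1].split()                 # column names: r, b, swpd, ...
rows = [line.split() for line in lines[2:]]

out = io.StringIO()
writer = csv.writer(out)
writer.writerow(header)
writer.writerows(rows)
print(out.getvalue().strip())
```

From here the CSV can be opened directly in a spreadsheet to graph the in and cs columns over the course of a test run.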
Listing 2.5.
vmstat provides a broad range of information about the performance of a Linux system. It is one of the core tools to use when investigating a problem with a system.
[ezolt@scrffy ~/edid]$ vmstat -s
1034320 total memory
998712 used memory
698076 active memory
176260 inactive memory
35608 free memory
26592 buffer memory
324312 swap cache
2040244 total swap
201060 used swap
1839184 free swap
5279633 non-nice user cpu ticks
28207739 nice user cpu ticks
2355391 system cpu ticks
628297350 idle cpu ticks
862755 IO-wait cpu ticks
34 IRQ cpu ticks
1707439 softirq cpu ticks
21194571 pages paged in
12677400 pages paged out
93406 pages swapped in
181587 pages swapped out
1931462143 interrupts
785963213 CPU context switches
1096643656 boot time
578451 forks
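The condensed percentage columns can be roughly recovered from these detailed tick counters: each tick class divided by the total. The sketch below uses the counters from Listing 2.5; note that vmstat itself folds the nice, irq, and softirq classes into its four columns slightly differently, so treat this as an approximate cross-check rather than vmstat's exact formula.

```python
# Cross-check the condensed us/sy/id/wa columns against the detailed
# tick counters reported by `vmstat -s` in Listing 2.5.
ticks = {
    "user": 5279633, "nice": 28207739, "system": 2355391,
    "idle": 628297350, "iowait": 862755, "irq": 34, "softirq": 1707439,
}
total = sum(ticks.values())
percent = {name: round(100 * t / total) for name, t in ticks.items()}
print(percent)
```

The idle share comes out at 94 percent, matching the id column shown in the condensed output earlier in this section.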
2.2.2. top (v. 2.0.x)
top is the Swiss army knife of Linux system-monitoring tools. It does a good job of putting a very large amount of system-wide performance information in a single screen. What you display can also be changed interactively, so if a particular problem crops up as your system runs, you can modify what top is showing you. By default, top presents a list, in decreasing order, of the top CPU-consuming processes. This enables you to quickly pinpoint which program is hogging the CPU. Periodically, top updates the list based on a delay that you can specify. (It is initially 3 seconds.)
2.2.2.1 CPU Performance-Related Options
top is invoked with the following command line:

top [d delay] [C] [H] [i] [n iter] [b]

top actually takes options in two modes: command-line options and runtime options. The command-line options determine how top displays its information. Table 2-3 shows the command-line options that influence the type and frequency of the performance statistics that top displays.
| f or F | This displays a configuration screen that enables you to select which process statistics display on the screen. |
| o or O | This displays a configuration screen that enables you to change the order of the displayed statistics. |
| l | This toggles whether the load average and uptime information will be updated and displayed. |
| t | This toggles the display of how each CPU spends its time. It also toggles information about how many processes are currently running. |
| H | Shows all the individual threads of an application instead of just displaying a total for each application. |
| m | This toggles whether information about the system memory usage will be shown on the screen. |

By default, the highest CPU consumers are displayed first. However, it might be more useful to sort by other characteristics.
| us | CPU time spent in user applications. |
| sy | CPU time spent in the kernel. |
| ni | CPU time spent in "niced" processes. |
| id | CPU time spent idle. |
| wa | CPU time spent waiting for I/O. |
| hi | CPU time spent in the irq handlers. |
| si | CPU time spent in the softirq handlers. |
| load average | The 1-minute, 5-minute, and 15-minute load average. |
| %CPU | The percentage of CPU that a particular process is consuming. |
| PRI | The priority value of the process, where a higher value indicates a higher priority. RT indicates that the task has real-time priority, a priority higher than the standard range. |
| NI | The nice value of the process. The higher the nice value, the less often the system runs the process. Processes with high nice values tend to have very low priorities. |
| WCHAN | If a process is waiting on an I/O, this shows which kernel function it is waiting in. |
| STAT | This is the current status of a process, where the process is either sleeping (S), running (R), zombied (killed but not yet dead) (Z), in an uninterruptible sleep (D), or being traced (T). |
| TIME | The total amount of CPU time (user and system) that this process has used since it started executing. |
| COMMAND | The command that this process is executing. |
| LC | The number of the last CPU that this process was executing on. |
| FLAGS | The task flags of the process (see linux/sched.h). |
2.2.2.2 Example Usage
Listing 2.6 is an example run of top. Once it starts, it periodically updates the screen until you exit it. This demonstrates some of the system-wide statistics that top can generate. First, we see the load average of the system over the past 1, 5, and 15 minutes. As we can see, the system has recently started to get busy (because of doom.x86). One CPU is busy with user code 90 percent of the time. The other is spending only ~13 percent of its time in user code. Finally, we can see that 73 of the processes are sleeping and only 3 are currently running.
Listing 2.6.
Now, pressing F while top is running brings up the configuration screen shown in Listing 2.7. When you press the keys indicated (A for PID, B for PPID, and so on), top toggles whether these statistics display on the previous screen. When all the desired statistics are selected, press Enter to return to top's initial screen, which now shows the current values of the selected statistics. When you are configuring the statistics, all currently activated fields are capitalized in the Current Field Order line and have an asterisk (*) next to their names.
catan> top
08:09:16 up 2 days, 18:44, 4 users, load average: 0.95, 0.44, 0.17
76 processes: 73 sleeping, 3 running, 0 zombie, 0 stopped
CPU states: cpu user nice system irq softirq iowait idle
total 51.5% 0.0% 3.9% 0.0% 0.0% 0.0% 44.6%
cpu00 90.0% 0.0% 1.2% 0.0% 0.0% 0.0% 8.8%
cpu01 13.0% 0.0% 6.6% 0.0% 0.0% 0.0% 80.4%
Mem: 2037140k av, 1132120k used, 905020k free, 0k shrd, 86220k buff
689784k active, 151528k inactive
Swap: 2040244k av, 0k used, 2040244k free 322648k cached
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
7642 root 25 0 647M 379M 7664 R 49.9 19.0 2:58 0 doom.x86
7661 ezolt 15 0 1372 1372 1052 R 0.1 0.0 0:00 1 top
1 root 15 0 528 528 452 S 0.0 0.0 0:05 1 init
2 root RT 0 0 0 0 SW 0.0 0.0 0:00 0 migration/0
3 root RT 0 0 0 0 SW 0.0 0.0 0:00 1 migration/1
4 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 keventd
5 root 34 19 0 0 0 SWN 0.0 0.0 0:00 0 ksoftirqd/0
6 root 34 19 0 0 0 SWN 0.0 0.0 0:00 1 ksoftirqd/1
9 root 25 0 0 0 0 SW 0.0 0.0 0:00 0 bdflush
7 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 kswapd
8 root 15 0 0 0 0 SW 0.0 0.0 0:00 1 kscand
10 root 15 0 0 0 0 SW 0.0 0.0 0:00 1 kupdated
11 root 25 0 0 0 0 SW 0.0 0.0 0:00 0 mdrecoveryd
Listing 2.7.
To show you how customizable top is, Listing 2.8 shows a highly configured output screen, which shows only the top options relevant to CPU usage.
[ezolt@wintermute doc]$ top
(press 'F' while running)
Current Field Order: AbcDgHIjklMnoTP|qrsuzyV{EFW[X
Toggle fields with a-z, any other key to return:
* A: PID = Process Id
B: PPID = Parent Process Id
C: UID = User Id
* D: USER = User Name
* E: %CPU = CPU Usage
* F: %MEM = Memory Usage
G: TTY = Controlling tty
* H: PRI = Priority
* I: NI = Nice Value
J: PAGEIN = Page Fault Count
K: TSIZE = Code Size (kb)
L: DSIZE = Data+Stack Size (kb)
* M: SIZE = Virtual Image Size (kb)
N: TRS = Resident Text Size (kb)
O: SWAP = Swapped kb
* P: SHARE = Shared Pages (kb)
Q: A = Accessed Page count
R: WP = Write Protected Pages
S: D = Dirty Pages
* T: RSS = Resident Set Size (kb)
U: WCHAN = Sleeping in Function
* V: STAT = Process Status
* W: TIME = CPU Time
* X: COMMAND = Command
Y: LC = Last used CPU (expect this to change regularly)
Z: FLAGS = Task Flags (see linux/sched.h)
Listing 2.8.
top provides an overview of system resource usage with a focus on providing information about how various processes are consuming those resources. It is best used when interacting with the system directly because of the user-friendly and tool-unfriendly format of its output.
08:16:23 up 2 days, 18:52, 4 users, load average: 1.07, 0.92, 0.49
76 processes: 73 sleeping, 3 running, 0 zombie, 0 stopped
CPU states: cpu user nice system irq softirq iowait idle
total 48.2% 0.0% 1.5% 0.0% 0.0% 0.0% 50.1%
cpu00 0.3% 0.0% 0.1% 0.0% 0.0% 0.0% 99.5%
cpu01 96.2% 0.0% 2.9% 0.0% 0.0% 0.0% 0.7%
Mem: 2037140k av, 1133548k used, 903592k free, 0k shrd, 86232k buff
690812k active, 151536k inactive
Swap: 2040244k av, 0k used, 2040244k free 322656k cached
PID USER PRI NI WCHAN FLAGS LC STAT %CPU TIME CPU COMMAND
7642 root 25 0 100100 1 R 49.6 10:30 1 doom.x86
1 root 15 0 400100 0 S 0.0 0:05 0 init
2 root RT 0 140 0 SW 0.0 0:00 0 migration/0
3 root RT 0 140 1 SW 0.0 0:00 1 migration/1
4 root 15 0 40 0 SW 0.0 0:00 0 keventd
5 root 34 19 40 0 SWN 0.0 0:00 0 ksoftirqd/0
6 root 34 19 40 1 SWN 0.0 0:00 1 ksoftirqd/1
9 root 25 0 40 0 SW 0.0 0:00 0 bdflush
7 root 15 0 840 0 SW 0.0 0:00 0 kswapd
8 root 15 0 40 0 SW 0.0 0:00 0 kscand
10 root 15 0 40 0 SW 0.0 0:00 0 kupdated
11 root 25 0 40 0 SW 0.0 0:00 0 mdrecoveryd
20 root 15 0 400040 0 SW 0.0 0:00 0 katad-1
2.2.3. top (v. 3.x.x)
Recently, the version of top provided by the most recent distributions has been completely overhauled, and as a result, many of the command-line and interaction options have changed. Although the basic ideas are similar, it has been streamlined, and a few different display modes have been added. Again, top presents a list, in decreasing order, of the top CPU-consuming processes.
2.2.3.1 CPU Performance-Related Options
top is invoked with the following command line:

top [-d delay] [-n iter] [-i] [-b]

top actually takes options in two modes: command-line options and runtime options. The command-line options determine how top displays its information. Table 2-8 shows the command-line options that influence the type and frequency of the performance statistics that top displays.
| us | CPU time spent in user applications. |
| sy | CPU time spent in the kernel. |
| ni | CPU time spent in "niced" processes. |
| id | CPU time spent idle. |
| wa | CPU time spent waiting for I/O. |
| hi | CPU time spent in the irq handlers. |
| si | CPU time spent in the softirq handlers. |
| load average | The 1-minute, 5-minute, and 15-minute load average. |
| %CPU | The percentage of CPU that a particular process is consuming. |
| PRI | The priority value of the process, where a higher value indicates a higher priority. RT indicates that the task has real-time priority, a priority higher than the standard range. |
| NI | The nice value of the process. The higher the nice value, the less often the system runs the process. Processes with high nice values tend to have very low priorities. |
| WCHAN | If a process is waiting on an I/O, this shows which kernel function it is waiting in. |
| TIME | The total amount of CPU time (user and system) that this process has used since it started executing. |
| COMMAND | The command that this process is executing. |
| S | This is the current status of a process, where the process is either sleeping (S), running (R), zombied (killed but not yet dead) (Z), in an uninterruptible sleep (D), or being traced (T). |
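The per-process %CPU statistic that both versions of top report can be understood as a delta between two samples of a process's cumulative CPU ticks. The sketch below is an illustration of that idea, not top's actual source code; the tick values are invented, and it assumes the usual 100-ticks-per-second accounting that the /proc counters use on most systems.

```python
# Sketch of a top-like %CPU calculation: sample a process's cumulative
# CPU ticks, wait one refresh interval, sample again, and divide the
# delta by the ticks that elapsed on one CPU during the interval.
USER_HZ = 100  # /proc accounting typically uses 1/100-second ticks

def percent_cpu(ticks_before, ticks_after, interval_seconds):
    used = ticks_after - ticks_before        # ticks the process consumed
    available = interval_seconds * USER_HZ   # ticks elapsed on one CPU
    return 100.0 * used / available

# A process that consumed 150 ticks over a 3-second refresh interval:
print(percent_cpu(1000, 1150, 3))
```

This is also why a multithreaded process can exceed 100 percent on a multiprocessor system: its threads can consume more ticks than elapse on any single CPU.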
2.2.3.2 Example Usage
Listing 2.9 is an example run of top v3.0. Again, it will periodically update the screen until you exit it. The statistics are similar to those of top v2.x, but are named slightly differently.
Listing 2.9.
Now, pressing f while top is running brings up the configuration screen shown in Listing 2.10. When you press the keys indicated (A for PID, B for PPID, and so on), top toggles whether these statistics display on the previous screen. When all the desired statistics are selected, press Enter to return to top's initial screen, which now shows the current values of the selected statistics. When you are configuring the statistics, all currently activated fields are capitalized in the Current Field Order line and have an asterisk (*) next to their names. Notice that most of the statistics are similar to those of v2.x, but their names have changed slightly.
catan> top
top - 08:52:21 up 19 days, 21:38, 17 users, load average: 1.06, 1.13, 1.15
Tasks: 149 total, 1 running, 146 sleeping, 1 stopped, 1 zombie
Cpu(s): 0.8% us, 0.4% sy, 4.2% ni, 94.2% id, 0.1% wa, 0.0% hi, 0.3% si
Mem: 1034320k total, 1023188k used, 11132k free, 39920k buffers
Swap: 2040244k total, 214496k used, 1825748k free, 335488k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
26364 root 16 0 400m 68m 321m S 3.8 6.8 379:32.04 X
26737 ezolt 15 0 71288 45m 21m S 1.9 4.5 6:32.04 gnome-terminal
29114 ezolt 15 0 34000 22m 18m S 1.9 2.2 27:57.62 gnome-system-mo
9581 ezolt 15 0 2808 1028 1784 R 1.9 0.1 0:00.03 top
1 root 16 0 2396 448 1316 S 0.0 0.0 0:01.68 init
2 root RT 0 0 0 0 S 0.0 0.0 0:00.68 migration/0
3 root 34 19 0 0 0 S 0.0 0.0 0:00.01 ksoftirqd/0
4 root RT 0 0 0 0 S 0.0 0.0 0:00.27 migration/1
5 root 34 19 0 0 0 S 0.0 0.0 0:00.01 ksoftirqd/1
6 root RT 0 0 0 0 S 0.0 0.0 0:22.49 migration/2
7 root 34 19 0 0 0 S 0.0 0.0 0:00.01 ksoftirqd/2
8 root RT 0 0 0 0 S 0.0 0.0 0:37.53 migration/3
9 root 34 19 0 0 0 S 0.0 0.0 0:00.01 ksoftirqd/3
10 root 5 -10 0 0 0 S 0.0 0.0 0:01.74 events/0
11 root 5 -10 0 0 0 S 0.0 0.0 0:02.77 events/1
12 root 5 -10 0 0 0 S 0.0 0.0 0:01.79 events/2
Listing 2.10.
Listing 2.11 shows the new output mode of top, where many different statistics are sorted and displayed on the same screen.
(press 'f' while running)
Current Fields: AEHIOQTWKNMbcdfgjplrsuvyzX for window 1:Def
Toggle fields via field letter, type any other key to return
* A: PID = Process Id u: nFLT = Page Fault count
* E: USER = User Name v: nDRT = Dirty Pages count
* H: PR = Priority y: WCHAN = Sleeping in Function
* I: NI = Nice value z: Flags = Task Flags <sched.h>
* O: VIRT = Virtual Image (kb) * X: COMMAND = Command name/line
* Q: RES = Resident size (kb)
* T: SHR = Shared Mem size (kb) Flags field:
* W: S = Process Status 0x00000001 PF_ALIGNWARN
* K: %CPU = CPU usage 0x00000002 PF_STARTING
* N: %MEM = Memory usage (RES) 0x00000004 PF_EXITING
* M: TIME+ = CPU Time, hundredths 0x00000040 PF_FORKNOEXEC
b: PPID = Parent Process Pid 0x00000100 PF_SUPERPRIV
c: RUSER = Real user name 0x00000200 PF_DUMPCORE
d: UID = User Id 0x00000400 PF_SIGNALED
f: GROUP = Group Name 0x00000800 PF_MEMALLOC
g: TTY = Controlling Tty 0x00002000 PF_FREE_PAGES (2.5)
j: #C = Last used cpu (SMP) 0x00008000 debug flag (2.5)
p: SWAP = Swapped size (kb) 0x00024000 special threads (2.5)
l: TIME = CPU Time 0x001D0000 special states (2.5)
r: CODE = Code size (kb) 0x00100000 PF_USEDFPU (thru 2.4)
s: DATA = Data+Stack size (kb)
Listing 2.11.
top v3.x provides a slightly cleaner interface to top. It simplifies some aspects of it and provides a nice "summary" information screen that displays many of the resource consumers in the system.
(press 'F' while running)
1:Def - 09:00:48 up 19 days, 21:46, 17 users, load average: 1.01, 1.06, 1.10
Tasks: 144 total, 1 running, 141 sleeping, 1 stopped, 1 zombie
Cpu(s): 1.2% us, 0.9% sy, 0.0% ni, 97.9% id, 0.0% wa, 0.0% hi, 0.0% si
Mem: 1034320k total, 1024020k used, 10300k free, 39408k buffers
Swap: 2040244k total, 214496k used, 1825748k free, 335764k cached
1 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
29114 ezolt 16 0 34112 22m 18m S 3.6 2.2 28:15.06 gnome-system-mo
26364 root 15 0 400m 68m 321m S 2.6 6.8 380:01.09 X
9689 ezolt 16 0 3104 1092 1784 R 1.0 0.1 0:00.09 top
2 PID PPID TIME+ %CPU %MEM PR NI S VIRT SWAP RES UID COMMAND
30403 24989 0:00.03 0.0 0.1 15 0 S 5808 4356 1452 9336 bash
29510 29505 7:19.59 0.0 5.9 16 0 S 125m 65m 59m 9336 firefox-bin
29505 29488 0:00.00 0.0 0.1 16 0 S 5652 4576 1076 9336 run-mozilla.sh
3 PID %MEM VIRT SWAP RES CODE DATA SHR nFLT nDRT S PR NI %CPU COMMAND
8414 25.0 374m 121m 252m 496 373m 98m 1547 0 S 16 0 0.0 soffice.bin
26364 6.8 400m 331m 68m 1696 398m 321m 2399 0 S 15 0 2.6 X
29510 5.9 125m 65m 59m 64 125m 31m 253 0 S 16 0 0.0 firefox-bin
26429 4.7 59760 10m 47m 404 57m 12m 1247 0 S 15 0 0.0 metacity
4 PID PPID UID USER RUSER TTY TIME+ %CPU %MEM S COMMAND
1371 1 43 xfs xfs ? 0:00.10 0.0 0.1 S xfs
1313 1 51 smmsp smmsp ? 0:00.08 0.0 0.2 S sendmail
982 1 29 rpcuser rpcuser ? 0:00.07 0.0 0.1 S rpc.statd
963 1 32 rpc rpc ? 0:06.23 0.0 0.1 S portmap
2.2.4. procinfo (Display Info from the /proc File System)
Much like vmstat, procinfo provides a view of the system-wide performance characteristics. Although some of the information that it provides is similar to vmstat's, it also provides information about the number of interrupts that the CPU received for each device. Its output format is a little more readable than vmstat's, but it takes up much more screen space.
2.2.4.1 CPU Performance-Related Options
procinfo is invoked with the following command:

procinfo [-f] [-d] [-D] [-n sec] [-F file]

Table 2-12 describes the different options that change the output and the frequency of the samples that procinfo displays.
| -f | Runs procinfo in full-screen mode |
| -d | Displays the change in each statistic between samples rather than totals |
| -D | Displays statistic totals rather than rate of change |
| -n sec | Number of seconds to pause between each sample |
| -F file | Sends the output of procinfo to a file |
2.2.4.2 Example Usage
Calling procinfo without any command-line options yields output similar to Listing 2.12. Without any options, procinfo displays only one screenful of status and then exits. procinfo is more useful when it is periodically updated using the -n sec option. This enables you to see how the system's performance changes in real time.
Listing 2.12.
As you can see from Listing 2.12, procinfo provides a reasonable overview of the system. We can see, once again from the user, nice, system, and idle times, that the system is not very busy. One interesting thing to notice is that procinfo claims the system has spent more idle time than it has been running (as indicated by the uptime). This is because the system actually has four CPUs, so for every day of wall time, four days of CPU time pass. The load average confirms that the system has been relatively work-free for the recent past. For the past minute, on average, the system had less than one process ready to run; a load average of 0.47 indicates that a single process was ready to run only 47 percent of the time. On a four-CPU system, this means a large amount of CPU power is going to waste.

procinfo also gives us a good view of which devices on the system are causing interrupts. We can see that the Nvidia card (nvidia), IDE controller (ide0), Ethernet device (eth0), and sound card (es1371) have relatively high numbers of interrupts. This is as one would expect for a desktop workstation.

procinfo has the advantage of putting many of the system-wide performance statistics on a single screen, enabling you to see how the system is performing as a whole. It lacks details about network and disk performance, but it provides a good system-wide view of CPU and memory performance. One limitation that can be significant is that procinfo does not report when the CPU is in the iowait, irq, or softirq state.
[ezolt@scrffy ~/mail]$ procinfo
Linux 2.4.18-3bigmem (bhcompile@daffy) (gcc 2.96 20000731 ) #1 4CPU [scrffy]
Memory: Total Used Free Shared Buffers Cached
Mem: 1030784 987776 43008 0 35996 517504
Swap: 2040244 17480 2022764
Bootup: Thu Jun 3 09:20:22 2004 Load average: 0.47 0.32 0.26 1/118 10378
user : 3:18:53.99 2.7% page in : 1994292 disk 1: 20r 0w
nice : 0:00:22.91 0.0% page out: 2437543 disk 2: 247231r 131696w
system: 3:45:41.20 3.1% swap in : 996
idle : 4d 15:56:17.10 94.0% swap out: 4374
uptime: 1d 5:45:18.80 context : 64608366
irq 0: 10711880 timer irq 12: 1319185 PS/2 Mouse
irq 1: 94931 keyboard irq 14: 7144432 ide0
irq 2: 0 cascade [4] irq 16: 16 aic7xxx
irq 3: 1 irq 18: 4152504 nvidia
irq 4: 1 irq 19: 0 usb-uhci
irq 6: 2 irq 20: 4772275 es1371
irq 7: 1 irq 22: 384919 aic7xxx
irq 8: 1 rtc irq 23: 3797246 usb-uhci, eth0
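The load average line that procinfo (and top) displays comes from the kernel's /proc/loadavg file. The sketch below parses the same values shown in Listing 2.12; on a live system you would read open("/proc/loadavg") instead of the hard-coded string used here.

```python
# Parse a /proc/loadavg-style line: three load averages, then a
# runnable/total task count, then the most recently created PID.
sample = "0.47 0.32 0.26 1/118 10378"   # values from Listing 2.12
fields = sample.split()
one, five, fifteen = (float(f) for f in fields[:3])
running, total = (int(n) for n in fields[3].split("/"))
print(f"1min={one} 5min={five} 15min={fifteen} runnable={running} of {total} tasks")
```

On the four-CPU machine in the listing, a 1-minute load of 0.47 is well under the machine's capacity of 4.0 runnable processes, confirming the observation above that most of the CPU power is idle.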
2.2.5. gnome-system-monitor
gnome-system-monitor is, in many ways, a graphical counterpart of top. It enables you to graphically monitor individual processes and observe the load on the system based on the graphs that it displays.
2.2.5.1 CPU Performance-Related Options
gnome-system-monitor can be invoked from the Gnome menu. (Under Red Hat 9 and greater, this is under System Tools > System Monitor.) However, it can also be invoked using the following command:

gnome-system-monitor

gnome-system-monitor has no relevant command-line options that affect the CPU performance measurements. However, some of the statistics shown can be modified by selecting gnome-system-monitor's Edit > Preferences menu entry.
2.2.5.2 Example Usage
When you launch gnome-system-monitor, it creates a window similar to Figure 2-1. This window shows information about the amount of CPU and memory that a particular process is using. It also shows information about the parent/child relationships between each process.
Figure 2-1.

Figure 2-2.

2.2.6. mpstat (Multiprocessor Stat)
mpstat is a fairly simple command that shows you how your processors are behaving over time. The biggest benefit of mpstat is that it shows the time next to the statistics, so you can look for a correlation between CPU usage and time of day. If you have multiple CPUs or hyperthreading-enabled CPUs, mpstat can also break down CPU usage by processor, so you can see whether a particular processor is doing more work than the others. You can select which individual processor you want to monitor, or you can ask mpstat to monitor all of them.
2.2.6.1 CPU Performance-Related Options
mpstat can be invoked using the following command line:

mpstat [ -P { cpu | ALL } ] [ delay [ count ] ]

Once again, delay specifies how often the samples will be taken, and count determines how many times it will be run. Table 2-14 describes the command-line options of mpstat.
2.2.6.2 Example Usage
First, we ask mpstat to show us the CPU statistics for processor number 0. This is shown in Listing 2.13.
Listing 2.13.
Listing 2.14 shows a similar command on a very unloaded system whose CPUs both have hyperthreading enabled. You can see how the statistics for all the CPUs are shown. One interesting observation in this output is that one CPU seems to handle all the interrupts. If the system were heavily loaded with I/O, and all the interrupts were being handled by a single processor, this could be the cause of a bottleneck: one CPU is overwhelmed while the rest wait for work to do. You would be able to see this with mpstat if the processor handling all the interrupts had no idle time while the other processors did.
[ezolt@scrffy sysstat-5.1.1]$ ./mpstat -P 0 1 10
Linux 2.6.8-1.521smp (scrffy) 10/20/2004
07:12:02 PM CPU %user %nice %sys %iowait %irq %soft %idle intr/s
07:12:03 PM 0 9.80 0.00 1.96 0.98 0.00 0.00 87.25 1217.65
07:12:04 PM 0 1.01 0.00 0.00 0.00 0.00 0.00 98.99 1112.12
07:12:05 PM 0 0.99 0.00 0.00 0.00 0.00 0.00 99.01 1055.45
07:12:06 PM 0 0.00 0.00 0.00 0.00 0.00 0.00 100.00 1072.00
07:12:07 PM 0 0.00 0.00 0.00 0.00 0.00 0.00 100.00 1075.76
07:12:08 PM 0 1.00 0.00 0.00 0.00 0.00 0.00 99.00 1067.00
07:12:09 PM 0 4.90 0.00 3.92 0.00 0.00 0.98 90.20 1045.10
07:12:10 PM 0 0.00 0.00 0.00 0.00 0.00 0.00 100.00 1069.70
07:12:11 PM 0 0.99 0.00 0.99 0.00 0.00 0.00 98.02 1070.30
07:12:12 PM 0 3.00 0.00 4.00 0.00 0.00 0.00 93.00 1067.00
Average: 0 2.19 0.00 1.10 0.10 0.00 0.10 96.51 1085.34
Listing 2.14.
mpstat can be used to determine whether the CPUs are fully utilized and relatively balanced. By observing the number of interrupts each CPU is handling, it is possible to find an imbalance. Details on how to control how interrupts are routed are provided in the kernel source under Documentation/IRQ-affinity.txt.
[ezolt@scrffy sysstat-5.1.1]$ ./mpstat -P ALL 1 2
Linux 2.6.8-1.521smp (scrffy) 10/20/2004
07:13:21 PM CPU %user %nice %sys %iowait %irq %soft %idle intr/s
07:13:22 PM all 3.98 0.00 1.00 0.25 0.00 0.00 94.78 1322.00
07:13:22 PM 0 2.00 0.00 0.00 1.00 0.00 0.00 97.00 1137.00
07:13:22 PM 1 6.00 0.00 2.00 0.00 0.00 0.00 93.00 185.00
07:13:22 PM 2 1.00 0.00 0.00 0.00 0.00 0.00 99.00 0.00
07:13:22 PM 3 8.00 0.00 1.00 0.00 0.00 0.00 91.00 0.00
07:13:22 PM CPU %user %nice %sys %iowait %irq %soft %idle intr/s
07:13:23 PM all 2.00 0.00 0.50 0.00 0.00 0.00 97.50 1352.53
07:13:23 PM 0 0.00 0.00 0.00 0.00 0.00 0.00 100.00 1135.35
07:13:23 PM 1 6.06 0.00 2.02 0.00 0.00 0.00 92.93 193.94
07:13:23 PM 2 0.00 0.00 0.00 0.00 0.00 0.00 101.01 16.16
07:13:23 PM 3 1.01 0.00 1.01 0.00 0.00 0.00 100.00 7.07
Average: CPU %user %nice %sys %iowait %irq %soft %idle intr/s
Average: all 2.99 0.00 0.75 0.12 0.00 0.00 96.13 1337.19
Average: 0 1.01 0.00 0.00 0.50 0.00 0.00 98.49 1136.18
Average: 1 6.03 0.00 2.01 0.00 0.00 0.00 92.96 189.45
Average: 2 0.50 0.00 0.00 0.00 0.00 0.00 100.00 8.04
Average: 3 4.52 0.00 1.01 0.00 0.00 0.00 95.48 3.52
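The imbalance that mpstat's intr/s column reveals can also be checked programmatically from the per-device counters in /proc/interrupts. The sketch below totals per-CPU counts from a /proc/interrupts-style excerpt; the layout mirrors the real file, but the numbers are invented for illustration.

```python
# Total per-CPU interrupt counts from a /proc/interrupts-style excerpt
# and report each CPU's share, to spot one CPU handling all the traffic.
sample = """\
           CPU0       CPU1       CPU2       CPU3
  0:   10711880        100         90        110   timer
 14:    7144432         12          8          9   ide0
 18:    4152504          5          3          2   nvidia
"""
lines = sample.splitlines()
ncpus = len(lines[0].split())
totals = [0] * ncpus
for line in lines[1:]:
    counts = line.split()[1:1 + ncpus]
    totals = [t + int(c) for t, c in zip(totals, counts)]

shares = [t / sum(totals) for t in totals]
print([f"{s:.0%}" for s in shares])
```

Here CPU0 receives essentially all the interrupts, the same pattern visible in the mpstat output above; the IRQ-affinity documentation mentioned above describes how to spread that load across processors.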
2.2.7. sar (System Activity Reporter)
sar takes yet another approach to collecting system data. sar can efficiently record collected system performance data into binary files that can be replayed at a later date. sar is a low-overhead way to record information about how the system is performing. The sar command can be used to record performance information, replay previously recorded information, and display real-time information about the current system. The output of the sar command can be formatted to make it easy to pipe to relational databases or to other Linux commands for processing.
2.2.7.1 CPU Performance-Related Options
sar can be invoked with the following command line:

sar [options] [ delay [ count ] ]

Although sar reports about many different areas of Linux, the statistics are of two different forms. One set of statistics is the instantaneous value at the time of the sample. The other is a rate since the last sample. Table 2-16 describes the command-line options of sar.
2.2.7.2 Example Usage
The first command, shown in Listing 2.15, takes three samples of the CPU statistics one second apart and stores the results in the binary file /tmp/apache_test. This command does not produce any visual output and just returns when it has completed.
Listing 2.15.
After the information has been stored in the /tmp/apache_test file, we can display it in various formats. The default, shown in Listing 2.16, is human readable. It shows information similar to the other system-monitoring commands: how the processor was spending its time at each sampled moment.
[ezolt@wintermute sysstat-5.0.2]$ sar -o /tmp/apache_test 1 3
Listing 2.16.
[ezolt@wintermute sysstat-5.0.2]$ sar -f /tmp/apache_test
Linux 2.4.22-1.2149.nptl (wintermute.phil.org) 03/20/04
17:18:34 CPU %user %nice %system %iowait %idle
17:18:35 all 90.00 0.00 10.00 0.00 0.00
17:18:36 all 95.00 0.00 5.00 0.00 0.00
17:18:37 all 92.00 0.00 6.00 0.00 2.00
Average: all 92.33 0.00 7.00 0.00 0.67
However, sar can also output the statistics in a format that can be easily imported into a relational database, as shown in Listing 2.17. This can be useful for storing a large amount of performance data; once it has been imported into a relational database, the performance data can be analyzed with all of the tools of a relational database.
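As a quick sanity check, the Average line of Listing 2.16 can be recomputed from the three one-second samples; because every sample here covers the same interval, the average is just the arithmetic mean of the per-sample values. The sketch below recomputes the %user, %system, and %idle columns with awk, using a here-document in place of real sar output.

```shell
# Recompute part of the Average line of Listing 2.16 from the samples.
# Columns: time, cpu, %user, %nice, %system, %iowait, %idle.
avgline=$(awk '{ u += $3; s += $5; i += $7; n++ }
               END { printf "Average: all %.2f %.2f %.2f", u/n, s/n, i/n }' <<'EOF'
17:18:35 all 90.00 0.00 10.00 0.00 0.00
17:18:36 all 95.00 0.00 5.00 0.00 0.00
17:18:37 all 92.00 0.00 6.00 0.00 2.00
EOF
)
echo "$avgline"
```

The %user mean (90 + 95 + 92) / 3 = 92.33 matches the Average row sar printed.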
Listing 2.17.
[ezolt@wintermute sysstat-5.0.2]$ sar -f /tmp/apache_test -H
wintermute.phil.org;1;2004-03-20 22:18:35 UTC;-1;90.00;0.00;10.00;0.00;0.00
wintermute.phil.org;1;2004-03-20 22:18:36 UTC;-1;95.00;0.00;5.00;0.00;0.00
wintermute.phil.org;1;2004-03-20 22:18:37 UTC;-1;92.00;0.00;6.00;0.00;2.00
Finally, sar can also output the statistics in a format that can be easily parsed by standard Linux tools such as awk, perl, python, or grep. This output, shown in Listing 2.18, can be fed into a script that pulls out interesting events and possibly even analyzes trends in the data.
Listing 2.18.
[ezolt@wintermute sysstat-5.0.2]$ sar -f /tmp/apache_test -h
wintermute.phil.org 1 1079821115 all %user 90.00
wintermute.phil.org 1 1079821115 all %nice 0.00
wintermute.phil.org 1 1079821115 all %system 10.00
wintermute.phil.org 1 1079821115 all %iowait 0.00
wintermute.phil.org 1 1079821115 all %idle 0.00
wintermute.phil.org 1 1079821116 all %user 95.00
wintermute.phil.org 1 1079821116 all %nice 0.00
wintermute.phil.org 1 1079821116 all %system 5.00
wintermute.phil.org 1 1079821116 all %iowait 0.00
wintermute.phil.org 1 1079821116 all %idle 0.00
wintermute.phil.org 1 1079821117 all %user 92.00
wintermute.phil.org 1 1079821117 all %nice 0.00
wintermute.phil.org 1 1079821117 all %system 6.00
wintermute.phil.org 1 1079821117 all %iowait 0.00
wintermute.phil.org 1 1079821117 all %idle 2.00
In addition to recording information in a file, sar can also be used to observe a system in real time. In the example shown in Listing 2.19, the CPU state is sampled three times, with one second between samples.
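The one-statistic-per-line layout of Listing 2.18 is exactly what makes it easy to parse. For instance, a single awk command can pull out all the %user samples and average them; the here-document below stands in for real sar -h output.

```shell
# Average the %user samples from sar -h style output (Listing 2.18).
# Fields: host, interval, timestamp, cpu, statistic name, value.
avg=$(awk '$5 == "%user" { sum += $6; n++ } END { printf "%.2f", sum / n }' <<'EOF'
wintermute.phil.org 1 1079821115 all %user 90.00
wintermute.phil.org 1 1079821115 all %nice 0.00
wintermute.phil.org 1 1079821116 all %user 95.00
wintermute.phil.org 1 1079821117 all %user 92.00
EOF
)
echo "$avg"
```

The same pattern, with a different value in the comparison, extracts any of the other statistics.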
Listing 2.19.
[ezolt@wintermute sysstat-5.0.2]$ sar 1 3
Linux 2.4.22-1.2149.nptl (wintermute.phil.org) 03/20/04
17:27:10 CPU %user %nice %system %iowait %idle
17:27:11 all 96.00 0.00 4.00 0.00 0.00
17:27:12 all 98.00 0.00 2.00 0.00 0.00
17:27:13 all 92.00 0.00 8.00 0.00 0.00
Average: all 95.33 0.00 4.67 0.00 0.00
By default, sar displays information about the CPU, but it can display other information as well. For example, sar can show the number of context switches per second and the number of memory pages that have been swapped in or out. In Listing 2.20, sar samples the information twice, with one second between samples. In this case, we ask sar to show the total number of context switches and process creations that occur every second, as well as information about the load average. We can see in this example that this machine has 163 processes that are in memory but not running. For the past minute, on average, 1.12 processes have been ready to run.
Listing 2.20.
[ezolt@scrffy manuscript]$ sar -w -c -q 1 2
Linux 2.6.8-1.521smp (scrffy) 10/20/2004
08:23:29 PM proc/s
08:23:30 PM 0.00
08:23:29 PM cswch/s
08:23:30 PM 594.00
08:23:29 PM runq-sz plist-sz ldavg-1 ldavg-5 ldavg-15
08:23:30 PM 0 163 1.12 1.17 1.17
08:23:30 PM proc/s
08:23:31 PM 0.00
08:23:30 PM cswch/s
08:23:31 PM 812.87
08:23:30 PM runq-sz plist-sz ldavg-1 ldavg-5 ldavg-15
08:23:31 PM 0 163 1.12 1.17 1.17
Average: proc/s
Average: 0.00
Average: cswch/s
Average: 703.98
Average: runq-sz plist-sz ldavg-1 ldavg-5 ldavg-15
Average: 0 163 1.12 1.17 1.17
As you can see, sar is a powerful tool that can record many different performance statistics. It provides a Linux-friendly interface that enables you to easily extract and analyze performance data.
2.2.8. oprofile
oprofile is a performance suite that uses the performance counter hardware available in nearly all modern processors to track where CPU time is being spent, both on an entire system and in individual processes. In addition to measuring where CPU cycles are spent, oprofile can measure very low-level information about how the CPU is performing. Depending on the events supported by the underlying processor, it can measure such things as cache misses, branch mispredictions, memory references, and floating-point operations.
oprofile does not record every event that occurs; instead, it works with the processor's performance hardware to sample every count events, where count is a value that users specify when they start oprofile. The lower the value of count, the more accurate the results are, but the higher the overhead of oprofile. By keeping count at a reasonable value, oprofile can run with very low overhead yet still give an amazingly accurate account of the performance of the system.
Sampling is very powerful, but beware of some nonobvious gotchas when using it. First, sampling may say that you are spending 90 percent of your time in a particular routine, but it does not say why. There are two possible causes for a high number of cycles being attributed to a particular routine: it may be that the routine is the bottleneck and takes a long time to execute, or it may be that the function takes a reasonable amount of time to execute but is called a large number of times. You can usually figure out which is the case by looking at the samples around the particularly hot line, or by instrumenting the code to count the number of calls made to it. The second problem with sampling is that you are never quite sure where a function is being called from. Even if you figure out that the function is being called many times and you track down all of the functions that call it, it is not necessarily clear which caller is responsible for the majority of the calls.
2.2.8.1 CPU Performance-Related Options
oprofile is actually a suite of pieces that work together to collect CPU performance statistics. There are three main pieces of oprofile:
- The oprofile kernel module manipulates the processor and turns sampling on and off.
- The oprofile daemon collects the samples and saves them to disk.
- The oprofile reporting tools take the collected samples and show the user how they relate to the applications running on the system.
The oprofile suite hides the driver and daemon manipulation in the opcontrol command. The opcontrol command is used to select which events the processor will sample and to start the sampling. When controlling the daemon, you can invoke opcontrol using the following command line:
These options control the profiling daemon, enabling you to start and stop sampling and to dump the samples from the daemon's memory to disk. When sampling, the oprofile daemon stores a large number of samples in internal buffers; however, it is only possible to analyze the samples that have been written (or dumped) to disk. Writing to disk can be an expensive operation, so oprofile only does it periodically. As a result, after running a test and profiling with oprofile, the results may not be available immediately, and you will have to wait until the daemon flushes the buffers to disk. This can be very annoying when you want to begin analysis immediately, so the opcontrol command enables you to force a dump of samples from the oprofile daemon's internal buffers to disk. This enables you to begin a performance investigation immediately after a test has completed. Table 2-18 describes the command-line options for the opcontrol program that enable you to control the operation of the daemon.
opcontrol [--start] [--stop] [--dump]
| -s/--start | Starts profiling, using the default event for the current processor unless another event has been specified. |
| -d/--dump | Dumps the sampling information currently in the kernel sample buffers to disk. |
| --stop | Stops the profiling. |
The event specifier enables you to select which event is going to be sampled; how frequently it will be sampled; and whether that sampling will take place in kernel space, user space, or both. Table 2-19 describes the command-line options of opcontrol that enable you to select different events to sample.
opcontrol [--list-events] [--event=:name:count:unitmask:kernel:user:]
Typically, opreport displays all the samples collected by the system and which executables (including the kernel) are responsible for them. The executables with the highest number of samples are shown first, followed by all the executables with samples. In a typical system, most of the samples are in a handful of executables at the top of the list, with a very large number of executables contributing a very small number of samples. To deal with this, opreport enables you to set a threshold, and only executables with that percentage of the total samples or greater will be shown. Alternatively, opreport can reverse the order of the executables that are shown, so those with a high contribution are shown last. This way, the most important data is printed last and does not scroll off the screen. Table 2-20 describes the command-line options of opreport that enable you to format the output of the sampling.
opreport [-r] [-t]
| --reverse-sort / -r | Reverses the order of the sort. Typically, the images that caused the most events display first. |
| --threshold / -t [percentage] | Causes opreport to show only images that have contributed percentage or more of the total samples. This can be useful when many images have a very small number of samples and you are interested only in the most significant ones. |
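The effect of the threshold option is easy to see on text output: only rows whose percentage column meets the cutoff survive. As an illustration only (not a replacement for opreport -t itself), the awk sketch below applies a 5 percent cutoff to the samples/percentage/image layout that opreport prints, using a here-document of sample rows.

```shell
# Keep only images whose percentage column ($2) is at or above a cutoff,
# mimicking what opreport -t 5 would display for this data.
top=$(awk -v t=5 '$2 + 0 >= t { print $3 }' <<'EOF'
3190 50.4507 vmlinux-2.4.22-1.2174.nptlsmp
905 14.3128 emacs
749 11.8456 libc-2.3.2.so
261 4.1278 ld-2.3.2.so
244 3.8589 mpg321
EOF
)
echo "$top"
```

Only the three images above the cutoff remain, which is exactly the point of the threshold: the long tail of tiny contributors disappears.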
2.2.8.2 Example Usage
Appendix B, "Installing oprofile," contains instructions on how to get oprofile installed and running on a few of the major Linux distributions. We begin by setting up oprofile for profiling. The first command, shown in Listing 2.21, uses the opcontrol command to tell the oprofile suite where an uncompressed image of the kernel is located. oprofile needs to know the location of this file so that it can attribute samples to exact functions within the kernel.
Listing 2.21.
[root@wintermute root]# opcontrol --vmlinux=/boot/vmlinux-2.4.22-1.2174.nptlsmp
After we set up the path to the current kernel, we can begin profiling. The command in Listing 2.22 tells oprofile to start sampling using the default event. This event varies depending on the processor; the default event for this processor is CPU_CLK_UNHALTED, which samples all of the CPU cycles during which the processor is not halted. The 233869 means that the processor will sample the instruction the processor is executing every 233,869 events.
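A quick back-of-the-envelope calculation shows why that count keeps overhead low. Assuming the 467.739 MHz clock reported in the listings, dividing the clock rate by the count of 233,869 gives the approximate number of samples taken per second:

```shell
# Approximate samples per second: CPU cycles per second divided by the
# sampling count (467.739 MHz clock assumed from the opreport output).
rate=$((467739000 / 233869))
echo "$rate samples/sec"
```

Roughly 2,000 samples per second is a tiny fraction of the hundreds of millions of cycles the CPU executes in that time, which is why the profiler's overhead stays so low.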
Listing 2.22.
[root@wintermute root]# opcontrol -s
Using default event: CPU_CLK_UNHALTED:233869:0:1:1
Using log file /var/lib/oprofile/oprofiled.log
Daemon started.
Profiler running.
Now that we have started sampling, we want to begin analyzing the sampling results. In Listing 2.23, we start to use the reporting tools to figure out what is happening in the system. opreport reports what has been profiled so far.
Listing 2.23.
[root@wintermute root]# opreport
opreport op_fatal_error:
No sample file found: try running opcontrol --dump
or specify a session containing sample files
Uh oh! Even though the profiling has been running for a little while, we are stopped when opreport reports that it cannot find any samples. This is because the opreport command looks for the samples on disk, but the oprofile daemon stores the samples in memory and only periodically dumps them to disk. When we ask opreport for a list of the samples, it finds none on disk. To alleviate this problem, we can force the daemon to flush the samples immediately by issuing a dump option to opcontrol, as shown in Listing 2.24. This enables us to view the samples that have been collected.
Listing 2.24.
[root@wintermute root]# opcontrol --dump
After we dump the samples to disk, we try again and ask oprofile for the report, as shown in Listing 2.25.
Listing 2.25.
[root@wintermute root]# opreport
CPU: PIII, speed 467.739 MHz (estimated)
Counted CPU_CLK_UNHALTED events (clocks processor is not halted)
with a unit mask of 0x00 (No unit mask) count 233869
3190 50.4507 vmlinux-2.4.22-1.2174.nptlsmp
905 14.3128 emacs
749 11.8456 libc-2.3.2.so
261 4.1278 ld-2.3.2.so
244 3.8589 mpg321
233 3.6850 insmod
171 2.7044 libperl.so
128 2.0244 bash
113 1.7871 ext3.o
....
When we started oprofile, we just used the default event that opcontrol chose for us. Each processor has a very rich set of events that can be monitored. In Listing 2.26, we ask opcontrol to list all the events that are available for this particular CPU. The list is quite long, but in this case, we can see that in addition to CPU_CLK_UNHALTED, we can also monitor DATA_MEM_REFS and DCU_LINES_IN. These are events caused by the memory subsystem, and we investigate them in later chapters.
Listing 2.26.
[root@wintermute root]# opcontrol -l
oprofile: available events for CPU type "PIII"
See Intel Architecture Developer's Manual Volume 3, Appendix A and
Intel Architecture Optimization Reference Manual (730795-001)
CPU_CLK_UNHALTED: (counter: 0, 1)
clocks processor is not halted (min count: 6000)
DATA_MEM_REFS: (counter: 0, 1)
all memory references, cachable and non (min count: 500)
DCU_LINES_IN: (counter: 0, 1)
total lines allocated in the DCU (min count: 500)
....
The command needed to specify which events we will monitor can be cumbersome, so fortunately, we can also use oprofile's graphical oprof_start command to start and stop sampling. This enables us to select the events that we want graphically, without the need to figure out exactly how to specify them on the command line. In the example of oprof_start shown in Figure 2-3, we tell oprofile that we want to monitor the DATA_MEM_REFS and L2_LD events at the same time. The DATA_MEM_REFS event can tell us which applications use the memory subsystem heavily, and L2_LD which applications use the level 2 cache. On this particular processor, the hardware has only two counters that can be used for sampling, so only two events can be used simultaneously.
Figure 2-3.
The output provided by opreport, shown in Listing 2.27, displays all the system libraries and executables that contain any of the events that we were sampling. Note that not all the events have been recorded; because we are sampling, only a subset of events is actually recorded. This is usually not a problem: if a particular library or executable is a performance problem, it will likely cause high-cost events to happen many times, and if the sampling is random, these high-cost events will eventually be caught by the sampling code.
Listing 2.27.
[root@wintermute root]# opreport
CPU: PIII, speed 467.739 MHz (estimated)
Counted DATA_MEM_REFS events (all memory references, cachable and non)
with a unit mask of 0x00 (No unit mask) count 30000
Counted L2_LD events (number of L2 data loads) with a unit mask of 0x0f
(All cache states) count 233869
87462 31.7907 17 3.8636 libmad.so.0.1.0
24259 8.8177 10 2.2727 mpg321
23735 8.6272 40 9.0909 libz.so.1.2.0.7
17513 6.3656 56 12.7273 libgklayout.so
17128 6.2257 90 20.4545 vmlinux-2.4.22-1.2174.nptlsmp
13471 4.8964 4 0.9091 libpng12.so.0.1.2.2
12868 4.6773 20 4.5455 libc-2.3.2.so
....
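One way to read two-event output like Listing 2.27 is to combine the columns. The sketch below computes L2 data loads per thousand memory references for each image; this ratio is purely illustrative (it is not something opreport itself reports), and the here-document carries two rows of the listing's data.

```shell
# Rough cache-pressure gauge: L2 data loads per 1,000 memory references.
# Columns: mem-ref samples, %, L2_LD samples, %, image name.
ratios=$(awk '{ printf "%s %.2f\n", $5, ($3 * 1000) / $1 }' <<'EOF'
87462 31.7907 17 3.8636 libmad.so.0.1.0
17128 6.2257 90 20.4545 vmlinux-2.4.22-1.2174.nptlsmp
EOF
)
echo "$ratios"
```

By this measure, the kernel image generates far more L2 loads per memory reference than libmad, even though libmad makes many more memory references overall.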
Optimizing Linux® Performance: A Hands-On Guide to Linux® Performance Tools