6.2. Disk I/O Performance Tools
This section examines the various disk I/O performance tools that enable you to investigate how a given application is using the disk I/O subsystem, including how heavily each disk is being used, how well the kernel's disk cache is working, and which files a particular application has "open."
6.2.1. vmstat (ii)
As you saw in Chapter 2, "Performance Tools: System CPU," vmstat is a great tool to give an overall view of how the system is performing. In addition to CPU and memory statistics, vmstat can provide a system-wide view of I/O performance.
6.2.1.1 Disk I/O Performance-Related Options and Outputs
To retrieve disk I/O statistics from the system, you must invoke vmstat as follows:
vmstat [-D] [-d] [-p partition] [interval [count]]
Table 6-1 describes the command-line parameters that influence the disk I/O statistics that vmstat will display.
6.2.1.2 Example Usage
The first example, in Listing 6.1, runs vmstat in its default mode (described in Chapter 2), sampling once a second for three samples.
Listing 6.1.
[ezolt@wintermute procps-3.2.0]$ ./vmstat 1 3
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
1 1 0 197020 81804 29920 0 0 236 25 1017 67 1 1 93 4
1 1 0 172252 106252 29952 0 0 24448 0 1200 395 1 36 0 63
0 0 0 231068 50004 27924 0 0 19712 80 1179 345 1 34 15 49
Listing 6.1 shows that during one of the samples, the system read 24,448 disk blocks. As mentioned previously, the block size for a disk is 1,024 bytes, so this means that the system is reading in data at about 23MB per second. We can also see that during this sample, the CPU was spending a significant portion of time waiting for I/O to complete. The CPU waits on I/O 63 percent of the time during the sample in which the disk was reading at ~23MB per second, and it waits on I/O 49 percent of the time during the next sample, in which the disk was reading at ~19MB per second.
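To double-check that arithmetic, the conversion from blocks per second to megabytes per second is simple enough to script; the following is a quick sketch that applies the 1,024-byte block size quoted above to the bi values in Listing 6.1.
# Convert the bi column of Listing 6.1 from blocks/second to MB/second,
# using the 1,024-byte block size mentioned in the text.
BLOCK_SIZE = 1024                      # bytes per vmstat block
MB = 1024 * 1024                       # bytes per megabyte
for blocks_per_sec in (24448, 19712):  # the two bi values from Listing 6.1
    mb_per_sec = blocks_per_sec * BLOCK_SIZE / float(MB)
    print("%5d blocks/s = %.1f MB/s" % (blocks_per_sec, mb_per_sec))
# Prints roughly 23.9 MB/s and 19.3 MB/s, close to the ~23MB and ~19MB figures above.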
Next, in Listing 6.2, we ask vmstat to provide information about the I/O subsystem's performance since system boot.
Listing 6.2.
[ezolt@wintermute procps-3.2.0]$ ./vmstat -D
3 disks
5 partitions
53256 total reads
641233 merged reads
4787741 read sectors
343552 milli reading
14479 writes
17556 merged writes
257208 written sectors
7237771 milli writing
0 inprogress IO
342 milli spent IO
In Listing 6.2, vmstat provides I/O statistic totals for all the disk devices in the system. As mentioned previously, when reading and writing to a disk, the Linux kernel tries to merge requests for contiguous regions on the disk for a performance increase; vmstat reports these events as merged reads and merged writes. In this example, a large number of the reads issued to the system were merged before they were issued to the device: although there were ~640,000 merged reads, only ~53,000 read commands were actually issued to the drives. The output also tells us that a total of 4,787,741 sectors have been read from the disk, and that since system boot, 343,552ms (or 344 seconds) were spent reading from the disk. The same statistics are available for write performance. This output gives a good overall picture of the I/O subsystem's performance.
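To put these totals in perspective, a short calculation can derive the merge ratio and the average read throughput from the numbers in Listing 6.2. The 512-byte sector size used here is an assumption; it is the conventional Linux sector size and the same size iostat uses for its blocks, as noted in the next section.
# Back-of-the-envelope interpretation of the vmstat -D totals in Listing 6.2.
reads_issued = 53256        # read commands actually sent to the drives
reads_merged = 641233       # read requests merged before being issued
read_sectors = 4787741      # sectors read since boot
ms_reading   = 343552       # milliseconds spent reading since boot
SECTOR_SIZE  = 512          # bytes; assumed conventional sector size

merge_ratio = reads_merged / float(reads_merged + reads_issued)
total_mb    = read_sectors * SECTOR_SIZE / (1024.0 * 1024.0)
mb_per_sec  = total_mb / (ms_reading / 1000.0)
print("%.0f%% of read requests were merged" % (merge_ratio * 100))
print("%.0f MB read in total, ~%.1f MB/s while the disk was reading"
      % (total_mb, mb_per_sec))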
Although the previous example displayed I/O statistics for the entire system, the following example in Listing 6.3 shows the statistics broken down for each individual disk.
Listing 6.3.
[ezolt@wintermute procps-3.2.0]$ ./vmstat -d 1 3
disk ----------reads------------ -----------writes----------- -------IO-------
total merged sectors ms total merged sectors ms cur s
fd0 0 0 0 0 0 0 0 0 0 0
hde 17099 163180 671517 125006 8279 9925 146304 2831237 0 125
hda 0 0 0 0 0 0 0 0 0 0
fd0 0 0 0 0 0 0 0 0 0 0
hde 17288 169008 719645 125918 8279 9925 146304 2831237 0 126
hda 0 0 0 0 0 0 0 0 0 0
fd0 0 0 0 0 0 0 0 0 0 0
hde 17288 169008 719645 125918 8290 9934 146464 2831245 0 126
hda 0 0 0 0 0 0 0 0 0 0
Listing 6.4 shows that 60 (19,059 - 18,999) reads and 94 (24,795 - 24,701) writes have been issued to partition hde3. This view can prove particularly useful if you are trying to determine which partition of a disk is seeing the most usage.
Listing 6.4.
[ezolt@wintermute procps-3.2.0]$ ./vmstat -p hde3 1 3
hde3 reads read sectors writes requested writes
18999 191986 24701 197608
19059 192466 24795 198360
19161 193282 24795 198360
Although vmstat provides statistics about individual disks/partitions, it only provides totals rather than the rate of change during the sample. This can make it difficult to eyeball which device's statistics have changed significantly from sample to sample.
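If you do want rates rather than running totals, the counters behind these statistics are exposed by the kernel in /proc/diskstats on 2.6 and later kernels, and differencing them yourself takes only a few lines. The following sketch assumes that /proc/diskstats is available and simply reports the vmstat -d read/write columns as per-second rates.
# Print per-second read/write rates for each disk by differencing the
# running totals in /proc/diskstats over a fixed interval.
import time

def read_diskstats():
    """Return {device: (reads, read_sectors, writes, written_sectors)}."""
    stats = {}
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if len(fields) < 10:
                continue               # skip short partition lines on older kernels
            stats[fields[2]] = (int(fields[3]), int(fields[5]),
                                int(fields[7]), int(fields[9]))
    return stats

INTERVAL = 1.0
before = read_diskstats()
for _ in range(3):                     # three samples, like vmstat -d 1 3
    time.sleep(INTERVAL)
    after = read_diskstats()
    for dev in sorted(after):
        old = before.get(dev, after[dev])
        reads, rsect, writes, wsect = [n - o for n, o in zip(after[dev], old)]
        if reads or writes:
            print("%-8s %6.0f reads/s %8.0f rsect/s %6.0f writes/s %8.0f wsect/s"
                  % (dev, reads / INTERVAL, rsect / INTERVAL,
                     writes / INTERVAL, wsect / INTERVAL))
    before = after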
6.2.2. iostat
iostat is like vmstat, but it is a tool dedicated to displaying disk I/O subsystem statistics. iostat provides a per-device and per-partition breakdown of how many blocks are read from and written to a particular disk. (Blocks in iostat are usually sized at 512 bytes.) In addition, iostat can provide extensive information about how a disk is being utilized, as well as how long Linux spends waiting to submit requests to the disk.
6.2.2.1 Disk I/O Performance-Related Options and Outputs
iostat is invoked using the following command line:
iostat [-d] [-k] [-x] [device] [interval [count]]
Much like vmstat, iostat can display performance statistics at regular intervals. Different options modify the statistics that iostat displays. These options are described in Table 6-6.
6.2.2.2 Example Usage
Listing 6.5 shows an example iostat run while a disk benchmark is writing a test file to the file system on the /dev/hda2 partition. The first sample iostat displays is the total system average since system boot time. The second sample (and any that follow) shows the statistics from each 1-second interval.
Listing 6.5.
[ezolt@localhost sysstat-5.0.2]$ ./iostat -d 1 2
Linux 2.4.22-1.2188.nptl (localhost.localdomain) 05/01/2004
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
hda 7.18 121.12 343.87 1344206 3816510
hda1 0.00 0.03 0.00 316 46
hda2 7.09 119.75 337.59 1329018 3746776
hda3 0.09 1.33 6.28 14776 69688
hdb 0.00 0.00 0.00 16 0
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
hda 105.05 5.78 12372.56 16 34272
hda1 0.00 0.00 0.00 0 0
hda2 100.36 5.78 11792.06 16 32664
hda3 4.69 0.00 580.51 0 1608
hdb 0.00 0.00 0.00 0 0
One interesting note in the preceding example is that /dev/hda3 had a small amount of activity. In the system being tested, /dev/hda3 is a swap partition. Any activity recorded from this partition is caused by the kernel swapping memory to disk. In this way, iostat provides an indirect method to determine how much disk I/O in the system is the result of swapping.
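If you are not sure which partitions hold swap on a given system, the kernel lists the active swap areas in /proc/swaps, and a few lines of code can print them. The sketch below assumes the standard /proc/swaps layout (a header line followed by one line per swap area).
# List the active swap areas so that you can tell which iostat rows
# (such as hda3 above) reflect swapping rather than ordinary file I/O.
with open("/proc/swaps") as f:
    lines = f.readlines()

for line in lines[1:]:                 # the first line is the column header
    filename, swap_type, size_kb, used_kb, priority = line.split()[:5]
    print("%s is a %s swap area, %s KB used of %s KB"
          % (filename, swap_type, used_kb, size_kb))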
Listing 6.6 shows the extended output of iostat.
Listing 6.6.
[ezolt@localhost sysstat-5.0.2]$ ./iostat -x -dk 1 5 /dev/hda2
Linux 2.4.22-1.2188.nptl (localhost.localdomain) 05/01/2004
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s
avgrq-sz avgqu-sz await svctm %util
hda2 11.22 44.40 3.15 4.20 115.00 388.97 57.50 194.49
68.52 1.75 237.17 11.47 8.43
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s
avgrq-sz avgqu-sz await svctm %util
hda2 0.00 1548.00 0.00 100.00 0.00 13240.00 0.00 6620.00
132.40 55.13 538.60 10.00 100.00
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s
avgrq-sz avgqu-sz await svctm %util
hda2 0.00 1365.00 0.00 131.00 0.00 11672.00 0.00 5836.00
89.10 53.86 422.44 7.63 100.00
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s
avgrq-sz avgqu-sz await svctm %util
hda2 0.00 1483.00 0.00 84.00 0.00 12688.00 0.00 6344.00
151.0 39.69 399.52 11.90 100.00
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s
avgrq-sz avgqu-sz await svctm %util
hda2 0.00 2067.00 0.00 123.00 0.00 17664.00 0.00 8832.00
143.61 58.59 508.54 8.13 100.00
In Listing 6.6, you can see that the average queue size is pretty high (~40 to 59 requests) and, as a result, the amount of time that a request must wait (~400ms to 539ms) is much greater than the amount of time it takes to service the request (7.63ms to 11.90ms). These long wait times, along with the fact that the utilization is 100 percent, show that the disk is completely saturated.
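These columns are also consistent with one another: by Little's law, the average number of outstanding requests should be roughly the request rate multiplied by the average time each request spends queued. A quick check against the second sample in Listing 6.6 bears this out; the numbers below are taken straight from that sample.
# Little's law sanity check for the second sample in Listing 6.6:
# outstanding requests ~= (requests per second) * (average wait in seconds).
w_per_sec = 100.00                     # w/s column (no reads in this sample)
await_ms  = 538.60                     # await column, in milliseconds

expected_queue = w_per_sec * (await_ms / 1000.0)
print("expected avgqu-sz ~= %.1f (iostat reported 55.13)" % expected_queue)
# Prints about 53.9, close to the reported average queue size.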
The extended iostat output provides so many statistics that it only fits on a single line in a very wide terminal. However, this information is nearly all that you need to identify a particular disk as a bottleneck.
6.2.3. sar
As discussed in Chapter 2, "Performance Tools: System CPU," sar can collect the performance statistics of many different areas of the Linux system. In addition to CPU and memory statistics, it can collect information about the disk I/O subsystem.
6.2.3.1 Disk I/O Performance-Related Options and Outputs
When using sar to monitor disk I/O statistics, you can invoke it with the following command line:
sar -d [ interval [ count ] ]
Typically, sar displays information about the CPU usage in a system; to display disk usage statistics instead, you must use the -d option. sar can only display disk I/O statistics with a kernel version higher than 2.5.70. The statistics that it displays are described in Table 6-9.
6.2.3.2 Example Usage
In Listing 6.7, sar is used to collect information about the I/O of the devices on the system. sar lists the devices by their major and minor number rather than their names.
Listing 6.7.
[ezolt@wintermute sysstat-5.0.2]$ sar -d 1 3
Linux 2.6.5 (wintermute.phil.org) 05/02/04
16:38:28 DEV tps rd_sec/s wr_sec/s
16:38:29 dev2-0 0.00 0.00 0.00
16:38:29 dev33-0 115.15 808.08 2787.88
16:38:29 dev33-64 0.00 0.00 0.00
16:38:29 dev3-0 0.00 0.00 0.00
16:38:29 DEV tps rd_sec/s wr_sec/s
16:38:30 dev2-0 0.00 0.00 0.00
16:38:30 dev33-0 237.00 1792.00 8.00
16:38:30 dev33-64 0.00 0.00 0.00
16:38:30 dev3-0 0.00 0.00 0.00
16:38:30 DEV tps rd_sec/s wr_sec/s
16:38:31 dev2-0 0.00 0.00 0.00
16:38:31 dev33-0 201.00 1608.00 0.00
16:38:31 dev33-64 0.00 0.00 0.00
16:38:31 dev3-0 0.00 0.00 0.00
Average: DEV tps rd_sec/s wr_sec/s
Average: dev2-0 0.00 0.00 0.00
Average: dev33-0 184.62 1404.68 925.75
Average: dev33-64 0.00 0.00 0.00
Average: dev3-0 0.00 0.00 0.00
sar has a limited number of disk I/O statistics when compared to iostat. However, the capability of sar to simultaneously record many different types of statistics may make up for these shortcomings.
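Because sar identifies devices as devM-N using their major and minor numbers (dev33-0 in Listing 6.7, for example), it can be handy to translate those names back into kernel device names. The following sketch assumes the standard /proc/partitions layout (major, minor, #blocks, name).
# Build a map from sar-style devM-N names to kernel device names
# using /proc/partitions.
def sar_name_map():
    names = {}
    with open("/proc/partitions") as f:
        for line in f:
            fields = line.split()
            if len(fields) != 4 or not fields[0].isdigit():
                continue               # skip the header and blank lines
            major, minor, blocks, name = fields
            names["dev%s-%s" % (major, minor)] = name
    return names

print(sar_name_map().get("dev33-0", "unknown"))   # e.g. prints "hde" on the system above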
6.2.4. lsof (List Open Files)
lsof provides a way to determine which processes have a particular file open. In addition to tracking down the user of a single file, lsof can display the processes using the files in a particular directory. It can also recursively search through an entire directory tree and list the processes using files in that directory tree. lsof can prove helpful when narrowing down which applications are generating I/O.
6.2.4.1 Disk I/O Performance-Related Options and Outputs
You can invoke lsof with the following command line to investigate which files processes have open:
lsof [-r delay] [+D directory] [+d directory] [file]
Typically, lsof displays which processes are using a given file. However, by using the +d and +D options, it is possible for lsof to display this information for more than one file. Table 6-10 describes the command-line options of lsof that prove helpful when tracking down an I/O performance problem.
6.2.4.2 Example Usage
Listing 6.8 shows lsof being run on the /usr/bin directory. This run shows which processes are accessing all of the files in /usr/bin.
Listing 6.8.
[ezolt@localhost manuscript]$ /usr/sbin/lsof -r 2 +D /usr/bin/
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
gnome-ses 2162 ezolt txt REG 3,2 113800 597490 /usr/bin/gnome-session
ssh-agent 2175 ezolt txt REG 3,2 61372 596783 /usr/bin/ssh-agent
gnome-key 2182 ezolt txt REG 3,2 77664 602727 /usr/bin/gnome-keyring-daemon
metacity 2186 ezolt txt REG 3,2 486520 597321 /usr/bin/metacity
gnome-pan 2272 ezolt txt REG 3,2 503100 602174 /usr/bin/gnome-panel
nautilus 2280 ezolt txt REG 3,2 677812 598239 /usr/bin/nautilus
magicdev 2287 ezolt txt REG 3,2 27008 598375 /usr/bin/magicdev
eggcups 2292 ezolt txt REG 3,2 32108 599596 /usr/bin/eggcups
pam-panel 2305 ezolt txt REG 3,2 45672 600140 /usr/bin/pam-panel-icon
gnome-ter 3807 ezolt txt REG 3,2 289116 596834 /usr/bin/gnome-terminal
less 6452 ezolt txt REG 3,2 104604 596239 /usr/bin/less
=======
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
gnome-ses 2162 ezolt txt REG 3,2 113800 597490 /usr/bin/gnome-session
ssh-agent 2175 ezolt txt REG 3,2 61372 596783 /usr/bin/ssh-agent
gnome-key 2182 ezolt txt REG 3,2 77664 602727 /usr/bin/gnome-keyring-daemon
metacity 2186 ezolt txt REG 3,2 486520 597321 /usr/bin/metacity
gnome-pan 2272 ezolt txt REG 3,2 503100 602174 /usr/bin/gnome-panel
nautilus 2280 ezolt txt REG 3,2 677812 598239 /usr/bin/nautilus
magicdev 2287 ezolt txt REG 3,2 27008 598375 /usr/bin/magicdev
eggcups 2292 ezolt txt REG 3,2 32108 599596 /usr/bin/eggcups
pam-panel 2305 ezolt txt REG 3,2 45672 600140 /usr/bin/pam-panel-icon
gnome-ter 3807 ezolt txt REG 3,2 289116 596834 /usr/bin/gnome-terminal
less 6452 ezolt txt REG 3,2 104604 596239 /usr/bin/less
In particular, we can see that process 3807 is using the file /usr/bin/gnome-terminal. This file is an executable, as indicated by the txt in the FD column, and the name of the command that is using it is gnome-terminal. This makes sense; the process that is running gnome-terminal must therefore have the executable open. One interesting thing to note is that this file is on the device 3,2, which corresponds to /dev/hda2. (You can figure out the device number for all the system devices by executing ls -la /dev and looking at the output field that normally displays size.) Knowing on which device a file is located can help if you know that a particular device is the source of an I/O bottleneck. lsof provides the unique ability to trace an open file descriptor back to individual processes; although it does not show which processes are using a significant amount of I/O, it does provide a starting point.
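If you would rather not decode the output of ls -la /dev by hand, you can also ask for a file's device numbers directly with a stat call. The sketch below uses /usr/bin/gnome-terminal, the example file from Listing 6.8.
# Print the major,minor numbers of the device holding a file, in the same
# form that lsof shows in its DEVICE column (3,2 in Listing 6.8).
import os

path = "/usr/bin/gnome-terminal"       # the example file from Listing 6.8
st = os.stat(path)
print("%s lives on device %d,%d" % (path, os.major(st.st_dev), os.minor(st.st_dev)))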