Establishing a Baseline for Disk Usage
Start your disk monitoring process by establishing a baseline, which is the level of performance you can expect during typical usage and workloads. To establish a baseline, collect and analyze data about disk usage under a typical load.
Monitor disk counters along with counters from other objects. The following is a list of recommended counters.
LogicalDisk\% Free Space
PhysicalDisk\Disk Reads/sec
PhysicalDisk\Disk Writes/sec
PhysicalDisk\Avg. Disk Queue Length
Memory\Available Bytes
Memory\Cache Bytes
Memory\Pages/sec
Processor(All_Instances)\% Processor Time
System\Processor Queue Length
Figure 30.5 depicts a typical display for collecting overall system performance data.
Figure 30.5 Counter Configuration for Baseline Monitoring
Observe activity at various times of day over a range of intervals, starting with one day, then one week, one month, and so on. Over time a pattern develops, and you can see that the data tends to fall consistently within a particular range of values. That range is your baseline.
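As a rough illustration of turning logged samples into a baseline range, the following Python sketch brackets the bulk of the observed values between two percentiles. The function name, the 5th and 95th percentile cutoffs, and the sample values are assumptions made for this example; they are not settings prescribed by System Monitor.

```python
from statistics import quantiles


def baseline_range(samples, lower_pct=5, upper_pct=95):
    """Return (low, high) bounds that bracket the bulk of the samples.

    The percentile cutoffs are illustrative assumptions, not fixed rules.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=100) returns the 99 cut points for percentiles 1..99.
    cuts = quantiles(samples, n=100, method="inclusive")
    return cuts[lower_pct - 1], cuts[upper_pct - 1]


# Hypothetical Disk Reads/sec values exported from a counter log.
reads_per_sec = [42, 45, 47, 44, 51, 48, 46, 43, 49, 50, 47, 45]
low, high = baseline_range(reads_per_sec)
print(f"baseline range: {low:.1f} to {high:.1f} reads/sec")
```

Widening or narrowing the percentile cutoffs changes how much ordinary variation the range tolerates, so choose them after looking at several weeks of data rather than a single day.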
If your workload is characterized by random bursts of heavy activity, you can sample at short intervals, such as every two to five seconds. Otherwise, 60-second intervals are adequate. If system demands fluctuate during the day, you might want to sample more frequently during periods of heaviest activity and less frequently when activity is tapering off.
For best results during monitoring, try to isolate the disk so that workload unrelated to your test does not affect your results. If you are logging performance to a disk that you are monitoring, values for the disk reflect a small amount of writing activity for that logging.
While analyzing values at specific times, notice the type of work being performed on your system. Knowing the schedule and nature of your workload is important if you need to reschedule that work or distribute it to other systems for better performance.
When interpreting log data, remember the limitations of the performance counters that report sums or that report disk time. The counters sum the totals rather than recalculate them over the number of disks. In addition, disk-time percentage counters cannot exceed 100 percent. Instead, use the Avg. Disk Queue Length, Avg. Disk Read Queue Length, and Avg. Disk Write Queue Length counters, which report disk activity as a decimal rather than a percentage and can therefore show values over 1.0 (100 percent). Then, remember to recalculate the values over the whole disk configuration.
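The recalculation over the disk configuration can be sketched as a simple division of the summed queue length by the number of disks. The function name and the sample figures below are hypothetical, and the sketch assumes the load is spread evenly across the disks, which is a simplification.

```python
def per_disk_queue_length(total_queue_length, disk_count):
    """Spread a summed Avg. Disk Queue Length value over disk_count disks.

    Assumes an even distribution of requests across disks (a simplification).
    """
    if disk_count < 1:
        raise ValueError("disk_count must be at least 1")
    return total_queue_length / disk_count


# Hypothetical summed value for a four-disk configuration.
total = 3.6
disks = 4
per_disk = per_disk_queue_length(total, disks)
print(f"per-disk queue length: {per_disk:.2f} ({per_disk:.0%} per disk)")
```

A summed value that looks alarming, such as 3.6, works out to less than 1.0 per disk in this example, which is why the raw total should not be read on its own.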
NOTE: Although disk-time percentage counters cannot exceed 100 percent by default, you can reset the registry to allow System Monitor to display percentages exceeding 100 percent if appropriate. For information about this adjustment and other aspects of performance data collection and reporting, see "Performance Objects" in "Overview of Performance Monitoring" in this book.
You can exclude spiking values from your baseline, but make sure you understand what causes them. For example, if you run a weekly backup every Friday night, it is acceptable to see out-of-range disk values during that time. But it is important that you know why the spikes are happening. If the pattern starts to shift or you feel that the baseline performance is not satisfactory, use additional counters to monitor disk activity and usage as described in the following sections. You might need to upgrade resources as described in "Resolving Disk Bottlenecks" later in this chapter. If you have access to source code for applications that are in use, you might want to fine-tune these for more efficient data access.
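One way to keep known spikes out of the baseline without losing sight of them is to set aside samples that fall inside a scheduled window. The Python sketch below assumes a backup that runs on Fridays from 22:00 until midnight; the window, the function names, and the sample data are all hypothetical and should be replaced with your own schedule.

```python
from datetime import datetime


def in_backup_window(ts, weekday=4, start_hour=22):
    """True if ts falls in the assumed Friday 22:00-24:00 backup window.

    weekday uses datetime's convention: Monday is 0, Friday is 4.
    """
    return ts.weekday() == weekday and ts.hour >= start_hour


def split_samples(samples):
    """Separate (timestamp, value) samples into baseline and excluded lists."""
    baseline, excluded = [], []
    for ts, value in samples:
        target = excluded if in_backup_window(ts) else baseline
        target.append((ts, value))
    return baseline, excluded


# Hypothetical Avg. Disk Queue Length samples.
samples = [
    (datetime(2024, 3, 7, 14, 0), 1.2),  # Thursday afternoon
    (datetime(2024, 3, 8, 23, 0), 9.8),  # Friday night, during the backup
    (datetime(2024, 3, 9, 10, 0), 1.1),  # Saturday morning
]
baseline, excluded = split_samples(samples)
print(f"{len(baseline)} baseline samples, {len(excluded)} excluded")
```

The excluded samples are kept in their own list rather than discarded, so that you can still confirm the spikes line up with the scheduled work that is supposed to cause them.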
When counter values fall outside the range established for your baseline, follow the instructions contained in "Investigating Disk Performance Problems" later in this chapter. If you encounter a problem or need information about how to improve performance, see "Resolving Disk Bottlenecks" later in this chapter.