Recognizing a Processor Bottleneck
Processor bottlenecks occur when the processor is so busy that it cannot respond to requests for processor time. Although a high rate of processor activity might indicate an excessively busy processor, a long, sustained processor queue is a more certain indicator. As you monitor processor and related counters, you can recognize a developing bottleneck by the following conditions:
Processor % Processor Time often exceeds 80 percent.
System Processor Queue Length is often greater than 2 on a single-processor system.
Unusually high values appear for the Processor(_Total) Interrupts/sec or System Context Switches/sec counters.
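The first two conditions above can be checked against logged samples. The following Python sketch is a minimal illustration, not a System Monitor feature: it assumes you have already exported the two counters as equal-length lists taken at a fixed interval, and it interprets "often" as a hypothetical fraction of the samples (the `often` parameter, defaulting to one half). The third condition is left out because "unusually high" depends on a baseline for your own system.

```python
def bottleneck_signs(cpu_pct, queue_len, often=0.5):
    """Check the first two warning conditions over a series of samples.

    cpu_pct:   samples of Processor % Processor Time
    queue_len: samples of System Processor Queue Length (single processor)
    often:     assumed meaning of "often", i.e. the fraction of samples that
               must exceed the threshold (hypothetical default of 0.5)
    """
    if not cpu_pct or len(cpu_pct) != len(queue_len):
        raise ValueError("need two non-empty sample series of equal length")
    n = len(cpu_pct)
    busy_share = sum(v > 80 for v in cpu_pct) / n    # share of samples above 80 percent
    queue_share = sum(q > 2 for q in queue_len) / n  # share of samples with more than 2 waiting threads
    return {
        "cpu_often_above_80": busy_share >= often,
        "queue_often_above_2": queue_share >= often,
    }


# Hypothetical samples, for illustration only.
cpu = [85, 92, 78, 88, 95, 70]
queue = [3, 4, 1, 3, 5, 2]
print(bottleneck_signs(cpu, queue))
# {'cpu_often_above_80': True, 'queue_often_above_2': True}
```

Four of the six samples exceed each threshold, so both conditions are reported. On a multiprocessor system the queue threshold would need to change, as described later in this chapter.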
The most common causes for processor bottlenecks are insufficient memory or excessive numbers of interrupts from disk or network input/output (I/O). To investigate these possible causes, see the following chapters:
"Evaluating Memory and Cache Usage" in this book"Disk Concepts and Troubleshooting" in this book
For more information about network performance, see "Monitoring Network Performance" in the Server Operations Guide.
Also, the Processor(_Total) Interrupts/sec counter value might rise dramatically if you've recently added many new applications or users. During periods of low activity, the only source of interrupts might be the processor's timer ticks; these are periodic events that increment a processor hardware timer. They occur approximately every 10 to 15 milliseconds, or about 66 to 100 interrupts per second. Interrupt rates vary depending on system workload, including network packets per second and disk I/O operations per second. Watch for interrupt values that fall outside a normal range (expect these to be from 200 to 300 per second on Microsoft® Windows® 2000 Professional). If Processor % Interrupt Time exceeds 20 to 30 percent per processor, the system might be generating more processor interrupts than it can handle. If this is the case, you might need to upgrade some of your components. For more information, see "Monitoring Network Performance" in the Server Operations Guide.
If a processor bottleneck does not exist but you are dissatisfied with system performance, and you have ruled out memory and other hardware factors, consider the following options to improve CPU response time or throughput:
Schedule processor-intensive applications to run when the system load is low. Use Scheduled Tasks in Control Panel or the at command to do this.
Upgrade to a faster processor. Upgrading to a higher-speed processor with a larger Level 2 (L2) cache expedites processing regardless of your system's workload.
When upgrading to a faster processor, check with the chip vendor to ensure that you use the correct memory speed for the chip. Incompatible memory speed could cause a computer with a faster processor to appear to run more slowly than a computer with a slower processor.
NOTE
If conditions do not warrant immediate processor replacement, begin monitoring processor activity and system performance as described in the following sections.
Using multiple processors rather than switching to a faster one might not dramatically improve performance. For example, a 200-megahertz (MHz) dual-processor computer might not perform as well as a 400-MHz uniprocessor computer on all workloads because of the overhead inherent in synchronization. Because scaling can incur some overhead, it is important to be aware of the factors involved and how to manage them. For more information, see "Measuring Multiprocessor System Activity" in the Server Operations Guide.
Examining the Processor Time Counter
The Processor % Processor Time counter determines the percentage of time the processor is busy by measuring the percentage of time the thread of the Idle process is running and then subtracting that from 100 percent. This measurement is the amount of processor utilization. Although you might sometimes see high values for the Processor % Processor Time counter (70 percent or greater depending on your workload and environment), it might not indicate a problem; you need more data to understand this activity. For example, high processor-time values typically occur when you are starting a new process and should not cause concern.
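The calculation described above is simple arithmetic. The Python sketch below is illustrative only: it assumes you already know how long the Idle thread ran during a sampling interval, and the `idle_ms` and `interval_ms` inputs are hypothetical values, not counters that System Monitor exposes under those names.

```python
def idle_percent(idle_ms, interval_ms):
    """Percentage of the sampling interval during which the Idle thread ran."""
    if interval_ms <= 0 or not 0 <= idle_ms <= interval_ms:
        raise ValueError("idle_ms must lie between 0 and interval_ms")
    return 100.0 * idle_ms / interval_ms


def processor_time_percent(idle_ms, interval_ms):
    """% Processor Time as the text describes it: 100 percent minus idle percent."""
    return 100.0 - idle_percent(idle_ms, interval_ms)


# Hypothetical 1-second interval in which the Idle thread ran for 250 ms.
print(processor_time_percent(250, 1000))  # 75.0
```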
NOTE
To illustrate, consider that Windows 2000 allows an application to consume all available processor time if no other thread is waiting. As a result, System Monitor shows processor-time rates of 100 percent. As soon as another thread of equal or higher priority requests processor time, the thread that was consuming 100 percent of CPU time yields control so that the requesting thread can run, and the first application's share of processor time drops. For a discussion of thread priority and scheduling, see "Threads in a Bottleneck" later in this chapter.
If you establish that processor-time values are consistently high during certain processes, determine whether a processor bottleneck exists by examining processor queue length data. Unless you already know the characteristics of the applications running on the system, upgrading or adding processors at this point would be a premature response to persistently high processor values, even values of 90 percent or higher. First, you need to know whether processor load is keeping important work from being done. You have several options for addressing processor bottlenecks, but you first need to verify that a bottleneck exists.
If you begin to see values of 70 percent or more for the Processor % Processor Time counter, investigate your processor's activity as follows:
Examine System Processor Queue Length.
Identify the processes that are running when Processor % Processor Time and System Processor Queue Length values are highest.
The value that characterizes high processor utilization depends greatly on your system and workload. This chapter uses 70 percent as a typical threshold value; however, you can define your target maximum utilization at a higher or lower value. If so, substitute that target value for 70 percent in the examples provided in this section.
Observing Processor Queue Length
A collection of one or more threads that are ready but unable to run because another thread is currently running on the processor is called the processor queue. The clearest symptom of a processor bottleneck is a sustained or recurring queue of more than two threads. Although queues are most likely to develop when the processor is very busy, they can develop when utilization is well below 90 percent. This can happen if requests for processor time arrive randomly and if threads demand irregular amounts of time from the processor. For more information about monitoring and adjusting thread scheduling, see "Threads in a Bottleneck" later in this chapter.
The System Processor Queue Length counter shows how many threads are ready in the processor queue but not currently able to use the processor. Figure 29.2 shows a sustained processor queue with utilization ranging from 60 to 90 percent. Notice that the default scale for the Processor Queue Length counter is 10. Therefore, System Monitor graphs a queue that contains two threads as 20. You can change the scale factor by using the Data properties tab in System Monitor.
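Two details from this discussion lend themselves to a short sketch: undoing System Monitor's display scale factor, and deciding whether a queue of more than two threads is sustained. In the Python below, the `min_run` parameter (three consecutive samples) is an assumed definition of "sustained"; the text does not define one.

```python
def unscale(graphed_value, scale=10):
    """Convert a graphed value back to the counter value.

    With the default scale of 10 for Processor Queue Length, a queue of
    two threads is drawn as 20, so dividing by the scale recovers it.
    """
    return graphed_value / scale


def sustained_queue(queue_samples, threshold=2, min_run=3):
    """True if the queue holds more than `threshold` threads for at least
    `min_run` consecutive samples (min_run is an assumed definition)."""
    run = 0
    for q in queue_samples:
        run = run + 1 if q > threshold else 0  # reset the run when the queue drains
        if run >= min_run:
            return True
    return False


print(unscale(20))                       # 2.0
print(sustained_queue([1, 3, 4, 5, 1]))  # True
print(sustained_queue([3, 1, 3, 1, 3]))  # False
```

The second series shows a recurring rather than sustained queue; detecting recurrence would need a different rule, such as counting how many separate runs occur in a logging period.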
Figure 29.2 Sustained Processor Queue with Rising Processor Usage
In Figure 29.2, the line across the top represents Processor(_Total) % Processor Time. The lower line is System Processor Queue Length.
Figure 29.3 shows a sustained processor queue accompanied by processor use at or near 100 percent.
Figure 29.3 Sustained Processor Queue with Maximum Processor Usage
Figure 29.4 illustrates how a processor bottleneck interferes with your computer's performance. It shows that when a processor is already at 100 percent utilization, starting another process does not accomplish more work.
Figure 29.4 Saturated Processor
In Figure 29.4, the dark line running near the top of the graph is Processor(_Total) % Processor Time. The line below it is System Processor Queue Length. Midway through the sample interval, a process with three threads was started. The graph illustrates that the queue increased as a result of this added workload. Some of the threads of the added process might be in the queue, or they might be running, having displaced the threads of a lower-priority process. Nonetheless, because the processor was already at maximum capacity, it could accomplish no additional work.
If your system's counter values appear similar to those in Figure 29.4, a bottleneck exists. Over time, logging reveals any patterns associated with the bottleneck. For example, you might find that bottlenecks occur when certain processes are running or at a certain time of day. In this case, you might be able to eliminate the bottleneck by balancing the workload between computers, that is, by running the process on another, less-loaded computer.
However, if sustained queues appear frequently, you need to investigate the processes that are running when threads collect in the queue. To do this:
Identify the processes that are consuming processor time. Determine whether a single process or multiple processes are active during a bottleneck. Running processes appear in the Instance box when you select the Process % Processor Time counter. For more information, see "Processes in a Bottleneck" later in this chapter.
Scrutinize the processor-intensive processes. Determine how many threads run in the process and watch the patterns of thread activity during a bottleneck.
Evaluate the priorities at which the process and its threads run. You might be able to eliminate a bottleneck merely by adjusting the base priority of the process or the current priorities of its threads. However, Microsoft does not recommend this as a long-term solution. Use Task Manager to find the base priority of the process.
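The first step, identifying which processes are consuming processor time, amounts to ranking the Process % Processor Time instances. The Python sketch below works on a hypothetical dictionary mapping instance names to values captured during a bottleneck. It skips the `_Total` and `Idle` instances, which do not represent ordinary workload. The process names and the 10 percent cutoff are assumptions made for illustration.

```python
def top_consumers(process_cpu, min_pct=10.0):
    """Return (name, percent) pairs for processes at or above min_pct,
    busiest first. min_pct is an assumed cutoff, not a documented one."""
    skip = {"_Total", "Idle"}
    busy = [(name, pct) for name, pct in process_cpu.items()
            if name not in skip and pct >= min_pct]
    return sorted(busy, key=lambda item: item[1], reverse=True)


# Hypothetical snapshot of Process % Processor Time during a bottleneck.
snapshot = {"_Total": 98.0, "Idle": 2.0, "report_gen": 61.0,
            "explorer": 3.0, "backup_job": 30.0}
consumers = top_consumers(snapshot)
print(consumers)  # [('report_gen', 61.0), ('backup_job', 30.0)]
print("single process" if len(consumers) == 1 else "multiple processes")
```

Repeating this over several samples taken during the bottleneck shows whether the same processes dominate each time, which is the information the next two steps build on.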
NOTE
There are other objects that track processor queue length. The Server Work Queues Queue Length counter reports the number of requests in the queue for the processor on the selected server. For more information about monitoring the Server Work Queues object, see "Monitoring Network Performance" in the Server Operations Guide.
Different guidelines apply for queue lengths on multiprocessor systems. For busy systems (those having processor utilization in the 80 to 90 percent range) that use thread scheduling, the queue length should range from one to three threads per processor. For example, on a four-processor system, the expected range of processor queue length on a system with high CPU activity is 4 to 12.
On systems with lower CPU utilization, the processor queue length is typically 0 or 1.
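The per-processor guideline scales linearly with the number of processors, as the four-processor example shows. A minimal Python sketch of that arithmetic follows; reducing "busy" to a yes/no input is a simplification of the 80 to 90 percent utilization range described above.

```python
def expected_queue_range(cpu_count, busy):
    """Expected System Processor Queue Length range from the guidelines above.

    busy=True:  one to three threads per processor (80 to 90 percent utilization)
    busy=False: typically 0 or 1, regardless of processor count
    """
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    if busy:
        return (1 * cpu_count, 3 * cpu_count)
    return (0, 1)


print(expected_queue_range(4, busy=True))   # (4, 12)
print(expected_queue_range(4, busy=False))  # (0, 1)
```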
Monitoring Interrupts
Sharply rising counts for interrupts can affect your processor's performance, and you need to investigate their cause. The Processor Interrupts/sec counter reports the number of interrupts the processor is servicing from applications or hardware devices. You can expect interrupts to range upward from 100 per second for computers running Windows 2000 Professional. This interrupt rate depends on the rate of disk I/O operations per second and network packets per second. If your interrupt counter values are out of range, there might be hardware problems such as a conflict between the hard-disk controller and a network adapter. You can use System Information and Device Manager in the Computer Management console to check for problems with the disk controller or network adapter.
You might want to monitor interrupts along with I/O activity involving both disks and network adapters. Use the Disk Reads/sec or Disk Writes/sec counters on the PhysicalDisk object to monitor disk I/O as described in "Examining and Tuning Disk Performance" in this book. Use the network transmission counters to monitor network activity as described in "Monitoring Network Performance" in the Server Operations Guide. You can tell whether interrupt activity is becoming a problem by determining the ratio of interrupts to I/O operations. An optimal ratio is one interrupt to four or five I/O operations. A one-to-one correspondence between these factors indicates poor performance and requires action.
If network or disk I/O is involved, consider upgrading to a controller and a driver that support interrupt moderation or interrupt avoidance. Interrupt moderation allows a processor to process interrupts more efficiently by grouping several interrupts into a single hardware interrupt. Interrupt avoidance allows a processor to continue processing interrupts without new interrupts being queued until all pending interrupts are complete.
For more information about managing interrupts from network adapters, see "Monitoring Network Performance" in the Server Operations Guide.
High values for % Processor Time for threads of the System process can also indicate a problem with a device driver.
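The ratio test described in this section can be written as a short calculation. The Python sketch below rests on assumptions: it treats the I/O operation count as the sum of Disk Reads/sec, Disk Writes/sec, and a network packet rate (the text does not say exactly which network counter to use), and it maps the text's two reference points (four or five I/O operations per interrupt is optimal; one-to-one is poor) onto three coarse, hypothetical labels.

```python
def io_per_interrupt(interrupts_per_sec, disk_reads, disk_writes, net_packets):
    """I/O operations per interrupt; higher is better.

    The I/O total is assumed to be disk reads + disk writes + network packets
    per second. Interrupts/sec also counts timer ticks, so treat the result
    as approximate.
    """
    if interrupts_per_sec <= 0:
        raise ValueError("interrupts_per_sec must be positive")
    return (disk_reads + disk_writes + net_packets) / interrupts_per_sec


def classify(ratio):
    """Coarse labels based on the guidance in the text."""
    if ratio >= 4:
        return "optimal"      # about one interrupt per four or five I/O operations
    if ratio <= 1:
        return "poor"         # roughly one interrupt per I/O operation
    return "investigate"      # between the two reference points


r = io_per_interrupt(500, disk_reads=800, disk_writes=400, net_packets=800)
print(r, classify(r))                                   # 4.0 optimal
print(classify(io_per_interrupt(1000, 500, 200, 200)))  # poor
```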
Monitoring Context Switches
A context switch occurs when the kernel switches the processor from one thread to another, for example, when a thread with a higher priority than the running thread becomes ready. Context switching activity is important for several reasons. A program that monopolizes the processor lowers the rate of context switches because it does not allow much processor time for the other processes' threads. A high rate of context switching means that the processor is being shared repeatedly, for example, by many threads of equal priority. A high context-switch rate often indicates that there are too many threads competing for the processors on the system.
The rate of context switches can also affect performance of multiprocessor computers. For more information about how to monitor and tune context-switch activity on multiprocessor systems, see "Measuring Multiprocessor System Activity" in the Server Operations Guide.
NOTE
You can view context switch data in two ways:
The System Context Switches/sec counter in System Monitor reports systemwide context switches.
The Thread(_Total) Context Switches/sec counter reports the total number of context switches generated per second by all threads.
Although these counters might vary slightly due to sampling, generally they will be nearly equal.
Figure 29.5 plots System Context Switches/sec during a temporary bottleneck.
Figure 29.5 Systemwide Context Switches During a Processor Bottleneck
In Figure 29.5, Processor(_Total) % Processor Time jumps to about 60 percent during the sample interval. System Processor Queue Length (scaled by a factor of 10) shows that the queue varies from 2 to 6 with a mean near 4. System Context Switches (shown scaled by a factor of 10) reveals an average of about 750 switches per second. A rate of context switches from 500 to 2,000 per second might indicate that you have a problem with a network adapter or a device driver or that you are using an inefficient server-based application that spawns too many threads.
The Pviewer utility on the Windows 2000 operating system CD reports context switch data. For information about installing and using the Windows 2000 Support Tools and Support Tools Help, see the file Sreadme.doc in the SupportTools folder of the Windows 2000 operating system CD.
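Two checks from this section can be sketched in Python: whether the two context-switch counters roughly agree, and whether the average rate falls in the 500 to 2,000 per second range that the text flags. The 5 percent tolerance for "nearly equal" is an assumption. The samples are hypothetical values chosen to resemble the roughly 750 switches per second shown in Figure 29.5.

```python
from statistics import mean


def counters_agree(system_cs, thread_total_cs, tolerance=0.05):
    """True if System Context Switches/sec and Thread(_Total) Context
    Switches/sec differ by no more than `tolerance` (an assumed 5 percent)
    of the larger value."""
    return abs(system_cs - thread_total_cs) <= tolerance * max(system_cs, thread_total_cs)


def in_review_band(rate_samples, low=500, high=2000):
    """True if the mean context-switch rate falls in the range the text says
    might point to an adapter, driver, or thread-heavy application problem."""
    return low <= mean(rate_samples) <= high


samples = [700, 760, 790]             # hypothetical per-second samples
print(counters_agree(750, 760))       # True
print(in_review_band(samples))        # True
```

A rate in this band is a reason to look further, not proof of a fault; compare it with the rate you normally see on the same system.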