Sometimes you need to approach a misbehaving system and determine exactly what is slowing everything down.
Because we are investigating a system-wide problem, the cause can be anywhere from user applications to system libraries to the Linux kernel. Fortunately, with Linux, unlike many other operating systems, you can get the source for most if not all applications on the system. If necessary, you can fix the problem and submit the fix to the maintainers of that particular piece. In the worst case, you can run a fixed version locally. This is the power of open-source software.
Figure 9-2 shows a flowchart of how we will diagnose a system-wide performance problem.
Go to Section 9.4.1 to begin the investigation.
Use top, procinfo, or mpstat and determine where the system is spending its time. If the entire system is spending less than 5 percent of the total time in idle and wait modes, your system is CPU-bound. Proceed to Section 9.4.3. Otherwise, proceed to Section 9.4.2.
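The 5 percent test can also be scripted. The following is a minimal sketch, not a replacement for top or mpstat: it assumes the standard /proc/stat field order (user, nice, system, idle, iowait, irq, softirq, steal), and the function names and the `threshold` parameter are illustrative choices of ours.

```python
def cpu_times(stat_text):
    """Return the aggregate CPU counters from the text of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # Keep at most the first eight counters; the guest fields that
            # follow are already included in user and nice.
            return [int(v) for v in fields[1:9]]
    raise ValueError("no aggregate 'cpu' line found")


def idle_wait_percent(before, after):
    """Percent of elapsed time spent in idle plus iowait between two samples."""
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    # Index 3 is idle; index 4 is iowait (absent on very old kernels).
    idle_wait = deltas[3] + (deltas[4] if len(deltas) > 4 else 0)
    return 100.0 * idle_wait / total


def is_cpu_bound(before, after, threshold=5.0):
    """Apply the rule of thumb: under `threshold` percent idle plus wait."""
    return idle_wait_percent(before, after) < threshold
```

To use it, read /proc/stat twice, a second or more apart, pass each text to `cpu_times`, and hand the two results to `is_cpu_bound`. The counters are cumulative since boot, so only the difference between two samples describes what the system is doing now.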
Although the system as a whole may not be CPU-bound, in a symmetric multiprocessing (SMP) or hyperthreaded system, an individual processor may be CPU-bound.
Use top or mpstat to determine whether an individual CPU has less than 5 percent in idle and wait modes. If it does, one or more CPUs are CPU-bound; in this case, go to Section 9.4.4.
Otherwise, nothing is CPU-bound. Go to Section 9.4.7.
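The per-processor version of this check can be sketched in the same way. This example assumes the per-CPU lines of /proc/stat (cpu0, cpu1, and so on) use the same field order as the aggregate line; `per_cpu_times` and `cpu_bound_processors` are hypothetical helper names, not part of any tool mentioned here.

```python
import re


def per_cpu_times(stat_text):
    """Map 'cpu0', 'cpu1', ... to their counters from the text of /proc/stat."""
    times = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Skip the aggregate 'cpu' line; keep only the numbered processors.
        if fields and re.fullmatch(r"cpu\d+", fields[0]):
            times[fields[0]] = [int(v) for v in fields[1:9]]
    return times


def cpu_bound_processors(before, after, threshold=5.0):
    """List processors with less than `threshold` percent idle plus iowait."""
    bound = []
    names = sorted(before.keys() & after.keys(), key=lambda n: int(n[3:]))
    for name in names:
        deltas = [b - a for a, b in zip(before[name], after[name])]
        total = sum(deltas)
        if total <= 0:
            continue  # no time elapsed for this processor between samples
        idle_wait = deltas[3] + (deltas[4] if len(deltas) > 4 else 0)
        if 100.0 * idle_wait / total < threshold:
            bound.append(name)
    return bound
```

An empty list corresponds to the "nothing is CPU-bound" branch; any names in the list identify the processors worth investigating further.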
The next step is to go to Section 9.5.1 once for each process to determine where it is spending its time.
The next step is to go to Section 9.5.1.
It appears as if the kernel is spending a lot of time doing work not on behalf of an application. One explanation for this is an I/O card that is raising many interrupts, such as a busy network card. Run procinfo or cat /proc/interrupts to determine how many interrupts are being fired, how often they are being fired, and which devices are causing them. This may provide a hint as to what the system is doing. Record this information and proceed to Section 9.4.6.
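To see how often each interrupt fires, rather than only its running total, we can compare two snapshots of /proc/interrupts. The sketch below assumes the usual layout: a header row naming the CPUs, then one row per interrupt with a count per CPU followed by a description. The function names are our own.

```python
def parse_interrupts(text):
    """Map each interrupt label to (total count across CPUs, description)."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        tokens = rest.split()
        counts = []
        # The leading numeric columns are per-CPU counts; the rest describes
        # the interrupt controller and the device.
        while tokens and len(counts) < ncpus and tokens[0].isdigit():
            counts.append(int(tokens.pop(0)))
        table[label.strip()] = (sum(counts), " ".join(tokens))
    return table


def busiest_interrupts(before, after, seconds, top=5):
    """Return up to `top` (rate per second, label, description) tuples."""
    rates = []
    for irq, (count, desc) in after.items():
        delta = count - before.get(irq, (0, ""))[0]
        rates.append((delta / seconds, irq, desc))
    rates.sort(reverse=True)
    return rates[:top]
```

Read /proc/interrupts twice, a known number of seconds apart, and pass both parsed tables to `busiest_interrupts`. The description column usually names the device, which is the hint we are looking for.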
Finally, go to Section 9.9.
The next step is to check whether the amount of used swap is increasing. If it is, go to Section 9.6.1.
If the amount of used swap is not increasing, go to Section 9.4.8.
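Whether used swap is growing can be checked by sampling /proc/meminfo over time. This is a rough sketch that assumes the SwapTotal and SwapFree lines are present and reported in kB; comparing only the first and last samples is a simplification we chose for brevity.

```python
def swap_used_kb(meminfo_text):
    """Return used swap in kB, computed as SwapTotal minus SwapFree."""
    values = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and key in ("SwapTotal", "SwapFree"):
            values[key] = int(rest.split()[0])
    return values["SwapTotal"] - values["SwapFree"]


def swap_is_increasing(samples_kb):
    """True if used swap grew between the first and the last sample."""
    return len(samples_kb) >= 2 and samples_kb[-1] > samples_kb[0]
```

Collect several readings a few seconds apart while the system is misbehaving, then pass the list to `swap_is_increasing`. A single reading says little, because swap that was filled long ago and is no longer growing does not explain a current slowdown.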
While Section 9.9.
Next, run vmstat (or iostat) and see how much disk I/O is taking place. If it is high, go to Section 9.7.1. Otherwise, continue to Section 9.4.10.
Next, we see whether any of the network devices are passing a significant amount of traffic. If one is, go to Section 9.8.1. If none of the network devices seem to be passing network traffic, the kernel is waiting on some other I/O device that is not covered in this book. It may be useful to see what functions the kernel is calling and what devices are interrupting the kernel. Go to Section 9.4.5.
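One way to check whether any network device is passing traffic is to compare two snapshots of /proc/net/dev. The sketch below assumes the conventional layout of that file: two header lines, then one line per interface with eight receive counters followed by eight transmit counters. The helper names are hypothetical.

```python
def interface_bytes(net_dev_text):
    """Map each interface name to (received bytes, transmitted bytes)."""
    counters = {}
    for line in net_dev_text.splitlines()[2:]:  # skip the two header lines
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or len(fields) < 16:
            continue
        # Field 0 is received bytes; field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def traffic_rates(before, after, seconds):
    """Return bytes per second (receive, transmit) for each interface."""
    rates = {}
    for name, (rx, tx) in after.items():
        if name in before:
            rates[name] = ((rx - before[name][0]) / seconds,
                           (tx - before[name][1]) / seconds)
    return rates
```

If every interface shows a rate at or near zero, the network is unlikely to be what the kernel is waiting on.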