Performance Tuning for Linux Servers [Electronic resources] نسخه متنی

Performance Inspector

The Performance Inspector (PI) is a suite of tools that identifies performance problems and characteristics. PI is distributed with a kernel patch and the source to build the device driver and tools. Procedures are also included to automate the install process and build for installing the tools. Support is provided for Intel 32- and 64-bit platforms, Power PC 32- and 64-bit systems, S390 32- and 64-bit systems, and the AMD Hammer processor.

The tools included in all versions of Performance Inspector include the following:

Above Idle.
Shows how idle, process, and interrupt times are distributed over the processor(s) on the running system. Above Idle is a phase 1 tool that identifies hot spots with respect to processor time and interrupt time spent in the system. This tool can also be used on SMP systems to see how well the multiple active processes are being spread over multiple CPUs.

Per-Thread Time.
Hooks into the process dispatch and interrupt code to maintain information on the total amount of time spent within a process, the time spent handling interrupts, and the amount of idle time. Summary information is provided on a per-CPU basis and for the system as a whole. APIs are provided to allow a developer to include calls within this application to measure functions and code snippets.

System Trace Data Collection.
Trace hooks are added to the kernel to collect data on what is happening within the system. Hooks are provided for process dispatches, process timeslices, interrupt entry and exits, process forks and clones, process startup execution, system timer interrupts, and memory mapping of code segments. It is also possible for an application to write its own trace records.

Command files, libraries, and post processors are included to allow automatic tracing functions and reports. The most significant of these functions allow the measurement and reporting of jitted methods within Java applications. This function is very useful in identifying what methods are running and how much time they are using.

JPROF.
Performs Java execution profiling. This tool includes the capability to obtain detailed information on jitted methods. To use this tool, use a version of the Java SDK that includes JVMPI support.

Java Lock Monitor (JLM).
Reports hold-time accounting and contention statistics with Java applications. The IBM Java 1.4 SDK is required to use this tool.

In addition to these common measurement functions, an additional set of functionality is provided on Intel 32-bit systems. This functionality includes the following:

Performance counter support.
Supports starting and stopping the counter, reporting counter contents, and displaying counter settings. The Trace Data Collection and Per-Thread Time functions base their measurements on the performance counters instead of the system clock. Instead of seeing how much time is spent within a process, you see how many instructions are executed, or how many branches or jumps are performed.

Instruction tracing.
Instruction tracing records all the branches taken while the measurement is active. Branches include calls, jumps, and any other execution path change. A post-processor tool is provided to report an instruction trace, including the number of instructions executed and where in the code the execution occurs. This reporting can also be done for Java jitted methods.

Dynamic kernel patch.
For certain releases of Red Hat and SUSE distributions, a version of the tools is provided that does not require a patch to the Linux kernel. Instead, the device driver dynamically patches the required hooks into the running system and runs the Performance Inspector tool suite.

The Performance Inspector installation requires you to apply a kernel patch and rebuild the kernel. After that step is complete, the various tools are built. Because some of the tools are sensitive to the version of Java that is being used, the tool build process must be redone when a new version of the Java support is used. However, this requirement does not affect the kernel. The kernel needs to be patched and built only once.

An exception to the installation procedure is provided for the Intel 32-bit support. A special version of the Performance Inspector is provided that does not require the kernel to be patched as long as the kernel being used is the default kernel shipped with supported versions of Red Hat and SUSE. The kernel source needs to be installed and the tools built as before, but the kernel does not have to be rebuilt. The PI driver dynamically patches the kernel at the appropriate places. The kernel patches are removed when the driver is uninstalled.

When none of the PI performance probes in the kernel are active, there is minimal impact on system performance. At most, a compare and short branch instruction are added to the code path where the probes are located. For the dynamic version of the IA32 PI, no extra overhead is added when the driver is not loaded.

When performing a trace, there is usually a 2% to 3% overhead on the system. Instruction tracing obviously adds more overhead, but it is for use in a debug environment. N tracing is done in a working production environment without any major impacts to performance or throughput.

Most of the PI functions can be controlled from APIs issued from the user's program as well as from the PI's own command files. When using the API interface, either from C code or Java, you can fine-tune when, where, and what information is collected. All of the source is provided and can be used as a coding sample.

Refer to http://www-124.ibm.com/developerworks/oss/pi/indexl for more information and to obtain a copy of the Performance Inspector.

The remainder of this chapter examines in detail the various features of the Performance Inspector.

Above Idle

Above Idle works by hooking into the process timeslice logic and interrupt handler within the kernel. Above Idle keeps track of the amount of time that is spent while a processor is busy, idle, or handling interrupts. It is useful for identifying processor overload, high interrupt activity, and poor distribution of work within a multiprocessor system. When active, Above Idle gathers information over a user-specified measurement period. The default is 1 second. When using the default, Above Idle determines the amount of time the system spent while idle, active, and processing interrupts. The percentage of processor idle, active, and interrupt processing time is then calculated and displayed by the processor when running on a multiprocessor system. Parameters are used to define how many measurements to make and the time interval between measurements.

To start the Above Idle measurement, enter swtrace ai, which uses the defaults to report system usage every second until the measurement is manually stopped.

The following example shows a sample Above Idle output.


# swtrace ai
Above Idle instrumentation enabled
CPU 0  IDLE= 48.83, BUSY= 49.74, INTR=  1.44
CPU 1  IDLE=  0.00, BUSY= 99.81, INTR=  0.19
CPU 2  IDLE= 96.10, BUSY=  3.70, INTR=  0.20
CPU 3  IDLE= 22.10, BUSY= 77.71, INTR=  0.19
CPU 4  IDLE= 16.50, BUSY= 83.30, INTR=  0.20
CPU 5  IDLE= 16.66, BUSY= 83.10, INTR=  0.25
CPU 6  IDLE= 16.83, BUSY= 82.96, INTR=  0.21
CPU 7  IDLE= 60.72, BUSY= 39.08, INTR=  0.20
CPU 0  IDLE= 98.54, BUSY=  0.00, INTR=  1.46
CPU 1  IDLE= 86.35, BUSY= 13.48, INTR=  0.17
CPU 2  IDLE= 99.78, BUSY=  0.00, INTR=  0.22
CPU 3  IDLE= 99.77, BUSY=  0.00, INTR=  0.23
CPU 4  IDLE= 18.91, BUSY= 80.87, INTR=  0.22
CPU 5  IDLE= 10.75, BUSY= 89.05, INTR=  0.20
CPU 6  IDLE= 99.77, BUSY=  0.00, INTR=  0.23
CPU 7  IDLE= 99.73, BUSY=  0.00, INTR=  0.27
CPU 0  IDLE= 98.59, BUSY=  0.00, INTR=  1.41
CPU 1  IDLE= 99.80, BUSY=  0.00, INTR=  0.20
CPU 2  IDLE= 99.81, BUSY=  0.00, INTR=  0.19
CPU 3  IDLE= 99.79, BUSY=  0.00, INTR=  0.21
CPU 4  IDLE=  1.19, BUSY= 98.65, INTR=  0.16
CPU 5  IDLE=  0.00, BUSY= 99.77, INTR=  0.23
CPU 6  IDLE= 99.78, BUSY=  0.00, INTR=  0.22
CPU 7  IDLE= 99.85, BUSY=  0.00, INTR=  0.15

The example was run on an eight-way SMP machine and shows data collected over 3 seconds.

Per-Thread Time

When active, per-thread time (PTT) accumulates the amount of time spent by each process within the system, as well as idle time and interrupt time. If you are processing on an Intel IA 32-bit system, you can use a performance event counter instead of using the real-time clock. For example, a measurement can be made that identifies the number of instructions executed by each process, the idle process, and the interrupts. The totals for process time, IRQ time, and idle time are provided for each processor in the system, as well as a system total of all processors.

PTT is activated via an API call, PTTInit, or an external program called ptt. In both cases, you specify the measurement medium as system timer or performance counter. After it is activated, you can run a program called pttstats to display the current values for idle, process, and interrupt times. You can run the pttstats program any number of times. A terminate command stops the measurement. The last counts before the terminate command is issued remain stored in the buffers until another init command is received. The following shows a sample PTT report:


Tools version 2040024
PTT_STATISTICS_SIZE = 263096
IBM_NR_CPUS = 16
Per Thread Time Summary
Total Dispatches:       3763
Total IRQs:             334
Total Thread Time:      151469140
Total IRQ Time:         721564
Total Idle Time:        2158895609
Total tool time:        81755
Total Measurement:      2311086313
Total Period:           0
Per Thread Time Detail
Details by CPU
IRQ      Dispatch
CPU          Count    Count        Times
----------------------------------------------------
1            334      3763
Thread Time:                          151469140
IRQ Time:                             721564
Idle Time:                            2158895609
Tool Time:                            81755
Total Time:                           5311168068
Details by Process
Dispatch    IRQ         Process
PID      Count       Count       Time
--------------------------------------------------
0        708         303       2158895609
1          5           0            34407
2         83           0           139154
3          1           0             7162
7         15           0           168819
11          5           0             3466
324          1           0             6799
409         11           0            30641
671       1281          24        113156655
722          3           0            30128
723          2           0             8944
727         12           0            35688
728          2           0            11927
834         23           0            62077
869         11           0           170978
963         98           3         15182854
971        148           1           971766
992        137           1          1821399
995         48           0          2676763
999        393           0          3993366
11580          3           0           135687
11581         53           0          1156413
12281        544           1          6954629
12282         36           0          1006592
12297        140           1          3702826
Total Process Time:             2310364749
End of reports

APIs are provided to allow applications to start this measurement and to obtain the total amount of time spent in the process being measured. APIs allow an application to be instrumented to measure the amount of time spent between specific operations or positions within the application. On Intel 32-bit applications, measurements such as instructions retired can be used as a performance counter measurement.

Trace Profiling

Trace profiling works by hooking the kernel timer interrupt. When the timer interrupt is processed, the profiling code creates a trace record that contains the address where the processor was executing when the timer interrupt occurred. This trace record is then written into a trace buffer. When the profiling is complete, these trace records can be dumped out to a file where a post-processing program produces various reports.

You can run a trace profile to identify hot spots within the system by identifying the applications that consume the most time and which functions within those applications are causing them to use all that time. This includes jitted methods with Java applications.

Use the run.tprof command to perform the trace and produce the default reports.

By looking at the run.tprof command file, you can see the specific steps you need to follow to take a profile trace and produce the reports.

The following example shows a sample trace report for a small C application.


# run.tprof
ProcessorSpeed 398000000
TraceCycles 25382010763
TraceTime 63.774(sec)
TOTAL TICKS 25383
(Clipping Level : 0.1 % 25 Ticks)
================================
TPROF Report Summary
)) Process
)) Process_Module
)) Process_Module_Symbol
)) Process_Thread
)) Process_Thread_Module
)) Process_Thread_Module_Symbol
)) Module
)) Module_Symbol
================================
================================
)) Process
================================
LAB TKS %%% NAMES
PID 19094 75.22 SystemProcess_0000
PID 6284 24.76 /space/piperf/bin/WatchedPot_057c
================================
)) Process_Module
================================
LAB TKS %%% NAMES
PID 19094 75.22 SystemProcess_0000
MOD 19094 75.22 vmlinux
PID 6284 24.76 /space/piperf/bin/WatchedPot_057c
MOD 6284 24.76 /space/piperf/bin/WatchedPot
================================
)) Process_Module_Symbol
================================
LAB TKS %%% NAMES
PID 19094 75.22 SystemProcess_0000
MOD 19094 75.22 vmlinux
SYM 19094 75.22 default_idle
PID 6284 24.76 /space/piperf/bin/WatchedPot_057c
MOD 6284 24.76 /space/piperf/bin/WatchedPot
SYM 3154 12.43 ThirtyTwo
SYM 1615 6.36 Sixteen
SYM 808 3.18 Eight
SYM 404 1.59 Four
SYM 202 0.80 Two
SYM 101 0.40 One
================================
)) Process_Thread
================================
LAB TKS %%% NAMES
PID 19094 75.22 SystemProcess_0000
TID 19094 75.22 tid_0000
PID 6284 24.76 /space/piperf/bin/WatchedPot_057c
TID 6284 24.76 tid_057c
================================
)) Process_Thread_Module
================================
LAB TKS %%% NAMES
PID 19094 75.22 SystemProcess_0000
TID 19094 75.22 tid_0000
MOD 19094 75.22 vmlinux
PID 6284 24.76 /space/piperf/bin/WatchedPot_057c
TID 6284 24.76 tid_057c
MOD 6284 24.76 /space/piperf/bin/WatchedPot
================================
)) Process_Thread_Module_Symbol
================================
LAB TKS %%% NAMES
PID 19094 75.22 SystemProcess_0000
TID 19094 75.22 tid_0000
MOD 19094 75.22 vmlinux
SYM 19094 75.22 default_idle
PID 6284 24.76 /space/piperf/bin/WatchedPot_057c
TID 6284 24.76 tid_057c
MOD 6284 24.76 /space/piperf/bin/WatchedPot
SYM 3154 12.43 ThirtyTwo
SYM 1615 6.36 Sixteen
SYM 808 3.18 Eight
SYM 404 1.59 Four
SYM 202 0.80 Two
SYM 101 0.40 One
================================
)) Module
================================
LAB TKS %%% NAMES
MOD 19096 75.23 vmlinux
MOD 6284 24.76 /space/piperf/bin/WatchedPot
================================
)) Module_Symbol
================================
LAB TKS %%% NAMES
MOD 19096 75.23 vmlinux
SYM 19094 75.22 default_idle
MOD 6284 24.76 /space/piperf/bin/WatchedPot
SYM 3154 12.43 ThirtyTwo
SYM 1615 6.36 Sixteen
SYM 808 3.18 Eight
SYM 404 1.59 Four
SYM 202 0.80 Two
SYM 101 0.40 One

When profiling applications, the level of optimization used when the application is compiling can have an effect on the trace output. With optimization on, functions that execute the same base code are all rolled up into the application instead of reported on separately by symbol name. Therefore, it is a good idea to make the first few profiling runs on applications that have not been compiled with optimization turned on.

Instruction Tracing

A special version of trace profiling that sets up the system to use the hardware's instruction trace capabilities is available on Intel IA32 systems. Instruction tracing starts from the branch instruction signal and then creates trace records every time a code branch, jump, or call is performed. Instruction profiling provides a detailed trace of the execution path in the system without the need to have a debugger installed or code compiled with the debug option enabled.

Use instruction tracing only for very short, controlled measurement periods, because it can produce a significant number of tracing records. Instruction tracing is primarily useful for determining the exact execution path through the entire system for the function being executed.

A small command file, run.itrace, is provided to simplify the taking of an instruction trace profile.

The following example shows a small output of run.itrace.


# run.itrace
-----------------------------
ss: 0x8049a5c 1 push ebp
ss: 0x8049a5d 2 mov ebp, esp
ss: 0x8049a5f 3 sub esp, 0x8
ss: 0x8049a62 5 call 0x420628d8
0 1 1 4 @ 0 func1:/piperf/bin/itracec 8a1_0_/piperf/bin/itracec_ CALL
ss: 0x8049a6a 1 push ebp
ss: 0x8049a6b 2 mov ebp, esp
ss: 0x8049a6d 3 sub esp, 0x8
ss: 0x8049a70 5 call 0x420628d8
0 1 1 4 @ 0 func2:/piperf/bin/itracec 8a1_0_/piperf/bin/itracec_ CALL
ss: 0x8049a78 1 push ebp
ss: 0x8049a79 2 mov ebp, esp
ss: 0x8049a7b 3 sub esp, 0x8
ss: 0x8049a7e 5 call 0x420628d8
0 1 1 4 @ 0 func3:/piperf/bin/itracec 8a1_0_/piperf/bin/itracec_ CALL
ss: 0x8049a86 1 push ebp
ss: 0x8049a87 2 mov ebp, esp
ss: 0x8049a89 3 sub esp, 0x8
ss: 0x8049a8c 5 call 0x420628d8
0 1 1 4 @ 0 func4:/piperf/bin/itracec 8a1_0_/piperf/bin/itracec_ CALL
ss: 0x8049a94 1 push ebp
ss: 0x8049a95 2 mov ebp, esp
ss: 0x8049a97 3 sub esp, 0x8
ss: 0x8049a9a 5 call 0x420628d8
0 1 1 4 @ 0 func5:/piperf/bin/itracec 8a1_0_/piperf/bin/itracec_ CALL
----------------------------------

This report shows the assembler instructions that were executed between the entry and the branch instruction. The report provides a detailed view of what is going on in the system. To relate this back to the actual C code, you would have to compile the C code with the assembler option and calculate the offsets to the assembler instructions to find them in the listing.

If you add the -c option to the post-processing report generator, the output would look similar to the following.


# run.itrace -c
cpu ring instruction_count offset symbolname_modulename pid_tid
_pidname To_instruction
0 1 2 4 @ 32 printf:/lib/i686/libc-2.2.93.so 8a1_0_
/piperf/bin/itracec_ RETURN
0 1 1 5 @ 54 main:/piperf/bin/itracec 8a1_0_/piperf/bin/itracec_ CALL
0 1 1 4 @ 0 func1:/piperf/bin/itracec 8a1_0_/piperf/bin/itracec_ CALL
0 1 1 4 @ 0 func2:/piperf/bin/itracec 8a1_0_/piperf/bin/itracec_ CALL
0 1 1 4 @ 0 func3:/piperf/bin/itracec 8a1_0_/piperf/bin/itracec_ CALL
0 1 1 4 @ 0 func4:/piperf/bin/itracec 8a1_0_/piperf/bin/itracec_ CALL
0 1 1 4 @ 0 func5:/piperf/bin/itracec 8a1_0_/piperf/bin/itracec_ CALL
0 1 1 6 @ 0 func6:/piperf/bin/itracec 8a1_0_/piperf/bin/itracec_ CALL
0 1 3 1 @ 50 <plt>:/piperf/bin/itracec 8a1_0_/piperf/bin/itracec_ JUMP
0 1 1 6 @ 0 printf:/lib/i686/libc-2.2.93.so 8a1_0_
/piperf/bin/itracec_ CALL
0 1 2 2 @ 0 __i686.get_pc_thunk.bx:/lib/i686/libc-2.2.93.so
8a1_0_/piperf/bin/itracec_ RETURN
0 1 1 8 @ 11 printf:/lib/i686/libc-2.2.93.so
 8a1_0_/piperf/bin/itracec_ CALL
0 1 1 7 @ 0 _IO_vfprintf_internal:/lib/i686/libc-2.2.93.so
8a1_0_/piperf/bin/itracec_ CALL
0 1 2 2 @ 0 __i686.get_pc_thunk.bx:/lib/i686/libc-2.2.93.so
8a1_0_/piperf/bin/itracec_ RETURN
0 1 1 6 @ d _IO_vfprintf_internal:/lib/i686/libc-2.2.93.so
8a1_0_/piperf/bin/itracec_ CALL
0 1 3 1 @ 470 <plt>:/lib/i686/libc-2.2.93.so
 8a1_0_/piperf/bin/itracec_ JUMP
0 1 1 4 @ 0 __errno_location:/lib/i686/libc-2.2.93.so
8a1_0_/piperf/bin/itracec_ CALL
0 1 2 2 @ 0 __i686.get_pc_thunk.bx:/lib/i686/libc-2.2.93.so
8a1_0_/piperf/bin/itracec_ RETURN
0 1 2 5 @ 9 __errno_location:/lib/i686/libc-2.2.93.so
8a1_0_/piperf/bin/itracec_ RETURN
----------------------------------

Java Profiler

JPROF is a Java profiling agent that dynamically responds to JVMPI events based on options passed at Java invocation. The profiler is generally referred to as JPROF but uses an executable library called libjprof.so. JPROF provides JIT address-to-name resolution to support tprof and ITtrace data reduction. Other functions include Java Lock Monitor and Java HeapDump. The profiler has been implemented for IBM JDK 1.2.2 and later and is based on the support in the JDK for the JVMPI interface.

The following example shows a report generated by the trace profiler on a Java application.


Tprof Reports
ProcessorSpeed       688000000
TraceCycles     10949059046
TraceTime          15.914(sec)
TOTAL TICKS         125160
(Clipping Level :   0.1 %  125 Ticks)
================================
TPROF Report Summary
)) Process
)) Process_Module
)) Process_Module_Symbol
)) Process_Thread
)) Process_Thread_Module
)) Process_Thread_Module_Symbol
)) Module
)) Module_Symbol
================================
================================
)) Process
================================
LAB    TKS   %%%     NAMES
PID  15645 12.50    SystemProcess_7
PID  15645 12.50    SystemProcess_1
PID  15645 12.50    SystemProcess_0
PID  14388 11.50    SystemProcess_2
PID  14074 11.24    SystemProcess_4
PID  12806 10.23    SystemProcess_3
PID  10935  8.74    SystemProcess_6
PID  10462  8.36    SystemProcess_5
PID   4710  3.76    java_6
PID   4302  3.44    sshd_5
PID   2823  2.26    sshd_3
PID   1566  1.25    java_4
PID   1257  1.00    java_2
PID    881  0.70    java_5
================================
)) Process_Module
================================
LAB    TKS   %%%     NAMES
PID  15645 12.50    SystemProcess_7
MOD  15645 12.50     vmlinux
PID  15645 12.50    SystemProcess_1
MOD  15645 12.50     vmlinux
PID  15645 12.50    SystemProcess_0
MOD  15645 12.50     vmlinux
PID  14388 11.50    SystemProcess_2
MOD  14388 11.50     vmlinux
PID  14074 11.24    SystemProcess_4
MOD  14074 11.24     vmlinux
PID  12806 10.23    SystemProcess_3
MOD  12806 10.23     vmlinux
PID  10935  8.74    SystemProcess_6
MOD  10935  8.74     vmlinux
PID  10462  8.36    SystemProcess_5
MOD  10462  8.36     vmlinux
PID   4710  3.76    java_6
MOD   2109  1.69     vmlinux
MOD   1767  1.41     JITCODE
MOD    268  0.21     /opt/IBMJava2-141/jre/bin/libhpi.so
MOD    243  0.19     /opt/IBMJava2-141/jre/bin/classic/libjvm.so
MOD    175  0.14     /lib/tls/libpthread-0.29.so
PID   4302  3.44    sshd_5
MOD   1966  1.57     vmlinux
MOD   1349  1.08     /lib/libcrypto.so.0.9.7a
MOD    480  0.38     /usr/sbin/sshd
MOD    462  0.37     /lib/tls/libc-2.3.2.so
PID   2823  2.26    sshd_3
MOD   1369  1.09     vmlinux
MOD    844  0.67     /lib/libcrypto.so.0.9.7a
MOD    302  0.24     /usr/sbin/sshd
MOD    280  0.22     /lib/tls/libc-2.3.2.so
PID   1566  1.25    java_4
MOD    767  0.61     vmlinux
MOD    532  0.43     JITCODE
PID   1257  1.00    java_2
MOD    601  0.48     vmlinux
MOD    434  0.35     JITCODE
PID    881  0.70    java_5
MOD    444  0.35     vmlinux
MOD    299  0.24     JITCODE
================================
)) Process_Module_Symbol
================================
LAB    TKS   %%%     NAMES
PID  15645 12.50    SystemProcess_7
MOD  15645 12.50     vmlinux
SYM  15605 12.47      default_idle
PID  15645 12.50    SystemProcess_1
MOD  15645 12.50     vmlinux
SYM  15606 12.47      default_idle
PID  15645 12.50    SystemProcess_0
MOD  15645 12.50     vmlinux
SYM  15608 12.47      default_idle
PID  14388 11.50    SystemProcess_2
MOD  14388 11.50     vmlinux
SYM  14341 11.46      default_idle
PID  14074 11.24    SystemProcess_4
MOD  14074 11.24     vmlinux
SYM  14043 11.22      default_idle
PID  12806 10.23    SystemProcess_3
MOD  12806 10.23     vmlinux
SYM  12675 10.13      default_idle
PID  10935  8.74    SystemProcess_6
MOD  10935  8.74     vmlinux
SYM  10913  8.72      default_idle
PID  10462  8.36    SystemProcess_5
MOD  10462  8.36     vmlinux
SYM  10387  8.30      default_idle
PID   4710  3.76    java_6
MOD   2109  1.69     vmlinux
SYM    404  0.32      n_tty_receive_buf
SYM    144  0.12      opost_block
SYM    141  0.11      sigprocmask
SYM    140  0.11      .text.lock.tty_io
SYM    136  0.11      tty_write
SYM    130  0.10      n_tty_receive_room
MOD   1767  1.41     JITCODE
SYM    621  0.50      hellop.main([Ljava/lang/String;)V
SYM    143  0.11      java/io/OutputStreamWriter.write([CII)V
MOD    268  0.21     /opt/IBMJava2-141/jre/bin/libhpi.so
MOD    243  0.19     /opt/IBMJava2-141/jre/bin/classic/libjvm.so
MOD    175  0.14     /lib/tls/libpthread-0.29.so
PID   4302  3.44    sshd_5
MOD   1966  1.57     vmlinux
SYM    236  0.19      read_chan
SYM    125  0.10      n_tty_chars_in_buffer
MOD   1349  1.08     /lib/libcrypto.so.0.9.7a
SYM   1345  1.07      NoSymbols
MOD    480  0.38     /usr/sbin/sshd
SYM    474  0.38      NoSymbols
MOD    462  0.37     /lib/tls/libc-2.3.2.so
PID   2823  2.26    sshd_3
MOD   1369  1.09     vmlinux
SYM    175  0.14      read_chan
MOD    844  0.67     /lib/libcrypto.so.0.9.7a
SYM    842  0.67      NoSymbols
MOD    302  0.24     /usr/sbin/sshd
SYM    298  0.24      NoSymbols
MOD    280  0.22     /lib/tls/libc-2.3.2.so
PID   1566  1.25    java_4
MOD    767  0.61     vmlinux
SYM    151  0.12      n_tty_receive_buf
MOD    532  0.43     JITCODE
SYM    161  0.13      hellop.main([Ljava/lang/String;)V
PID   1257  1.00    java_2
MOD    601  0.48     vmlinux
SYM    132  0.11      n_tty_receive_buf
MOD    434  0.35     JITCODE
SYM    126  0.10      hellop.main([Ljava/lang/String;)V
PID    881  0.70    java_
MOD    444  0.35     vmlinux
MOD    299  0.24     JITCODE
================================
)) Process_Thread
================================
LAB    TKS   %%%     NAMES
PID  15645 12.50    SystemProcess_7
TID  15645 12.50     tid_0000
PID  15645 12.50    SystemProcess_1
TID  15645 12.50     tid_0000
PID  15645 12.50    SystemProcess_0
TID  15645 12.50     tid_0000
PID  14388 11.50    SystemProcess_2
TID  14388 11.50     tid_0000
PID  14074 11.24    SystemProcess_4
TID  14074 11.24     tid_0000
PID  12806 10.23    SystemProcess_3
TID  12806 10.23     tid_0000
PID  10935  8.74    SystemProcess_6
TID  10935  8.74     tid_0000
PID  10462  8.36    SystemProcess_5
TID  10462  8.36     tid_0000
PID   4710  3.76    java_6
TID   4668  3.73     tid_main
PID   4302  3.44    sshd_5
TID   4298  3.43     tid_02c5
PID   2823  2.26    sshd_3
TID   2823  2.26     tid_02c5
PID   1566  1.25    java_4
TID   1527  1.22     tid_main
PID   1257  1.00    java_2
TID   1222  0.98     tid_main
PID    881  0.70    java_5
TID    832  0.66     tid_main
================================
)) Process_Thread_Module
================================
LAB    TKS   %%%     NAMES
PID  15645 12.50    SystemProcess_7
TID  15645 12.50     tid_0000
MOD  15645 12.50      vmlinux
PID  15645 12.50    SystemProcess_1
TID  15645 12.50     tid_0000
MOD  15645 12.50      vmlinux
PID  15645 12.50    SystemProcess_0
TID  15645 12.50     tid_0000
MOD  15645 12.50      vmlinux
PID  14388 11.50    SystemProcess_2
TID  14388 11.50     tid_0000
MOD  14388 11.50      vmlinux
PID  14074 11.24    SystemProcess_4
TID  14074 11.24     tid_0000
MOD  14074 11.24      vmlinux
PID  12806 10.23    SystemProcess_3
TID  12806 10.23     tid_0000
MOD  12806 10.23      vmlinux
PID  10935  8.74    SystemProcess_6
TID  10935  8.74     tid_0000
MOD  10935  8.74      vmlinux
PID  10462  8.36    SystemProcess_5
TID  10462  8.36     tid_0000
MOD  10462  8.36      vmlinux
PID   4710  3.76    java_6
TID   4668  3.73     tid_main
MOD   2075  1.66      vmlinux
MOD   1767  1.41      JITCODE
MOD    268  0.21      /opt/IBMJava2-141/jre/bin/libhpi.so
MOD    236  0.19      /opt/IBMJava2-141/jre/bin/classic/libjvm.so
MOD    174  0.14      /lib/tls/libpthread-0.29.so
PID   4302  3.44    sshd_5
TID   4298  3.43     tid_02c5
MOD   1965  1.57      vmlinux
MOD   1349  1.08      /lib/libcrypto.so.0.9.7a
MOD    478  0.38      /usr/sbin/sshd
MOD    461  0.37      /lib/tls/libc-2.3.2.so
PID   2823  2.26    sshd_3
TID   2823  2.26     tid_02c5
MOD   1369  1.09      vmlinux
MOD    844  0.67      /lib/libcrypto.so.0.9.7a
MOD    302  0.24      /usr/sbin/sshd
MOD    280  0.22      /lib/tls/libc-2.3.2.so
PID   1566  1.25    java_4
TID   1527  1.22     tid_main
MOD    734  0.59      vmlinux
MOD    532  0.43      JITCODE
PID   1257  1.00    java_2
TID   1222  0.98     tid_main
MOD    580  0.46      vmlinux
MOD    434  0.35      JITCODE
PID    881  0.70    java_5
TID    832  0.66     tid_main
MOD    410  0.33      vmlinux
MOD    299  0.24      JITCODE
================================
)) Process_Thread_Module_Symbol
================================
LAB    TKS   %%%     NAMES
PID  15645 12.50    SystemProcess_7
TID  15645 12.50     tid_0000
MOD  15645 12.50      vmlinux
SYM  15605 12.47       default_idle
PID  15645 12.50    SystemProcess_1
TID  15645 12.50     tid_0000
MOD  15645 12.50      vmlinux
SYM  15606 12.47       default_idle
PID  15645 12.50    SystemProcess_0
TID  15645 12.50     tid_0000
MOD  15645 12.50      vmlinux
SYM  15608 12.47       default_idle
PID  14388 11.50    SystemProcess_2
TID  14388 11.50     tid_0000
MOD  14388 11.50      vmlinux
SYM  14341 11.46       default_idle
PID  14074 11.24    SystemProcess_4
TID  14074 11.24     tid_0000
MOD  14074 11.24      vmlinux
SYM  14043 11.22       default_idle
PID  12806 10.23    SystemProcess_3
TID  12806 10.23     tid_0000
MOD  12806 10.23      vmlinux
SYM  12675 10.13       default_idle
PID  10935  8.74    SystemProcess_6
TID  10935  8.74     tid_0000
MOD  10935  8.74      vmlinux
SYM  10913  8.72       default_idle
PID  10462  8.36    SystemProcess_5
TID  10462  8.36     tid_0000
MOD  10462  8.36      vmlinux
SYM  10387  8.30       default_idle
PID   4710  3.76    java_6
TID   4668  3.73     tid_main
MOD   2075  1.66      vmlinux
SYM    404  0.32       n_tty_receive_buf
SYM    144  0.12       opost_block
SYM    141  0.11       sigprocmask
SYM    140  0.11       .text.lock.tty_io
SYM    136  0.11       tty_write
SYM    130  0.10       n_tty_receive_room
MOD   1767  1.41      JITCODE
SYM    621  0.50       hellop.main([Ljava/lang/String;)V
SYM    143  0.11       java/io/OutputStreamWriter.write([CII)V
MOD    268  0.21      /opt/IBMJava2-141/jre/bin/libhpi.so
MOD    236  0.19      /opt/IBMJava2-141/jre/bin/classic/libjvm.so
MOD    174  0.14      /lib/tls/libpthread-0.29.so
PID   4302  3.44    sshd_5
TID   4298  3.43     tid_02c5
MOD   1965  1.57      vmlinux
SYM    236  0.19       read_chan
SYM    125  0.10       n_tty_chars_in_buffer
MOD   1349  1.08      /lib/libcrypto.so.0.9.7a
SYM   1345  1.07       NoSymbols
MOD    478  0.38      /usr/sbin/sshd
SYM    472  0.38       NoSymbols
MOD    461  0.37      /lib/tls/libc-2.3.2.so
PID   2823  2.26    sshd_3
TID   2823  2.26     tid_02c5
MOD   1369  1.09      vmlinux
SYM    175  0.14       read_chan
MOD    844  0.67      /lib/libcrypto.so.0.9.7a
SYM    842  0.67       NoSymbols
MOD    302  0.24      /usr/sbin/sshd
SYM    298  0.24       NoSymbols
MOD    280  0.22      /lib/tls/libc-2.3.2.so
PID   1566  1.25    java_4
TID   1527  1.22     tid_main
MOD    734  0.59      vmlinux
SYM    151  0.12       n_tty_receive_buf
MOD    532  0.43      JITCODE
SYM    161  0.13       hellop.main([Ljava/lang/String;)V
PID   1257  1.00    java_2
TID   1222  0.98     tid_main
MOD    580  0.46      vmlinux
SYM    132  0.11       n_tty_receive_buf
MOD    434  0.35      JITCODE
SYM    126  0.10       hellop.main([Ljava/lang/String;)V
PID    881  0.70    java_5
TID    832  0.66     tid_main
MOD    410  0.33      vmlinux
MOD    299  0.24      JITCODE
================================
)) Module
================================
LAB    TKS   %%%     NAMES
MOD 116871 93.38    vmlinux
MOD   3032  2.42    JITCODE
MOD   2193  1.75    /lib/libcrypto.so.0.9.7a
MOD    782  0.62    /usr/sbin/sshd
MOD    743  0.59    /lib/tls/libc-2.3.2.so
MOD    443  0.35    /opt/IBMJava2-141/jre/bin/libhpi.so
MOD    441  0.35    /opt/IBMJava2-141/jre/bin/classic/libjvm.so
MOD    311  0.25    /lib/tls/libpthread-0.29.so
MOD    208  0.17    NoModule
MOD    133  0.11    /opt/IBMJava2-141/jre/bin/libjava.so
================================
)) Module_Symbol
================================
LAB    TKS   %%%     NAMES
MOD 116871 93.38    vmlinux
SYM 109178 87.23     default_idle
SYM    783  0.63     n_tty_receive_buf
SYM    413  0.33     .text.lock.tty_io
SYM    411  0.33     read_chan
SYM    378  0.30     sigprocmask
SYM    331  0.26     __wake_up
SYM    310  0.25     sysenter_past_esp
SYM    299  0.24     schedule
SYM    271  0.22     tty_write
SYM    256  0.20     opost_block
SYM    246  0.20     add_wait_queue
SYM    229  0.18     n_tty_receive_room
SYM    229  0.18     n_tty_chars_in_buffer
SYM    197  0.16     __copy_from_user_ll
SYM    191  0.15     sys_rt_sigprocmask
SYM    177  0.14     do_select
SYM    174  0.14     __copy_to_user_ll
SYM    170  0.14     remove_wait_queue
SYM    154  0.12     write_chan
SYM    153  0.12     pty_write
SYM    139  0.11     sys_select
SYM    125  0.10     kill_fasync
MOD   3032  2.42    JITCODE
SYM   1011  0.81     hellop.main([Ljava/lang/String;)V
SYM    254  0.20     java/io/OutputStreamWriter.write([CII)V
SYM    198  0.16     sun/nio/cs/StreamEncoder.flushBuffer()V
SYM    168  0.13     java/lang/String.<init>(Ljava/lang/String;I)V
SYM    168  0.13     java/io/PrintStream.write([BII)V
SYM    132  0.11     java/io/PrintStream.write(Ljava/lang/String;)V
MOD   2193  1.75    /lib/libcrypto.so.0.9.7a
SYM   2187  1.75     NoSymbols
MOD    782  0.62    /usr/sbin/sshd
SYM    772  0.62     NoSymbols
MOD    743  0.59    /lib/tls/libc-2.3.2.so
SYM    152  0.12     memset
SYM    125  0.10     __memchr
MOD    443  0.35    /opt/IBMJava2-141/jre/bin/libhpi.so
MOD    441  0.35    /opt/IBMJava2-141/jre/bin/classic/libjvm.so
SYM    141  0.11     jitCacheAlloc
MOD    311  0.25    /lib/tls/libpthread-0.29.so
MOD    208  0.17    NoModule
SYM    208  0.17     NoSymbols
MOD    133  0.11    /opt/IBMJava2-141/jre/bin/libjava.so
Offset Report
================================
)) Module_Symbol_Offset
================================
LAB    TKS   %%%     NAMES
MOD 116871 93.38    vmlinux
SYM 109178 87.23     default_idle
OFF 109174 87.23      0x2d
SYM    783  0.63     n_tty_receive_buf
OFF    440  0.35      0x157
SYM    413  0.33     .text.lock.tty_io
OFF    204  0.16      0xb1
SYM    411  0.33     read_chan
SYM    378  0.30     sigprocmask
OFF    321  0.26      0x77
SYM    331  0.26     __wake_up
OFF    326  0.26      0x37
SYM    310  0.25     sysenter_past_esp
OFF    127  0.10      0x3
SYM    299  0.24     schedule
OFF    277  0.22      0x22d
SYM    271  0.22     tty_write
SYM    256  0.20     opost_block
SYM    246  0.20     add_wait_queue
OFF    212  0.17      0x37
SYM    229  0.18     n_tty_receive_room
SYM    229  0.18     n_tty_chars_in_buffer
OFF    212  0.17      0x33
SYM    197  0.16     __copy_from_user_ll
SYM    191  0.15     sys_rt_sigprocmask
SYM    177  0.14     do_select
SYM    174  0.14     __copy_to_user_ll
SYM    170  0.14     remove_wait_queue
OFF    147  0.12      0x1d
SYM    154  0.12     write_chan
SYM    153  0.12     pty_write
SYM    139  0.11     sys_select
SYM    125  0.10     kill_fasync
MOD   3032  2.42    JITCODE
SYM   1011  0.81     hellop.main([Ljava/lang/String;)V
OFF    405  0.32      0x1e0
OFF    257  0.21      0x1e3
OFF    192  0.15      0x1e4
SYM    254  0.20     java/io/OutputStreamWriter.write([CII)V
SYM    198  0.16     sun/nio/cs/StreamEncoder.flushBuffer()V
SYM    168  0.13     java/lang/String.<init>(Ljava/lang/String;I)V
SYM    168  0.13     java/io/PrintStream.write([BII)V
SYM    132  0.11     java/io/PrintStream.write(Ljava/lang/String;)V
MOD   2193  1.75    /lib/libcrypto.so.0.9.7a
SYM   2187  1.75     NoSymbols
OFF    141  0.11      0x2bcf3
MOD    782  0.62    /usr/sbin/sshd
SYM    772  0.62     NoSymbols
MOD    743  0.59    /lib/tls/libc-2.3.2.so
SYM    152  0.12     memset
SYM    125  0.10     __memchr
MOD    443  0.35    /opt/IBMJava2-141/jre/bin/libhpi.so
MOD    441  0.35    /opt/IBMJava2-141/jre/bin/classic/libjvm.so
SYM    141  0.11     jitCacheAlloc
MOD    311  0.25    /lib/tls/libpthread-0.29.so
MOD    208  0.17    NoModule
SYM    208  0.17     NoSymbols
OFF    134  0.11      0xffffe410
MOD    133  0.11    /opt/IBMJava2-141/jre/bin/libjava.so
</pre>

Java Lock Monitor

Java Lock Monitor (JLM) support is provided with version 1.4.0 of the IBM JDK. JLM provides monitor hold time accounting and contention statistics on monitors used in Java applications and the JVM itself. JLM provides support for the following counters.

Counters associated with contended locks:

Total number of successful acquires.

Recursive acquires. Number of times the monitor was requested and was already owned by the requesting thread.

Number of times the requesting thread was blocked waiting on the monitor because the monitor was already owned by another thread.

Cumulative time the monitor was held.

The following statistics are also collected on platforms that support 3-Tier Spin Locking (x86 SMP):

Number of times the requesting thread went through the inner (spin) loop while attempting to acquire the monitor.

Number of times the requesting thread went through the outer (thread yield) loop while attempting acquire the monitor.

Garbage collection (GC) time is removed from hold times for all monitors held across a GC cycle.

A monitor can be acquired either recursively, when the requesting thread already owns it, or nonrecursively, when the requesting thread does not already own it. Non-recursive acquires can be further divided into fast and slow. Fast is when the requested monitor is not already owned and the requesting thread gains ownership immediately. On platforms that implement 3-Tier Spin Locking, any monitor acquired while spinning is considered a fast acquire, regardless of the number of iterations in each tier. Slow is when the requested monitor is already owned by another thread and the requesting thread is blocked.[View full width]

Java Lock Monitor Report
Version_4.26 (05.01.2002)
Built : ( Wed May 1 14:44:28 CDT 2002 )
JLM_Interval_Time 34021158156
System (Registered) Monitors
%MISS GETS NONREC SLOW REC TIER2 TIER3 %UTIL AVER-HTM MON-NAME
87 5273 5273 4572 0 710708 18487 1 95408 ITC Global_Compile lock
9 6870 6869 631 1 113420 2976 0 11807 Heap lock
5 1123 1123 51 0 11098 286 1 248385 Binclass lock
0 1153 1147 5 6 1307 33 0 47974 Monitor Cache lock
0 46149 45877 134 272 36961 877 1 6558 ITC CHA lock
0 33734 23483 19 10251 6544 150 1 17083 Thread queue lock
0 5 5 0 0 0 0 0 9309689 JNI Global Reference lock
0 5 5 0 0 0 0 0 9283000 JNI Pinning lock
0 5 5 0 0 0 0 0 9442968 Sleep lock
0 1 1 0 0 0 0 0 0 Monitor Registry lock
0 0 0 0 0 0 0 0 0 Evacuation Region lock
0 0 0 0 0 0 0 0 0 Method trace lock
0 0 0 0 0 0 0 0 0 Classloader lock
0 0 0 0 0 0 0 0 0 Heap Promotion lock
Java (Inflated) Monitors
%MISS GETS NONREC SLOW REC TIER2 TIER3 %UTIL AVER-HTM MON-NAME
15 68 68 10 0 2204 56 2 11936405 test.lock.testlock1@A09410
/A09418
2 42 42 1 0 186 5 0 300478 test.lock.testlock2@D31358
/D31360
0 70 70 0 0 41 1 0 7617 Java.lang.ref
.ReferenceQueue$Lock@920628/920630
LEGEND:
%MISS : 100 * SLOW / NONREC
GETS : Lock Entries
NONREC : Non Recursive Gets
SLOW : Non Recursives that Wait
REC : Recursive Gets
TIER2 : SMP Wait Hierarchy
TIER3 : SMP Wait Hierarchy
%UTIL : 100 * Hold-Time / Total-Time
AVER-HT : Hold-Time / NONREC

Descriptions of the report's fields are as follows:

JLM_Interval_Time.
Time interval between the start and end of measurement. Time is expressed in the units appropriate for the hardware platform: cycles for x86, IA64, and S390, and time-based ticks for PPC.

%MISS.
Percentage of the total GETS (acquires) where the requesting thread was blocked waiting on the monitor.


%MISS = (SLOW / NONREC) * 100

GETS.
Total number of successful acquires.


GETS = FAST + SLOW + REC

NONREC.
Total number of nonrecursive acquires. This number includes SLOW gets.

SLOW.
Total number of nonrecursive acquires which caused the requesting thread to block waiting for the monitor to no longer be owned. This number is included in NONREC.

To calculate the number of nonrecursive acquires in which the requesting thread obtained ownership immediately (FAST), subtract SLOW from NONREC. On platforms that support 3-Tier Spin Locking, monitors acquired while spinning are considered FAST acquires.

REC.
Total number of recursive acquires. A recursive acquire is one where the requesting thread already owns the monitor.

TIER2.
Total number of Tier 2 (inner spin loop) iterations on platforms that support 3-Tier Spin Locking.

TIER3.
Total number of Tier 3 (outer thread yield loop) iterations on platforms that support 3-Tier Spin Locking.

%UTIL.
Monitor hold time divided by JLM_Interval_Time. Hold time accounting must be turned on.

AVER-HTM.
Average amount of time the monitor was held. Recursive acquires are not included because the monitor is already owned when acquired recursively.


Total hold time / NONREC

MON-NAME.
Monitor name or NULL (blank) if the name is not known.

Performance Inspector Executable Tools

The following tools, which are shipped with the Performance Inspector, provide support for the PI functions.

swtrace

swtrace is a software tracing mechanism that runs on Linux. swtrace is normally run from a command prompt by issuing the swtrace command with the appropriate arguments.

swtrace uses software trace hooks to collect data. Trace hooks are identified by both a major code and a minor code. Trace data is collected to a trace buffer that is allocated when swtrace is initialized or turned on. The size of the trace buffer can be set when swtrace is initialized. The swtrace command allows the user to select which major codes are traced, when tracing starts, when tracing stops, when data is transferred from the trace buffer to disk, and formatting of the trace data.

The major parameters supported by swtrace are as follows:

init.
Tells the trace profiler to allocate the trace buffers and initialize the system for tracing. With init, the size of the trace buffer to be allocated and the performance counter to use for taking the trace can be optionally specified.

enable.
Enables the trace hooks within the Linux kernel. This controls what information is placed in the trace buffer.

disable.
Keeps the specified trace hooks from being measured.

on.
Starts the trace. The trace information is gathered until the swtrace off command is given.

get.
Dumps the contents of the trace buffers to a file for processing by the report generator program.

it_install.
Initializes the instruction trace facility.

it_remove.
Resets the instruction trace functionality.

Other parameters display information about the Performance Inspector and control the rate of profiling. The command file run.tprof is generated when PI is installed in the system. This command file contains all the steps necessary to take a profiling trace and produce a report.

post

post produces various reports based on the trace profiling data. When a trace profiling report is produced, it is written to a file called tprof.out. When you install the Performance Inspector, you identify the directory where this file will be saved. If you want to keep the current tprof.out file, you must rename it something else before running another run.tprof command.

One option supported by the post command is -show. -show creates a file called post.show, which is a dump in a readable format of all the trace records. When all the trace hooks are enabled, post.show gives a detailed look at the sequence of events that occurred in the system, from execs to dispatches to interrupts.

pipcntr

pipcntr controls and displays the performance counters when PI is running on Intel IA32 platforms. This program can start and stop counters and display the contents of the counter registers and the counter control register settings. You can use this utility to start a performance counter and then run the per-thread time utility using this counter. The same holds true for the trace profiling function.

ptt

ptt starts and stops the per-thread time measurement. When starting, you can also specify what metric to use to perform the measurement.

pttstats

pttstats displays the per-thread time of every process in the system. On Intel IA32 systems, the measurement metric can be either the real-time clock or a performance monitor counter.

Performance Tuning for Linux Servers [Electronic resources] نسخه متنی

فارسی

کردی

العربیه

اردو

Türkçe

Русский

English

Français

کانال فیلم من

تبیان من

فایلهای من

کتابخانه من

پنل پیامکی

وبلاگ من

اینجــــا یک کتابخانه دیجیتالی است

با بیش از 100000 منبع الکترونیکی رایگان به زبان فارسی ، عربی و انگلیسی