Performance Tuning for Linux Servers [Electronic resources] (text version)

Sandra K. Johnson


Performance Inspector


The Performance Inspector (PI) is a suite of tools that identifies performance problems and characteristics. PI is distributed as a kernel patch plus the source needed to build the device driver and tools. Procedures are also included to automate the installation and build of the tools. Support is provided for Intel 32- and 64-bit platforms, PowerPC 32- and 64-bit systems, S/390 32- and 64-bit systems, and the AMD Hammer processor.

All versions of Performance Inspector include the following tools:

Above Idle.
Shows how idle, process, and interrupt times are distributed over the processor(s) on the running system. Above Idle is a phase 1 tool that identifies hot spots with respect to processor time and interrupt time spent in the system. This tool can also be used on SMP systems to see how well the multiple active processes are being spread over multiple CPUs.

Per-Thread Time.
Hooks into the process dispatch and interrupt code to maintain information on the total amount of time spent within a process, the time spent handling interrupts, and the amount of idle time. Summary information is provided on a per-CPU basis and for the system as a whole. APIs are provided to allow developers to include calls within their applications to measure functions and code snippets.

System Trace Data Collection.
Trace hooks are added to the kernel to collect data on what is happening within the system. Hooks are provided for process dispatches, process timeslices, interrupt entry and exits, process forks and clones, process startup execution, system timer interrupts, and memory mapping of code segments. It is also possible for an application to write its own trace records.
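The trace-hook mechanism described above can be sketched in a few lines: each hook writes a timestamped record into a fixed-size ring buffer, and an application can add its own records. This is an illustrative Python sketch; the record layout and hook names are invented for illustration and do not reflect the actual PI trace format.

```python
import time
from collections import deque

class TraceBuffer:
    """Illustrative ring buffer of trace records, loosely modeled on the
    kinds of hooks described above (dispatch, interrupt entry/exit, fork).
    The record layout is invented; it is NOT the actual PI trace format."""

    def __init__(self, capacity=4096):
        self.records = deque(maxlen=capacity)  # oldest records are overwritten

    def hook(self, hook_name, **data):
        # Each kernel hook writes a timestamped record into the buffer.
        self.records.append((time.monotonic_ns(), hook_name, data))

    def user_record(self, payload):
        # Applications can also write their own trace records.
        self.hook("USER", payload=payload)

buf = TraceBuffer()
buf.hook("DISPATCH", pid=671, cpu=0)
buf.hook("IRQ_ENTRY", irq=14, cpu=0)
buf.hook("IRQ_EXIT", irq=14, cpu=0)
buf.user_record("phase 1 complete")
print(len(buf.records))  # 4 records collected
```

A bounded buffer like this is why long traces overwrite their oldest records unless they are periodically dumped to a file, as the post-processing tools expect.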

Command files, libraries, and post processors are included to allow automatic tracing functions and reports. The most significant of these functions allow the measurement and reporting of jitted methods within Java applications. This function is very useful in identifying what methods are running and how much time they are using.

JPROF.
Performs Java execution profiling. This tool includes the capability to obtain detailed information on jitted methods. It requires a version of the Java SDK that includes JVMPI support.

Java Lock Monitor (JLM).
Reports hold-time accounting and contention statistics with Java applications. The IBM Java 1.4 SDK is required to use this tool.

In addition to these common measurement functions, an additional set of functionality is provided on Intel 32-bit systems. This functionality includes the following:

Performance counter support.
Supports starting and stopping the counters, reporting counter contents, and displaying counter settings. The Trace Data Collection and Per-Thread Time functions can base their measurements on the performance counters instead of the system clock. Instead of seeing how much time is spent within a process, you see how many instructions are executed, or how many branches or jumps are performed.

Instruction tracing.
Instruction tracing records all the branches taken while the measurement is active. Branches include calls, jumps, and any other execution path change. A post-processor tool is provided to report an instruction trace, including the number of instructions executed and where in the code the execution occurs. This reporting can also be done for Java jitted methods.

Dynamic kernel patch.
For certain releases of Red Hat and SUSE distributions, a version of the tools is provided that does not require a patch to the Linux kernel. Instead, the device driver dynamically patches the required hooks into the running system and runs the Performance Inspector tool suite.

The Performance Inspector installation requires you to apply a kernel patch and rebuild the kernel. After that step is complete, the various tools are built. Because some of the tools are sensitive to the version of Java being used, the tool build process must be redone whenever a new version of Java is installed. However, this requirement does not affect the kernel, which needs to be patched and built only once.

An exception to the installation procedure is provided for the Intel 32-bit support. A special version of the Performance Inspector is provided that does not require the kernel to be patched as long as the kernel being used is the default kernel shipped with supported versions of Red Hat and SUSE. The kernel source needs to be installed and the tools built as before, but the kernel does not have to be rebuilt. The PI driver dynamically patches the kernel at the appropriate places. The kernel patches are removed when the driver is uninstalled.

When none of the PI performance probes in the kernel are active, there is minimal impact on system performance. At most, a compare and a short branch instruction are added to the code path where the probes are located. For the dynamic version of the IA32 PI, no extra overhead is added when the driver is not loaded.

When performing a trace, there is usually a 2% to 3% overhead on the system. Instruction tracing obviously adds more overhead, but it is intended for use in a debug environment. Normal tracing can be done in a working production environment without any major impact on performance or throughput.

Most of the PI functions can be controlled from APIs issued from the user's program as well as from the PI's own command files. When using the API interface, either from C code or Java, you can fine-tune when, where, and what information is collected. All of the source is provided and can be used as a coding sample.

Refer to http://www-124.ibm.com/developerworks/oss/pi/indexl for more information and to obtain a copy of the Performance Inspector.

The remainder of this chapter examines in detail the various features of the Performance Inspector.

Above Idle


Above Idle works by hooking into the process timeslice logic and interrupt handler within the kernel. Above Idle keeps track of the amount of time that is spent while a processor is busy, idle, or handling interrupts. It is useful for identifying processor overload, high interrupt activity, and poor distribution of work within a multiprocessor system. When active, Above Idle gathers information over a user-specified measurement period; the default is 1 second. Using the default, Above Idle determines the amount of time the system spent idle, active, and processing interrupts. The percentage of processor idle, active, and interrupt processing time is then calculated and displayed per processor when running on a multiprocessor system. Parameters define how many measurements to make and the time interval between measurements.
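The percentage calculation Above Idle performs over a measurement interval can be illustrated with a small sketch: take two snapshots of cumulative per-CPU counters, difference them, and normalize. The snapshot format here is invented for illustration; Above Idle gathers its counters inside the kernel, not from user space.

```python
def above_idle(before, after):
    """Compute per-CPU idle/busy/interrupt percentages from two snapshots
    of cumulative tick counters, the way Above Idle summarizes an interval.
    Each snapshot entry is an invented (idle, busy, irq) tuple per CPU."""
    report = []
    for cpu, (b, a) in enumerate(zip(before, after)):
        delta = [x - y for x, y in zip(a, b)]
        total = sum(delta) or 1  # guard against an empty interval
        idle, busy, intr = (100.0 * d / total for d in delta)
        report.append((cpu, idle, busy, intr))
    return report

# Two snapshots taken one measurement interval apart on a 2-CPU system.
before = [(1000, 500, 10), (800, 700, 10)]
after  = [(1490, 505, 15), (800, 1205, 15)]
for cpu, idle, busy, intr in above_idle(before, after):
    print(f"CPU {cpu} IDLE={idle:6.2f}, BUSY={busy:6.2f}, INTR={intr:5.2f}")
```

Each output row has the same shape as the swtrace ai report shown below: one IDLE/BUSY/INTR line per processor per interval.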

To start the Above Idle measurement, enter swtrace ai, which uses the defaults to report system usage every second until the measurement is manually stopped.

The following example shows a sample Above Idle output.


# swtrace ai
Above Idle instrumentation enabled
CPU 0 IDLE= 48.83, BUSY= 49.74, INTR= 1.44
CPU 1 IDLE= 0.00, BUSY= 99.81, INTR= 0.19
CPU 2 IDLE= 96.10, BUSY= 3.70, INTR= 0.20
CPU 3 IDLE= 22.10, BUSY= 77.71, INTR= 0.19
CPU 4 IDLE= 16.50, BUSY= 83.30, INTR= 0.20
CPU 5 IDLE= 16.66, BUSY= 83.10, INTR= 0.25
CPU 6 IDLE= 16.83, BUSY= 82.96, INTR= 0.21
CPU 7 IDLE= 60.72, BUSY= 39.08, INTR= 0.20
CPU 0 IDLE= 98.54, BUSY= 0.00, INTR= 1.46
CPU 1 IDLE= 86.35, BUSY= 13.48, INTR= 0.17
CPU 2 IDLE= 99.78, BUSY= 0.00, INTR= 0.22
CPU 3 IDLE= 99.77, BUSY= 0.00, INTR= 0.23
CPU 4 IDLE= 18.91, BUSY= 80.87, INTR= 0.22
CPU 5 IDLE= 10.75, BUSY= 89.05, INTR= 0.20
CPU 6 IDLE= 99.77, BUSY= 0.00, INTR= 0.23
CPU 7 IDLE= 99.73, BUSY= 0.00, INTR= 0.27
CPU 0 IDLE= 98.59, BUSY= 0.00, INTR= 1.41
CPU 1 IDLE= 99.80, BUSY= 0.00, INTR= 0.20
CPU 2 IDLE= 99.81, BUSY= 0.00, INTR= 0.19
CPU 3 IDLE= 99.79, BUSY= 0.00, INTR= 0.21
CPU 4 IDLE= 1.19, BUSY= 98.65, INTR= 0.16
CPU 5 IDLE= 0.00, BUSY= 99.77, INTR= 0.23
CPU 6 IDLE= 99.78, BUSY= 0.00, INTR= 0.22
CPU 7 IDLE= 99.85, BUSY= 0.00, INTR= 0.15

The example was run on an eight-way SMP machine and shows data collected over 3 seconds.

Per-Thread Time


When active, per-thread time (PTT) accumulates the amount of time spent by each process within the system, as well as idle time and interrupt time. If you are running on an Intel IA-32 system, you can use a performance event counter instead of the real-time clock. For example, a measurement can be made that identifies the number of instructions executed by each process, the idle process, and the interrupts. The totals for process time, IRQ time, and idle time are provided for each processor in the system, as well as a system total across all processors.

PTT is activated via an API call, PTTInit, or an external program called ptt. In both cases, you specify the measurement medium as system timer or performance counter. After it is activated, you can run a program called pttstats to display the current values for idle, process, and interrupt times. You can run the pttstats program any number of times. A terminate command stops the measurement. The last counts before the terminate command is issued remain stored in the buffers until another init command is received. The following shows a sample PTT report:


Tools version 2040024
PTT_STATISTICS_SIZE = 263096
IBM_NR_CPUS = 16
Per Thread Time Summary
Total Dispatches: 3763
Total IRQs: 334
Total Thread Time: 151469140
Total IRQ Time: 721564
Total Idle Time: 2158895609
Total tool time: 81755
Total Measurement: 2311086313
Total Period: 0
Per Thread Time Detail
Details by CPU
IRQ Dispatch
CPU Count Count Times
----------------------------------------------------
1 334 3763
Thread Time: 151469140
IRQ Time: 721564
Idle Time: 2158895609
Tool Time: 81755
Total Time: 5311168068
Details by Process
Dispatch IRQ Process
PID Count Count Time
--------------------------------------------------
0 708 303 2158895609
1 5 0 34407
2 83 0 139154
3 1 0 7162
7 15 0 168819
11 5 0 3466
324 1 0 6799
409 11 0 30641
671 1281 24 113156655
722 3 0 30128
723 2 0 8944
727 12 0 35688
728 2 0 11927
834 23 0 62077
869 11 0 170978
963 98 3 15182854
971 148 1 971766
992 137 1 1821399
995 48 0 2676763
999 393 0 3993366
11580 3 0 135687
11581 53 0 1156413
12281 544 1 6954629
12282 36 0 1006592
12297 140 1 3702826
Total Process Time: 2310364749
End of reports

APIs are provided to allow applications to start this measurement and to obtain the total amount of time spent in the process being measured. APIs also allow an application to be instrumented to measure the time spent between specific operations or positions within the application. On Intel 32-bit systems, performance counter events such as instructions retired can be used as the measurement instead of time.
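The per-process accounting described in this section can be sketched as a dispatch hook that charges the elapsed interval (clock time or a performance-counter delta) to the outgoing process. This is a Python illustration with invented names; it is not the actual PTT implementation or API.

```python
class PerThreadTime:
    """Sketch of per-thread time accounting: the dispatch hook charges the
    interval since the last dispatch to the outgoing process. PID 0 stands
    for the idle process, as in the report above. All names are invented;
    they are not the actual PTT interfaces."""

    def __init__(self):
        self.time = {}        # pid -> accumulated time (or counter) units
        self.dispatches = {}  # pid -> dispatch count
        self.current = 0      # currently running pid; 0 = idle process
        self.last = 0         # timestamp (or counter value) at last dispatch

    def dispatch(self, now, next_pid):
        # Charge the elapsed interval to the pid that was running.
        self.time[self.current] = self.time.get(self.current, 0) + (now - self.last)
        self.dispatches[next_pid] = self.dispatches.get(next_pid, 0) + 1
        self.current, self.last = next_pid, now

ptt = PerThreadTime()
ptt.dispatch(100, 671)   # idle ran for 100 units, then pid 671 dispatched
ptt.dispatch(250, 0)     # pid 671 ran for 150 units, back to idle
ptt.dispatch(400, 671)   # idle ran another 150 units
print(ptt.time)  # {0: 250, 671: 150}
```

Summing ptt.time over all pids gives the "Total Measurement" style figure in the report; the pid 0 entry corresponds to "Total Idle Time".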

Trace Profiling


Trace profiling works by hooking the kernel timer interrupt. When the timer interrupt is processed, the profiling code creates a trace record that contains the address where the processor was executing when the timer interrupt occurred. This trace record is then written into a trace buffer. When the profiling is complete, these trace records can be dumped out to a file where a post-processing program produces various reports.
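The post-processing step, resolving each sampled address to a symbol and accumulating ticks per symbol, can be sketched as follows. The symbol table and addresses here are invented for illustration; the real post-processor also groups by process, thread, and module.

```python
import bisect

def tprof_summary(samples, symbols):
    """Sketch of the tprof post-processing step: each sampled address
    (captured at the timer interrupt) is resolved to the nearest preceding
    symbol start, and ticks are accumulated per symbol."""
    starts = [addr for addr, _ in symbols]  # symbols sorted by start address
    ticks = {}
    for addr in samples:
        i = bisect.bisect_right(starts, addr) - 1
        name = symbols[i][1] if i >= 0 else "NoSymbols"
        ticks[name] = ticks.get(name, 0) + 1
    total = len(samples)
    # Report rows sorted by tick count, like the TPROF report sections.
    return sorted(((n, t, 100.0 * t / total) for n, t in ticks.items()),
                  key=lambda r: -r[1])

# Invented symbol table: (start address, name), sorted by address.
symbols = [(0x1000, "default_idle"), (0x2000, "ThirtyTwo"), (0x3000, "Sixteen")]
samples = [0x1010, 0x1020, 0x2008, 0x1030, 0x3004, 0x2010]
for name, tks, pct in tprof_summary(samples, symbols):
    print(f"SYM {tks:5d} {pct:5.2f} {name}")
```

This is why a mostly idle system reports most ticks against default_idle, as in the sample report below: the timer interrupt usually finds the processor in the idle loop.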

You can run a trace profile to identify hot spots within the system by identifying the applications that consume the most time and the functions within those applications that are responsible for that time. This includes jitted methods within Java applications.

Use the run.tprof command to perform the trace and produce the default reports.

By looking at the run.tprof command file, you can see the specific steps you need to follow to take a profile trace and produce the reports.

The following example shows a sample trace report for a small C application.


# run.tprof
ProcessorSpeed 398000000
TraceCycles 25382010763
TraceTime 63.774(sec)
TOTAL TICKS 25383
(Clipping Level : 0.1 % 25 Ticks)
================================
TPROF Report Summary
)) Process
)) Process_Module
)) Process_Module_Symbol
)) Process_Thread
)) Process_Thread_Module
)) Process_Thread_Module_Symbol
)) Module
)) Module_Symbol
================================
================================
)) Process
================================
LAB TKS %%% NAMES
PID 19094 75.22 SystemProcess_0000
PID 6284 24.76 /space/piperf/bin/WatchedPot_057c
================================
)) Process_Module
================================
LAB TKS %%% NAMES
PID 19094 75.22 SystemProcess_0000
MOD 19094 75.22 vmlinux
PID 6284 24.76 /space/piperf/bin/WatchedPot_057c
MOD 6284 24.76 /space/piperf/bin/WatchedPot
================================
)) Process_Module_Symbol
================================
LAB TKS %%% NAMES
PID 19094 75.22 SystemProcess_0000
MOD 19094 75.22 vmlinux
SYM 19094 75.22 default_idle
PID 6284 24.76 /space/piperf/bin/WatchedPot_057c
MOD 6284 24.76 /space/piperf/bin/WatchedPot
SYM 3154 12.43 ThirtyTwo
SYM 1615 6.36 Sixteen
SYM 808 3.18 Eight
SYM 404 1.59 Four
SYM 202 0.80 Two
SYM 101 0.40 One
================================
)) Process_Thread
================================
LAB TKS %%% NAMES
PID 19094 75.22 SystemProcess_0000
TID 19094 75.22 tid_0000
PID 6284 24.76 /space/piperf/bin/WatchedPot_057c
TID 6284 24.76 tid_057c
================================
)) Process_Thread_Module
================================
LAB TKS %%% NAMES
PID 19094 75.22 SystemProcess_0000
TID 19094 75.22 tid_0000
MOD 19094 75.22 vmlinux
PID 6284 24.76 /space/piperf/bin/WatchedPot_057c
TID 6284 24.76 tid_057c
MOD 6284 24.76 /space/piperf/bin/WatchedPot
================================
)) Process_Thread_Module_Symbol
================================
LAB TKS %%% NAMES
PID 19094 75.22 SystemProcess_0000
TID 19094 75.22 tid_0000
MOD 19094 75.22 vmlinux
SYM 19094 75.22 default_idle
PID 6284 24.76 /space/piperf/bin/WatchedPot_057c
TID 6284 24.76 tid_057c
MOD 6284 24.76 /space/piperf/bin/WatchedPot
SYM 3154 12.43 ThirtyTwo
SYM 1615 6.36 Sixteen
SYM 808 3.18 Eight
SYM 404 1.59 Four
SYM 202 0.80 Two
SYM 101 0.40 One
================================
)) Module
================================
LAB TKS %%% NAMES
MOD 19096 75.23 vmlinux
MOD 6284 24.76 /space/piperf/bin/WatchedPot
================================
)) Module_Symbol
================================
LAB TKS %%% NAMES
MOD 19096 75.23 vmlinux
SYM 19094 75.22 default_idle
MOD 6284 24.76 /space/piperf/bin/WatchedPot
SYM 3154 12.43 ThirtyTwo
SYM 1615 6.36 Sixteen
SYM 808 3.18 Eight
SYM 404 1.59 Four
SYM 202 0.80 Two
SYM 101 0.40 One

When profiling applications, the level of optimization used when the application is compiled can have an effect on the trace output. With optimization on, small functions may be inlined, so their time is rolled up into the calling code instead of being reported separately by symbol name. Therefore, it is a good idea to make the first few profiling runs on applications that have not been compiled with optimization turned on.

Instruction Tracing


A special version of trace profiling that sets up the system to use the hardware's instruction trace capabilities is available on Intel IA32 systems. Instruction tracing uses the processor's branch trap facility to create a trace record every time a code branch, jump, or call is performed. Instruction tracing provides a detailed trace of the execution path in the system without the need to have a debugger installed or code compiled with the debug option enabled.

Use instruction tracing only for very short, controlled measurement periods, because it can produce a significant number of tracing records. Instruction tracing is primarily useful for determining the exact execution path through the entire system for the function being executed.
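The flavor of the -c style call/return listing can be approximated in Python with sys.settrace, which observes function calls rather than hardware branches; this is only an analogy to the instruction-trace output, not the itrace mechanism.

```python
import sys

records = []

def tracer(frame, event, arg):
    """Record CALL/RETURN events, loosely like the itrace -c report, which
    lists each control transfer with the target symbol. Python sees function
    calls rather than hardware branches, so this is only an analogy."""
    if event in ("call", "return"):
        records.append((event.upper(), frame.f_code.co_name))
    return tracer  # keep tracing inside the called frame

def func1():
    return 1

def func2():
    return func1() + 1

sys.settrace(tracer)
func2()
sys.settrace(None)
for ev, name in records:
    print(f"{name} {ev}")
```

Like the real instruction trace, even this toy tracer multiplies the work per call, which is why instruction tracing is kept to short, controlled measurement periods.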

A small command file, run.itrace, is provided to simplify the taking of an instruction trace profile.

The following example shows a small output of run.itrace.


# run.itrace
-----------------------------
ss: 0x8049a5c 1 push ebp
ss: 0x8049a5d 2 mov ebp, esp
ss: 0x8049a5f 3 sub esp, 0x8
ss: 0x8049a62 5 call 0x420628d8
0 1 1 4 @ 0 func1:/piperf/bin/itracec 8a1_0_/piperf/bin/itracec_ CALL
ss: 0x8049a6a 1 push ebp
ss: 0x8049a6b 2 mov ebp, esp
ss: 0x8049a6d 3 sub esp, 0x8
ss: 0x8049a70 5 call 0x420628d8
0 1 1 4 @ 0 func2:/piperf/bin/itracec 8a1_0_/piperf/bin/itracec_ CALL
ss: 0x8049a78 1 push ebp
ss: 0x8049a79 2 mov ebp, esp
ss: 0x8049a7b 3 sub esp, 0x8
ss: 0x8049a7e 5 call 0x420628d8
0 1 1 4 @ 0 func3:/piperf/bin/itracec 8a1_0_/piperf/bin/itracec_ CALL
ss: 0x8049a86 1 push ebp
ss: 0x8049a87 2 mov ebp, esp
ss: 0x8049a89 3 sub esp, 0x8
ss: 0x8049a8c 5 call 0x420628d8
0 1 1 4 @ 0 func4:/piperf/bin/itracec 8a1_0_/piperf/bin/itracec_ CALL
ss: 0x8049a94 1 push ebp
ss: 0x8049a95 2 mov ebp, esp
ss: 0x8049a97 3 sub esp, 0x8
ss: 0x8049a9a 5 call 0x420628d8
0 1 1 4 @ 0 func5:/piperf/bin/itracec 8a1_0_/piperf/bin/itracec_ CALL
----------------------------------

This report shows the assembler instructions that were executed between the entry and the branch instruction. The report provides a detailed view of what is going on in the system. To relate this back to the actual C code, you would have to compile the C code to an assembler listing and calculate the offsets to the assembler instructions to find them in the listing.

If you add the -c option to the post-processing report generator, the output would look similar to the following.


# run.itrace -c
cpu ring instruction_count offset symbolname_modulename pid_tid
_pidname To_instruction
0 1 2 4 @ 32 printf:/lib/i686/libc-2.2.93.so 8a1_0_
/piperf/bin/itracec_ RETURN
0 1 1 5 @ 54 main:/piperf/bin/itracec 8a1_0_/piperf/bin/itracec_ CALL
0 1 1 4 @ 0 func1:/piperf/bin/itracec 8a1_0_/piperf/bin/itracec_ CALL
0 1 1 4 @ 0 func2:/piperf/bin/itracec 8a1_0_/piperf/bin/itracec_ CALL
0 1 1 4 @ 0 func3:/piperf/bin/itracec 8a1_0_/piperf/bin/itracec_ CALL
0 1 1 4 @ 0 func4:/piperf/bin/itracec 8a1_0_/piperf/bin/itracec_ CALL
0 1 1 4 @ 0 func5:/piperf/bin/itracec 8a1_0_/piperf/bin/itracec_ CALL
0 1 1 6 @ 0 func6:/piperf/bin/itracec 8a1_0_/piperf/bin/itracec_ CALL
0 1 3 1 @ 50 <plt>:/piperf/bin/itracec 8a1_0_/piperf/bin/itracec_ JUMP
0 1 1 6 @ 0 printf:/lib/i686/libc-2.2.93.so 8a1_0_
/piperf/bin/itracec_ CALL
0 1 2 2 @ 0 __i686.get_pc_thunk.bx:/lib/i686/libc-2.2.93.so
8a1_0_/piperf/bin/itracec_ RETURN
0 1 1 8 @ 11 printf:/lib/i686/libc-2.2.93.so
8a1_0_/piperf/bin/itracec_ CALL
0 1 1 7 @ 0 _IO_vfprintf_internal:/lib/i686/libc-2.2.93.so
8a1_0_/piperf/bin/itracec_ CALL
0 1 2 2 @ 0 __i686.get_pc_thunk.bx:/lib/i686/libc-2.2.93.so
8a1_0_/piperf/bin/itracec_ RETURN
0 1 1 6 @ d _IO_vfprintf_internal:/lib/i686/libc-2.2.93.so
8a1_0_/piperf/bin/itracec_ CALL
0 1 3 1 @ 470 <plt>:/lib/i686/libc-2.2.93.so
8a1_0_/piperf/bin/itracec_ JUMP
0 1 1 4 @ 0 __errno_location:/lib/i686/libc-2.2.93.so
8a1_0_/piperf/bin/itracec_ CALL
0 1 2 2 @ 0 __i686.get_pc_thunk.bx:/lib/i686/libc-2.2.93.so
8a1_0_/piperf/bin/itracec_ RETURN
0 1 2 5 @ 9 __errno_location:/lib/i686/libc-2.2.93.so
8a1_0_/piperf/bin/itracec_ RETURN
----------------------------------

Java Profiler


JPROF is a Java profiling agent that dynamically responds to JVMPI events based on options passed at Java invocation. The profiler is generally referred to as JPROF but is packaged as a shared library called libjprof.so. JPROF provides JIT address-to-name resolution to support tprof and itrace data reduction. Other functions include the Java Lock Monitor and Java HeapDump. The profiler has been implemented for IBM JDK 1.2.2 and later and is based on the JDK's support for the JVMPI interface.

The following example shows a report generated by the trace profiler on a Java application.


Tprof Reports
ProcessorSpeed 688000000
TraceCycles 10949059046
TraceTime 15.914(sec)
TOTAL TICKS 125160
(Clipping Level : 0.1 % 125 Ticks)
================================
TPROF Report Summary
)) Process
)) Process_Module
)) Process_Module_Symbol
)) Process_Thread
)) Process_Thread_Module
)) Process_Thread_Module_Symbol
)) Module
)) Module_Symbol
================================
================================
)) Process
================================
LAB TKS %%% NAMES
PID 15645 12.50 SystemProcess_7
PID 15645 12.50 SystemProcess_1
PID 15645 12.50 SystemProcess_0
PID 14388 11.50 SystemProcess_2
PID 14074 11.24 SystemProcess_4
PID 12806 10.23 SystemProcess_3
PID 10935 8.74 SystemProcess_6
PID 10462 8.36 SystemProcess_5
PID 4710 3.76 java_6
PID 4302 3.44 sshd_5
PID 2823 2.26 sshd_3
PID 1566 1.25 java_4
PID 1257 1.00 java_2
PID 881 0.70 java_5
================================
)) Process_Module
================================
LAB TKS %%% NAMES
PID 15645 12.50 SystemProcess_7
MOD 15645 12.50 vmlinux
PID 15645 12.50 SystemProcess_1
MOD 15645 12.50 vmlinux
PID 15645 12.50 SystemProcess_0
MOD 15645 12.50 vmlinux
PID 14388 11.50 SystemProcess_2
MOD 14388 11.50 vmlinux
PID 14074 11.24 SystemProcess_4
MOD 14074 11.24 vmlinux
PID 12806 10.23 SystemProcess_3
MOD 12806 10.23 vmlinux
PID 10935 8.74 SystemProcess_6
MOD 10935 8.74 vmlinux
PID 10462 8.36 SystemProcess_5
MOD 10462 8.36 vmlinux
PID 4710 3.76 java_6
MOD 2109 1.69 vmlinux
MOD 1767 1.41 JITCODE
MOD 268 0.21 /opt/IBMJava2-141/jre/bin/libhpi.so
MOD 243 0.19 /opt/IBMJava2-141/jre/bin/classic/libjvm.so
MOD 175 0.14 /lib/tls/libpthread-0.29.so
PID 4302 3.44 sshd_5
MOD 1966 1.57 vmlinux
MOD 1349 1.08 /lib/libcrypto.so.0.9.7a
MOD 480 0.38 /usr/sbin/sshd
MOD 462 0.37 /lib/tls/libc-2.3.2.so
PID 2823 2.26 sshd_3
MOD 1369 1.09 vmlinux
MOD 844 0.67 /lib/libcrypto.so.0.9.7a
MOD 302 0.24 /usr/sbin/sshd
MOD 280 0.22 /lib/tls/libc-2.3.2.so
PID 1566 1.25 java_4
MOD 767 0.61 vmlinux
MOD 532 0.43 JITCODE
PID 1257 1.00 java_2
MOD 601 0.48 vmlinux
MOD 434 0.35 JITCODE
PID 881 0.70 java_5
MOD 444 0.35 vmlinux
MOD 299 0.24 JITCODE
================================
)) Process_Module_Symbol
================================
LAB TKS %%% NAMES
PID 15645 12.50 SystemProcess_7
MOD 15645 12.50 vmlinux
SYM 15605 12.47 default_idle
PID 15645 12.50 SystemProcess_1
MOD 15645 12.50 vmlinux
SYM 15606 12.47 default_idle
PID 15645 12.50 SystemProcess_0
MOD 15645 12.50 vmlinux
SYM 15608 12.47 default_idle
PID 14388 11.50 SystemProcess_2
MOD 14388 11.50 vmlinux
SYM 14341 11.46 default_idle
PID 14074 11.24 SystemProcess_4
MOD 14074 11.24 vmlinux
SYM 14043 11.22 default_idle
PID 12806 10.23 SystemProcess_3
MOD 12806 10.23 vmlinux
SYM 12675 10.13 default_idle
PID 10935 8.74 SystemProcess_6
MOD 10935 8.74 vmlinux
SYM 10913 8.72 default_idle
PID 10462 8.36 SystemProcess_5
MOD 10462 8.36 vmlinux
SYM 10387 8.30 default_idle
PID 4710 3.76 java_6
MOD 2109 1.69 vmlinux
SYM 404 0.32 n_tty_receive_buf
SYM 144 0.12 opost_block
SYM 141 0.11 sigprocmask
SYM 140 0.11 .text.lock.tty_io
SYM 136 0.11 tty_write
SYM 130 0.10 n_tty_receive_room
MOD 1767 1.41 JITCODE
SYM 621 0.50 hellop.main([Ljava/lang/String;)V
SYM 143 0.11 java/io/OutputStreamWriter.write([CII)V
MOD 268 0.21 /opt/IBMJava2-141/jre/bin/libhpi.so
MOD 243 0.19 /opt/IBMJava2-141/jre/bin/classic/libjvm.so
MOD 175 0.14 /lib/tls/libpthread-0.29.so
PID 4302 3.44 sshd_5
MOD 1966 1.57 vmlinux
SYM 236 0.19 read_chan
SYM 125 0.10 n_tty_chars_in_buffer
MOD 1349 1.08 /lib/libcrypto.so.0.9.7a
SYM 1345 1.07 NoSymbols
MOD 480 0.38 /usr/sbin/sshd
SYM 474 0.38 NoSymbols
MOD 462 0.37 /lib/tls/libc-2.3.2.so
PID 2823 2.26 sshd_3
MOD 1369 1.09 vmlinux
SYM 175 0.14 read_chan
MOD 844 0.67 /lib/libcrypto.so.0.9.7a
SYM 842 0.67 NoSymbols
MOD 302 0.24 /usr/sbin/sshd
SYM 298 0.24 NoSymbols
MOD 280 0.22 /lib/tls/libc-2.3.2.so
PID 1566 1.25 java_4
MOD 767 0.61 vmlinux
SYM 151 0.12 n_tty_receive_buf
MOD 532 0.43 JITCODE
SYM 161 0.13 hellop.main([Ljava/lang/String;)V
PID 1257 1.00 java_2
MOD 601 0.48 vmlinux
SYM 132 0.11 n_tty_receive_buf
MOD 434 0.35 JITCODE
SYM 126 0.10 hellop.main([Ljava/lang/String;)V
PID 881 0.70 java_5
MOD 444 0.35 vmlinux
MOD 299 0.24 JITCODE
================================
)) Process_Thread
================================
LAB TKS %%% NAMES
PID 15645 12.50 SystemProcess_7
TID 15645 12.50 tid_0000
PID 15645 12.50 SystemProcess_1
TID 15645 12.50 tid_0000
PID 15645 12.50 SystemProcess_0
TID 15645 12.50 tid_0000
PID 14388 11.50 SystemProcess_2
TID 14388 11.50 tid_0000
PID 14074 11.24 SystemProcess_4
TID 14074 11.24 tid_0000
PID 12806 10.23 SystemProcess_3
TID 12806 10.23 tid_0000
PID 10935 8.74 SystemProcess_6
TID 10935 8.74 tid_0000
PID 10462 8.36 SystemProcess_5
TID 10462 8.36 tid_0000
PID 4710 3.76 java_6
TID 4668 3.73 tid_main
PID 4302 3.44 sshd_5
TID 4298 3.43 tid_02c5
PID 2823 2.26 sshd_3
TID 2823 2.26 tid_02c5
PID 1566 1.25 java_4
TID 1527 1.22 tid_main
PID 1257 1.00 java_2
TID 1222 0.98 tid_main
PID 881 0.70 java_5
TID 832 0.66 tid_main
================================
)) Process_Thread_Module
================================
LAB TKS %%% NAMES
PID 15645 12.50 SystemProcess_7
TID 15645 12.50 tid_0000
MOD 15645 12.50 vmlinux
PID 15645 12.50 SystemProcess_1
TID 15645 12.50 tid_0000
MOD 15645 12.50 vmlinux
PID 15645 12.50 SystemProcess_0
TID 15645 12.50 tid_0000
MOD 15645 12.50 vmlinux
PID 14388 11.50 SystemProcess_2
TID 14388 11.50 tid_0000
MOD 14388 11.50 vmlinux
PID 14074 11.24 SystemProcess_4
TID 14074 11.24 tid_0000
MOD 14074 11.24 vmlinux
PID 12806 10.23 SystemProcess_3
TID 12806 10.23 tid_0000
MOD 12806 10.23 vmlinux
PID 10935 8.74 SystemProcess_6
TID 10935 8.74 tid_0000
MOD 10935 8.74 vmlinux
PID 10462 8.36 SystemProcess_5
TID 10462 8.36 tid_0000
MOD 10462 8.36 vmlinux
PID 4710 3.76 java_6
TID 4668 3.73 tid_main
MOD 2075 1.66 vmlinux
MOD 1767 1.41 JITCODE
MOD 268 0.21 /opt/IBMJava2-141/jre/bin/libhpi.so
MOD 236 0.19 /opt/IBMJava2-141/jre/bin/classic/libjvm.so
MOD 174 0.14 /lib/tls/libpthread-0.29.so
PID 4302 3.44 sshd_5
TID 4298 3.43 tid_02c5
MOD 1965 1.57 vmlinux
MOD 1349 1.08 /lib/libcrypto.so.0.9.7a
MOD 478 0.38 /usr/sbin/sshd
MOD 461 0.37 /lib/tls/libc-2.3.2.so
PID 2823 2.26 sshd_3
TID 2823 2.26 tid_02c5
MOD 1369 1.09 vmlinux
MOD 844 0.67 /lib/libcrypto.so.0.9.7a
MOD 302 0.24 /usr/sbin/sshd
MOD 280 0.22 /lib/tls/libc-2.3.2.so
PID 1566 1.25 java_4
TID 1527 1.22 tid_main
MOD 734 0.59 vmlinux
MOD 532 0.43 JITCODE
PID 1257 1.00 java_2
TID 1222 0.98 tid_main
MOD 580 0.46 vmlinux
MOD 434 0.35 JITCODE
PID 881 0.70 java_5
TID 832 0.66 tid_main
MOD 410 0.33 vmlinux
MOD 299 0.24 JITCODE
================================
)) Process_Thread_Module_Symbol
================================
LAB TKS %%% NAMES
PID 15645 12.50 SystemProcess_7
TID 15645 12.50 tid_0000
MOD 15645 12.50 vmlinux
SYM 15605 12.47 default_idle
PID 15645 12.50 SystemProcess_1
TID 15645 12.50 tid_0000
MOD 15645 12.50 vmlinux
SYM 15606 12.47 default_idle
PID 15645 12.50 SystemProcess_0
TID 15645 12.50 tid_0000
MOD 15645 12.50 vmlinux
SYM 15608 12.47 default_idle
PID 14388 11.50 SystemProcess_2
TID 14388 11.50 tid_0000
MOD 14388 11.50 vmlinux
SYM 14341 11.46 default_idle
PID 14074 11.24 SystemProcess_4
TID 14074 11.24 tid_0000
MOD 14074 11.24 vmlinux
SYM 14043 11.22 default_idle
PID 12806 10.23 SystemProcess_3
TID 12806 10.23 tid_0000
MOD 12806 10.23 vmlinux
SYM 12675 10.13 default_idle
PID 10935 8.74 SystemProcess_6
TID 10935 8.74 tid_0000
MOD 10935 8.74 vmlinux
SYM 10913 8.72 default_idle
PID 10462 8.36 SystemProcess_5
TID 10462 8.36 tid_0000
MOD 10462 8.36 vmlinux
SYM 10387 8.30 default_idle
PID 4710 3.76 java_6
TID 4668 3.73 tid_main
MOD 2075 1.66 vmlinux
SYM 404 0.32 n_tty_receive_buf
SYM 144 0.12 opost_block
SYM 141 0.11 sigprocmask
SYM 140 0.11 .text.lock.tty_io
SYM 136 0.11 tty_write
SYM 130 0.10 n_tty_receive_room
MOD 1767 1.41 JITCODE
SYM 621 0.50 hellop.main([Ljava/lang/String;)V
SYM 143 0.11 java/io/OutputStreamWriter.write([CII)V
MOD 268 0.21 /opt/IBMJava2-141/jre/bin/libhpi.so
MOD 236 0.19 /opt/IBMJava2-141/jre/bin/classic/libjvm.so
MOD 174 0.14 /lib/tls/libpthread-0.29.so
PID 4302 3.44 sshd_5
TID 4298 3.43 tid_02c5
MOD 1965 1.57 vmlinux
SYM 236 0.19 read_chan
SYM 125 0.10 n_tty_chars_in_buffer
MOD 1349 1.08 /lib/libcrypto.so.0.9.7a
SYM 1345 1.07 NoSymbols
MOD 478 0.38 /usr/sbin/sshd
SYM 472 0.38 NoSymbols
MOD 461 0.37 /lib/tls/libc-2.3.2.so
PID 2823 2.26 sshd_3
TID 2823 2.26 tid_02c5
MOD 1369 1.09 vmlinux
SYM 175 0.14 read_chan
MOD 844 0.67 /lib/libcrypto.so.0.9.7a
SYM 842 0.67 NoSymbols
MOD 302 0.24 /usr/sbin/sshd
SYM 298 0.24 NoSymbols
MOD 280 0.22 /lib/tls/libc-2.3.2.so
PID 1566 1.25 java_4
TID 1527 1.22 tid_main
MOD 734 0.59 vmlinux
SYM 151 0.12 n_tty_receive_buf
MOD 532 0.43 JITCODE
SYM 161 0.13 hellop.main([Ljava/lang/String;)V
PID 1257 1.00 java_2
TID 1222 0.98 tid_main
MOD 580 0.46 vmlinux
SYM 132 0.11 n_tty_receive_buf
MOD 434 0.35 JITCODE
SYM 126 0.10 hellop.main([Ljava/lang/String;)V
PID 881 0.70 java_5
TID 832 0.66 tid_main
MOD 410 0.33 vmlinux
MOD 299 0.24 JITCODE
================================
)) Module
================================
LAB TKS %%% NAMES
MOD 116871 93.38 vmlinux
MOD 3032 2.42 JITCODE
MOD 2193 1.75 /lib/libcrypto.so.0.9.7a
MOD 782 0.62 /usr/sbin/sshd
MOD 743 0.59 /lib/tls/libc-2.3.2.so
MOD 443 0.35 /opt/IBMJava2-141/jre/bin/libhpi.so
MOD 441 0.35 /opt/IBMJava2-141/jre/bin/classic/libjvm.so
MOD 311 0.25 /lib/tls/libpthread-0.29.so
MOD 208 0.17 NoModule
MOD 133 0.11 /opt/IBMJava2-141/jre/bin/libjava.so
================================
)) Module_Symbol
================================
LAB TKS %%% NAMES
MOD 116871 93.38 vmlinux
SYM 109178 87.23 default_idle
SYM 783 0.63 n_tty_receive_buf
SYM 413 0.33 .text.lock.tty_io
SYM 411 0.33 read_chan
SYM 378 0.30 sigprocmask
SYM 331 0.26 __wake_up
SYM 310 0.25 sysenter_past_esp
SYM 299 0.24 schedule
SYM 271 0.22 tty_write
SYM 256 0.20 opost_block
SYM 246 0.20 add_wait_queue
SYM 229 0.18 n_tty_receive_room
SYM 229 0.18 n_tty_chars_in_buffer
SYM 197 0.16 __copy_from_user_ll
SYM 191 0.15 sys_rt_sigprocmask
SYM 177 0.14 do_select
SYM 174 0.14 __copy_to_user_ll
SYM 170 0.14 remove_wait_queue
SYM 154 0.12 write_chan
SYM 153 0.12 pty_write
SYM 139 0.11 sys_select
SYM 125 0.10 kill_fasync
MOD 3032 2.42 JITCODE
SYM 1011 0.81 hellop.main([Ljava/lang/String;)V
SYM 254 0.20 java/io/OutputStreamWriter.write([CII)V
SYM 198 0.16 sun/nio/cs/StreamEncoder.flushBuffer()V
SYM 168 0.13 java/lang/String.<init>(Ljava/lang/String;I)V
SYM 168 0.13 java/io/PrintStream.write([BII)V
SYM 132 0.11 java/io/PrintStream.write(Ljava/lang/String;)V
MOD 2193 1.75 /lib/libcrypto.so.0.9.7a
SYM 2187 1.75 NoSymbols
MOD 782 0.62 /usr/sbin/sshd
SYM 772 0.62 NoSymbols
MOD 743 0.59 /lib/tls/libc-2.3.2.so
SYM 152 0.12 memset
SYM 125 0.10 __memchr
MOD 443 0.35 /opt/IBMJava2-141/jre/bin/libhpi.so
MOD 441 0.35 /opt/IBMJava2-141/jre/bin/classic/libjvm.so
SYM 141 0.11 jitCacheAlloc
MOD 311 0.25 /lib/tls/libpthread-0.29.so
MOD 208 0.17 NoModule
SYM 208 0.17 NoSymbols
MOD 133 0.11 /opt/IBMJava2-141/jre/bin/libjava.so
Offset Report
================================
)) Module_Symbol_Offset
================================
LAB TKS %%% NAMES
MOD 116871 93.38 vmlinux
SYM 109178 87.23 default_idle
OFF 109174 87.23 0x2d
SYM 783 0.63 n_tty_receive_buf
OFF 440 0.35 0x157
SYM 413 0.33 .text.lock.tty_io
OFF 204 0.16 0xb1
SYM 411 0.33 read_chan
SYM 378 0.30 sigprocmask
OFF 321 0.26 0x77
SYM 331 0.26 __wake_up
OFF 326 0.26 0x37
SYM 310 0.25 sysenter_past_esp
OFF 127 0.10 0x3
SYM 299 0.24 schedule
OFF 277 0.22 0x22d
SYM 271 0.22 tty_write
SYM 256 0.20 opost_block
SYM 246 0.20 add_wait_queue
OFF 212 0.17 0x37
SYM 229 0.18 n_tty_receive_room
SYM 229 0.18 n_tty_chars_in_buffer
OFF 212 0.17 0x33
SYM 197 0.16 __copy_from_user_ll
SYM 191 0.15 sys_rt_sigprocmask
SYM 177 0.14 do_select
SYM 174 0.14 __copy_to_user_ll
SYM 170 0.14 remove_wait_queue
OFF 147 0.12 0x1d
SYM 154 0.12 write_chan
SYM 153 0.12 pty_write
SYM 139 0.11 sys_select
SYM 125 0.10 kill_fasync
MOD 3032 2.42 JITCODE
SYM 1011 0.81 hellop.main([Ljava/lang/String;)V
OFF 405 0.32 0x1e0
OFF 257 0.21 0x1e3
OFF 192 0.15 0x1e4
SYM 254 0.20 java/io/OutputStreamWriter.write([CII)V
SYM 198 0.16 sun/nio/cs/StreamEncoder.flushBuffer()V
SYM 168 0.13 java/lang/String.<init>(Ljava/lang/String;I)V
SYM 168 0.13 java/io/PrintStream.write([BII)V
SYM 132 0.11 java/io/PrintStream.write(Ljava/lang/String;)V
MOD 2193 1.75 /lib/libcrypto.so.0.9.7a
SYM 2187 1.75 NoSymbols
OFF 141 0.11 0x2bcf3
MOD 782 0.62 /usr/sbin/sshd
SYM 772 0.62 NoSymbols
MOD 743 0.59 /lib/tls/libc-2.3.2.so
SYM 152 0.12 memset
SYM 125 0.10 __memchr
MOD 443 0.35 /opt/IBMJava2-141/jre/bin/libhpi.so
MOD 441 0.35 /opt/IBMJava2-141/jre/bin/classic/libjvm.so
SYM 141 0.11 jitCacheAlloc
MOD 311 0.25 /lib/tls/libpthread-0.29.so
MOD 208 0.17 NoModule
SYM 208 0.17 NoSymbols
OFF 134 0.11 0xffffe410
MOD 133 0.11 /opt/IBMJava2-141/jre/bin/libjava.so

Java Lock Monitor


Java Lock Monitor (JLM) support is provided with version 1.4.0 of the IBM JDK. JLM provides monitor hold time accounting and contention statistics on monitors used in Java applications and the JVM itself. JLM provides support for the following counters.

Counters associated with contended locks:

Total number of successful acquires.

Recursive acquires. Number of times the monitor was requested and was already owned by the requesting thread.

Number of times the requesting thread was blocked waiting on the monitor because the monitor was already owned by another thread.

Cumulative time the monitor was held.

The following statistics are also collected on platforms that support 3-Tier Spin Locking (x86 SMP):

Number of times the requesting thread went through the inner (spin) loop while attempting to acquire the monitor.

Number of times the requesting thread went through the outer (thread yield) loop while attempting to acquire the monitor.

Garbage collection (GC) time is removed from hold times for all monitors held across a GC cycle.

A monitor can be acquired either recursively, when the requesting thread already owns it, or nonrecursively, when it does not. Nonrecursive acquires can be further divided into fast and slow. Fast is when the requested monitor is not already owned and the requesting thread gains ownership immediately. On platforms that implement 3-Tier Spin Locking, any monitor acquired while spinning is considered a fast acquire, regardless of the number of iterations in each tier. Slow is when the requested monitor is already owned by another thread and the requesting thread is blocked.
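The recursive/nonrecursive distinction can be illustrated with a Python analogy: `threading.RLock` behaves like a Java monitor with respect to re-entry by the owning thread, and a contended acquire from another thread corresponds to the slow path. This is only an analogy, not the JVM's monitor implementation.

```python
import threading

# A Python analogy (not the JVM implementation): threading.RLock behaves
# like a Java monitor with respect to recursive acquisition.
lock = threading.RLock()

def recursive_section(depth):
    """Re-enter the lock; every acquire after the first is a recursive get."""
    with lock:  # first call: a nonrecursive (fast) acquire
        if depth > 0:
            recursive_section(depth - 1)  # recursive acquires: owner re-enters

recursive_section(3)  # succeeds: 1 nonrecursive + 3 recursive acquires

# A second thread requesting the lock while it is held would take the slow
# path (block); with blocking=False we can observe the miss instead.
lock.acquire()
result = []
t = threading.Thread(target=lambda: result.append(lock.acquire(blocking=False)))
t.start()
t.join()
lock.release()
print(result[0])  # False: the monitor was owned by another thread
```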

Java Lock Monitor Report
Version_4.26 (05.01.2002)
Built : ( Wed May 1 14:44:28 CDT 2002 )
JLM_Interval_Time 34021158156
System (Registered) Monitors
%MISS GETS NONREC SLOW REC TIER2 TIER3 %UTIL AVER-HTM MON-NAME
87 5273 5273 4572 0 710708 18487 1 95408 ITC Global_Compile lock
9 6870 6869 631 1 113420 2976 0 11807 Heap lock
5 1123 1123 51 0 11098 286 1 248385 Binclass lock
0 1153 1147 5 6 1307 33 0 47974 Monitor Cache lock
0 46149 45877 134 272 36961 877 1 6558 ITC CHA lock
0 33734 23483 19 10251 6544 150 1 17083 Thread queue lock
0 5 5 0 0 0 0 0 9309689 JNI Global Reference lock
0 5 5 0 0 0 0 0 9283000 JNI Pinning lock
0 5 5 0 0 0 0 0 9442968 Sleep lock
0 1 1 0 0 0 0 0 0 Monitor Registry lock
0 0 0 0 0 0 0 0 0 Evacuation Region lock
0 0 0 0 0 0 0 0 0 Method trace lock
0 0 0 0 0 0 0 0 0 Classloader lock
0 0 0 0 0 0 0 0 0 Heap Promotion lock
Java (Inflated) Monitors
%MISS GETS NONREC SLOW REC TIER2 TIER3 %UTIL AVER-HTM MON-NAME
15 68 68 10 0 2204 56 2 11936405 test.lock.testlock1@A09410/A09418
2 42 42 1 0 186 5 0 300478 test.lock.testlock2@D31358/D31360
0 70 70 0 0 41 1 0 7617 java.lang.ref.ReferenceQueue$Lock@920628/920630
LEGEND:
%MISS : 100 * SLOW / NONREC
GETS : Lock Entries
NONREC : Non Recursive Gets
SLOW : Non Recursives that Wait
REC : Recursive Gets
TIER2 : SMP Wait Hierarchy
TIER3 : SMP Wait Hierarchy
%UTIL : 100 * Hold-Time / Total-Time
AVER-HT : Hold-Time / NONREC

Descriptions of the report's fields are as follows:

JLM_Interval_Time.
Time interval between the start and end of measurement. Time is expressed in the units appropriate for the hardware platform: cycles for x86, IA64, and S390, and time-based ticks for PPC.

%MISS.
Percentage of the total GETS (acquires) where the requesting thread was blocked waiting on the monitor.


%MISS = (SLOW / NONREC) * 100

GETS.
Total number of successful acquires.


GETS = FAST + SLOW + REC

NONREC.
Total number of nonrecursive acquires. This number includes SLOW gets.

SLOW.
Total number of nonrecursive acquires which caused the requesting thread to block waiting for the monitor to no longer be owned. This number is included in NONREC.

To calculate the number of nonrecursive acquires in which the requesting thread obtained ownership immediately (FAST), subtract SLOW from NONREC. On platforms that support 3-Tier Spin Locking, monitors acquired while spinning are considered FAST acquires.
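The FAST calculation can be checked against the ITC Global_Compile lock row of the sample report above; a minimal sketch:

```python
# Values taken from the "ITC Global_Compile lock" row of the sample report.
NONREC = 5273   # nonrecursive acquires
SLOW = 4572     # nonrecursive acquires that blocked
REC = 0         # recursive acquires

FAST = NONREC - SLOW                # immediate (uncontended) acquires
GETS = FAST + SLOW + REC            # total successful acquires
MISS = round(100 * SLOW / NONREC)   # %MISS column

print(FAST, GETS, MISS)  # 701 5273 87
```

FAST and GETS match the report's GETS column (5273), and the recomputed %MISS matches the reported value of 87.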

REC.
Total number of recursive acquires. A recursive acquire is one where the requesting thread already owns the monitor.

TIER2.
Total number of Tier 2 (inner spin loop) iterations on platforms that support 3-Tier Spin Locking.

TIER3.
Total number of Tier 3 (outer thread yield loop) iterations on platforms that support 3-Tier Spin Locking.

%UTIL.
Monitor hold time divided by JLM_Interval_Time. Hold time accounting must be turned on.

AVER-HTM.
Average amount of time the monitor was held. Recursive acquires are not included because the monitor is already owned when acquired recursively.


Total hold time / NONREC

MON-NAME.
Monitor name, or NULL (blank) if the name is not known.
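The field definitions above can be exercised by parsing one row of the sample report and recomputing the derived columns. This is a hedged sketch: the field order is taken from the report header, and the interval time from the sample report's JLM_Interval_Time line.

```python
# Parse one system-monitor row from the sample report and recompute the
# derived columns from the legend. Field order follows the header:
# %MISS GETS NONREC SLOW REC TIER2 TIER3 %UTIL AVER-HTM MON-NAME
JLM_INTERVAL_TIME = 34021158156  # from the sample report header

row = "87 5273 5273 4572 0 710708 18487 1 95408 ITC Global_Compile lock"
fields = row.split()
miss, gets, nonrec, slow, rec = map(int, fields[:5])
tier2, tier3, util, aver_htm = map(int, fields[5:9])
name = " ".join(fields[9:])

total_hold = aver_htm * nonrec  # AVER-HTM = total hold time / NONREC
recomputed_util = int(100 * total_hold / JLM_INTERVAL_TIME)  # %UTIL, truncated
recomputed_miss = round(100 * slow / nonrec)                 # %MISS

print(name, recomputed_miss, recomputed_util)
```

Both recomputed values agree with the report row: %MISS is 87 and %UTIL truncates to 1.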

Performance Inspector Executable Tools


The following tools, which are shipped with the Performance Inspector, provide support for the PI functions.

swtrace


swtrace is a software tracing mechanism for Linux. It is normally run from a command prompt by issuing the swtrace command with the appropriate arguments.

swtrace uses software trace hooks to collect data. Trace hooks are identified by both a major code and a minor code. Trace data is collected into a trace buffer that is allocated when swtrace is initialized or turned on; the size of the trace buffer can be set at initialization. The swtrace command lets the user select which major codes are traced, when tracing starts and stops, when data is transferred from the trace buffer to disk, and how the trace data is formatted.
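The major-code filtering described above can be modeled conceptually. This is a sketch only, not the swtrace implementation; the class name and the example codes are hypothetical.

```python
# A conceptual model only (not the swtrace implementation): trace hooks are
# identified by a (major, minor) code pair, and enabling a major code
# controls whether its records are written into the trace buffer.
class TraceBuffer:
    def __init__(self, size):
        self.size = size        # buffer capacity, fixed at init time
        self.enabled = set()    # major codes currently enabled
        self.records = []

    def enable(self, *majors):
        self.enabled.update(majors)

    def disable(self, *majors):
        self.enabled.difference_update(majors)

    def hook(self, major, minor, data):
        """Called from an instrumented code path; drops filtered records."""
        if major in self.enabled and len(self.records) < self.size:
            self.records.append((major, minor, data))

buf = TraceBuffer(size=1024)
buf.enable(0x10, 0x11)  # hypothetical majors, e.g. dispatch and interrupt
buf.hook(0x10, 0x01, "dispatch pid=42")
buf.hook(0x30, 0x00, "dropped: major never enabled")
buf.disable(0x11)
buf.hook(0x11, 0x02, "dropped: major disabled")
print(len(buf.records))  # 1
```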

The major parameters supported by swtrace are as follows:

init.
Tells the trace profiler to allocate the trace buffers and initialize the system for tracing. With init, you can optionally specify the size of the trace buffer to allocate and the performance counter to use for taking the trace.

enable.
Enables the trace hooks within the Linux kernel. This controls what information is placed in the trace buffer.

disable.
Keeps the specified trace hooks from being measured.

on.
Starts the trace. The trace information is gathered until the swtrace off command is given.

get.
Dumps the contents of the trace buffers to a file for processing by the report generator program.

it_install.
Initializes the instruction trace facility.

it_remove.
Resets the instruction trace functionality.

Other parameters display information about the Performance Inspector and control the rate of profiling. The command file run.tprof is generated when PI is installed in the system. This command file contains all the steps necessary to take a profiling trace and produce a report.
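The sequence that a run.tprof-style command file automates can be sketched as a dry run. The step names come from the parameters described above; the exact arguments each step takes vary by installation, so no commands are actually executed here.

```python
# A dry-run sketch of the sequence a run.tprof-style script automates.
# Only command names are shown; real invocations take additional arguments.
def tprof_steps(workload):
    return [
        "swtrace init",    # allocate trace buffers, initialize tracing
        "swtrace enable",  # enable the kernel trace hooks
        "swtrace on",      # start the trace
        workload,          # run the program being profiled
        "swtrace off",     # stop the trace
        "swtrace get",     # dump trace buffers to disk
        "post",            # format the trace data into tprof.out
    ]

steps = tprof_steps("./my_workload")  # "./my_workload" is a placeholder
print(steps[0], "...", steps[-1])
```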

post


post produces various reports based on the trace profiling data. A trace profiling report is written to a file called tprof.out; when you install the Performance Inspector, you identify the directory where this file is saved. If you want to keep the current tprof.out file, you must rename it before running another run.tprof command.

One option supported by the post command is -show, which creates a file called post.show, a readable dump of all the trace records. When all the trace hooks are enabled, post.show gives a detailed look at the sequence of events that occurred in the system, from execs to dispatches to interrupts.

pipcntr


pipcntr controls and displays the performance counters when PI is running on Intel IA32 platforms. This program can start and stop counters and display the contents of the counter registers and the counter control register settings. You can use this utility to start a performance counter and then run the per-thread time utility using this counter. The same holds true for the trace profiling function.

ptt


ptt starts and stops the per-thread time measurement. When starting, you can also specify what metric to use to perform the measurement.

pttstats


pttstats displays the per-thread time of every process in the system. On Intel IA32 systems, the measurement metric can be either the real-time clock or a performance monitor counter.

