Mastering BEA WebLogic Server: Best Practices for Building and Deploying J2EE Applications

Gregory Nyberg, Robert Patrick, et al.


Troubleshooting Performance Problems

Your application and environment are now tuned to perfection, users are happy, and the system is taking hundreds of hits per second without batting an eye, right? If not, then read on as we present a tried and true methodology for troubleshooting performance problems.

Successful troubleshooting requires a strong understanding of the system and its components, a good problem-resolution process, and knowledge of common performance problems and their solutions. Every system is different, and every performance problem is likely to be different, but there are a number of best practices worth outlining to help you through your own troubleshooting efforts.

Preparing for Troubleshooting

Troubleshooting performance problems can be a difficult and time-consuming process unless you prepare ahead of time. When the users are unhappy and the pressure is on, you must have the proper infrastructure, processes, and people in place to address the problem.

First, the application should have been thoroughly tested and profiled during performance testing. You need to know how the application performed in the test environment to know whether the performance problem you are tackling is real or simply a normal slowdown under peak loads. Your test results also establish the normal resource usage of the transactions under investigation, giving you a baseline for comparison with observed resource usage in production. Good testing is critical to efficient production troubleshooting.

Next, you must have all necessary performance monitoring mechanisms in place to provide information concerning system performance and activity. Recognize that many performance problems do not happen on demand, so you will need some form of logging to reconstruct system resource usage and activity during a period in question. Simple shell scripts that log selected output from system monitoring tools are often sufficient for this purpose.
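As an illustration, a logging script of this kind might look like the following minimal sketch. The paths, sampling intervals, and choice of vmstat and netstat are assumptions to adjust for your platform; the point is simply to capture timestamped samples you can replay later.

#!/bin/sh
# Append timestamped system statistics to a daily log so that CPU, paging,
# and connection activity can be reconstructed after a reported slowdown.
LOGDIR=/var/perf/logs                     # assumed location with ample disk space

while true
do
    LOGFILE=$LOGDIR/sysstat.`date +%Y%m%d`.log
    echo "==== `date` ====" >> $LOGFILE
    vmstat 5 3 >> $LOGFILE 2>&1                          # CPU, run queue, paging
    echo "ESTABLISHED connections: `netstat -an | grep -c ESTABLISHED`" >> $LOGFILE
    sleep 300                                            # sample every five minutes
done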

Finally, you need a team and a process in place before the problem occurs. It is a good idea to form a multidisciplinary SWAT team and make that team responsible for troubleshooting performance problems. Typically, we recommend using many of the same people who did the original performance testing because they already understand the behavior of the system under various loads. Create a well-documented process for responding to performance problems, including a database or other knowledge repository for storing information on previous incidents and remedies.

Once you’ve done everything you can to prepare for performance problems, all you can do is wait and see how the system performs. Should a problem arise, the team’s first order of business is to identify the root cause of the performance problem, also known as the bottleneck.

Bottleneck Identification and Correction

A bottleneck is a resource within a system that limits overall throughput or adds substantially to response time. Finding and fixing bottlenecks in distributed systems can be very difficult and requires experienced multi-disciplinary teams. Bottlenecks can occur in the Web server, application code, application server, database, network, hardware, network devices, or operating system. Experience has shown that bottlenecks are more likely to occur in some areas than in others, the most common areas being these:

Database connections and queries

Application server code

Application server and Web server hardware

Network and TCP configuration

Remember that there is rarely a single bottleneck in a system. Fixing one bottleneck will improve performance but often exposes a different bottleneck. Identify bottlenecks one at a time, correct them, and then retest the system to ensure that another bottleneck has not appeared before you reach the required performance levels.

In order to identify bottlenecks quickly and correctly, you must understand your system. The team responsible for problem resolution must know all of the physical components of the system. For each physical component (server, network device, etc.), the team needs detailed knowledge of all the logical components (software) deployed there. Ideally, all of this information will be documented and available to the SWAT team members who are responsible for troubleshooting. The team can prepare for problems by identifying all the potential bottlenecks for each component and determining the proper way to monitor and troubleshoot these areas.

The following lists document some of the typical components and areas of concern related to each of them. The team must be aware of these potential bottlenecks and be prepared to monitor the related resource usage to identify the specific bottleneck responsible for a given performance problem quickly.

Common areas of concern for firewall devices include the following:

Total connections.

SSL connections—If you exceed 20 SSL handshakes per second per Web server, you may need an SSL accelerator.

CPU utilization—Make sure CPU utilization does not average above 80 percent.

I/O—If the firewall is logging, make sure it is not I/O bound.

Throughput.

Common areas of concern for load balancers include these:

Total connections.

Connection balance.

CPU utilization—Make sure average CPU utilization does not exceed 80 percent.

Throughput.

Common areas of concern for Web servers include the following:

CPU utilization—Make sure average CPU utilization does not exceed 80 percent.

Memory—Make sure excessive paging is not taking place.

Throughput—Monitor network throughput to make sure you do not have an overutilized network interface card.

Connections—Make sure connections are balanced among the servers.

SSL connections—Make sure that the number of SSL handshakes per second is not too high for the hardware and Web server software. Consider using SSL accelerators if it is.

Disk I/O—Make sure the Web servers are not I/O bound, especially if they are serving a lot of static content.

Common areas of concern for application servers include the following:

Memory—Make sure there is enough memory to prevent the JVM from paging.

CPU—Make sure average CPU utilization does not exceed 80 percent.

Database connection pools—Make sure application threads are not waiting excessively for database connections. Also, check to make sure the application is not leaking connections.

Execute queue—Watch the queue depth to make sure it does not consistently exceed a predetermined threshold.

Execute queue wait—Make sure messages are not starved in the queue.

Common areas of concern for database servers include these:

Memory—Make sure excessive paging and high I/O wait time are not occurring.

CPU—Make sure average CPU utilization does not exceed 80 percent.

Cache hit ratio—Make sure the cache is set high enough to prevent excessive disk I/O.

Parse time—Make sure excessive parsing is not taking place.

For each area of concern, you may want to put system-monitoring tools in place that will take measurements of these variables and trigger an alert if they exceed normal levels. If system-monitoring tools are not available for a component, you will need to have scripts or other mechanisms in place that you can use to gather the required information.
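If no monitoring package is available, even a small script can provide this kind of proactive notification. The following sketch assumes a Unix host with sar and mailx; the threshold, mail alias, and output parsing are illustrative and will need adjusting for your platform.

#!/bin/sh
# Alert the on-call alias when the 5-minute average CPU utilization exceeds 80%.
THRESHOLD=80
CPU=`sar -u 300 1 | tail -1 | awk '{printf "%.0f", 100 - $NF}'`   # 100 - %idle
if [ "$CPU" -gt "$THRESHOLD" ]; then
    echo "CPU at ${CPU}% on `hostname`" | mailx -s "Performance alert" oncall@example.com
fi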

Best Practice

Identifying bottlenecks quickly in production systems requires a thorough knowledge of the hardware and software components of your system and the types of potential bottlenecks common in each of these areas. Ensure that system-monitoring tools capture appropriate information in all areas of concern to support troubleshooting efforts. Consider creating scripts or processes that monitor system resources and notify team members proactively if values exceed thresholds.

Problem Resolution

Troubleshooting performance problems should be accomplished using a documented, predefined problem resolution process similar to the high-level flowchart depicted in Figure 12.3. We will touch briefly on each step in the flow chart to give you a better feel for the process.

Figure 12.3: Problem resolution flow chart.

The first step in the process is to define the problem. There are two primary sources of problems requiring resolution: user reports and system-monitoring alerts. Translating information from these sources into a clear definition of the problem is not as easy as you might think. Reports such as “the system seemed slow yesterday” don’t really help you define or isolate the problem. Provide users with a well-designed paper form or online application for reporting problems to ensure that all important information about the problem is captured while it is still fresh in their minds. Understanding how the user was interacting with the system may lead you directly to the root of the problem. If not, move on to the next step in the process.

The next step involves checking all potential bottlenecks, paying special attention to areas that have been problems in the past. Consult your system monitoring tools and logs to check for any suspicious resource usage or changes in normal processing levels.

If you are unable to identify the location of the bottleneck or the root cause of the performance problem, you will need to perform a more rigorous analysis of all components in the system, looking for more subtle evidence of the problem. Start by identifying the layer in the application most likely to be responsible for the problem, and then drill into the components in that layer looking for the culprit. If you discover a new bottleneck or area of concern, make sure to document it, adding it to the list of usual suspects for the next time.

Once you’ve identified the location of the bottleneck you can apply appropriate tuning options and best practices to solve the problem. Document the specific changes made to solve the problem for future use. If nothing seems to work, you may need to step back, revisit everything you’ve observed and concluded, and try the process again from the top. Consider the possibility that two or more bottlenecks are combining to cause the problem or that your analysis has led you to an incorrect conclusion about the location of the bottleneck. Persevere, and you will find it eventually.

Common Application Server Performance Problems

This section documents a variety of common problems and how you can identify and solve them in your environment.

Troubleshooting High CPU Utilization and Poor Application Server Throughput

The first step in resolving this problem is to identify the root cause of the high CPU utilization. Consider the following observations and recommendations:

Most likely the problem will reside in the application itself, so a good starting point is to profile the application code to determine which areas of the application are using excessive processor resources. These heavyweight operations or subsystems are then optimized or removed to reduce CPU utilization.

Profile the garbage collection activity of the application. This can be accomplished using application-profiling tools or by starting your application with the -verbose:gc option; a sample invocation appears after this list. If the application is spending more than 25 percent of its time performing garbage collection, there may be an issue with the number of temporary objects the application is creating. Reducing the number of temporary objects should reduce garbage collection overhead and CPU utilization substantially.

Refer to information in this chapter and other tuning resources available from BEA to make sure the application server is tuned properly.

Add hardware to meet requirements.
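To illustrate the garbage collection profiling recommended above, you might add -verbose:gc to the JVM arguments in your server start script (for example, via the JAVA_OPTIONS variable used by the standard WebLogic start scripts) and compare the reported pause times against elapsed wall-clock time. The flag is standard, but the exact output format varies by JVM vendor and version; the sample below reflects a typical Sun HotSpot 1.4 format.

JAVA_OPTIONS="$JAVA_OPTIONS -verbose:gc"

[GC 65536K->12003K(259776K), 0.0321 secs]
[Full GC 80124K->25310K(259776K), 1.2345 secs]

Summing the pause times over a fixed interval gives the percentage of time spent in garbage collection; if it consistently exceeds the 25 percent guideline, focus on reducing temporary object creation.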

Troubleshooting Low CPU Utilization and Poor Application Server Throughput

This problem can result from bottlenecks or inefficiencies upstream, downstream, or within the application server. Correct the problem by walking through a process similar to the following:

Verify that the application server itself is functioning normally by using the weblogic.Admin command-line administration tool to request a GETSTATE and a series of PING operations; example invocations appear after these steps. Chapter 11 walked through the use of this tool and the various command-line options and parameters available. Because the GETSTATE and PING operations flow through the normal execute queue in the application server, good response times are an indication that all is well within the server. Poor response times indicate potential problems requiring additional analysis.

If the GETSTATE operation reports a healthy state but the PING operations are slow, check to see if the execute queue is backed up by viewing the queue depth in the WebLogic Console.

A backed-up execute queue may indicate that the system is starved for execute threads. If all execute threads are active and CPU utilization is low, adding execute threads should improve throughput.

If the queue appears starved but adding execute threads does not improve performance, there may be resource contention. Because CPU utilization is low, the threads are probably spending much of their time waiting for some resource, quite often a database connection. Use the JDBC monitoring facilities in the console to check for high levels of waiters or long wait times. Adding connections to the JDBC connection pool may be all that is required to fix the problem.

If database connections are not the problem, you should take periodic thread dumps of the JVM to determine whether the threads are routinely waiting for a particular resource. Take a series of four thread dumps about 5 to 10 seconds apart, and compare them with one another to determine whether individual threads are stuck or waiting on the same resource long enough to appear in multiple thread dumps. The problem threads may be waiting on a resource held by another thread or may be waiting to update the same table in the database. Once the resource contention is identified, you can apply the proper remedies to fix the problem.

If the application server is not the bottleneck, the cause is most likely upstream of the server, perhaps in the network or Web server. Use the system monitoring tools you have in place to check all of the potential bottlenecks upstream of the application server and troubleshoot these components.
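For reference, the weblogic.Admin health check described in the first step above might be invoked as follows. The URL, credentials, and PING arguments (10 round trips of 1,000 bytes) are placeholders to adapt to your environment.

java weblogic.Admin -url t3://appserver1:7001 -username weblogic -password password GETSTATE
java weblogic.Admin -url t3://appserver1:7001 -username weblogic -password password PING 10 1000

Consistently fast responses suggest the server and its execute queue are healthy; slow or failed responses point you back to the execute queue and resource checks described above.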

Troubleshooting Low Activity and CPU Utilization on All Physical Components with Slow Throughput

If CPU utilization stays low even when user load on the system is increasing, you should look at the following:

Is there any asynchronous messaging in the system? If the system employs asynchronous messaging, check the message queues to make sure they are not backing up. If the queues are backing up and there are no message-ordering requirements, try adding more dispatcher threads to increase throughput of the queue.

Check to see if the Web servers or application servers are thread starved. If they are, increase the number of server processes or server threads to increase parallelism.

Troubleshooting Slow Response Time from the Client and Low Database Usage

These symptoms are usually caused by a bottleneck upstream of the database, perhaps in the JDBC connection pooling. Monitor the active JDBC connections in the WebLogic Console and watch for excessive waiters and wait times; increase the pool size, if necessary. If the pool is not the problem, there must be some other resource used by the application that is introducing latency or causing threads to wait. Often, periodic thread dumps can reveal what the resource might be.
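In addition to the console, the same pool statistics can be pulled from the command line, which is convenient for scripted monitoring. The sketch below uses the weblogic.Admin GET command against the JDBC connection pool runtime MBeans; the URL and credentials are placeholders, and attribute names can vary between WebLogic Server releases, so verify them against your installation.

java weblogic.Admin -url t3://appserver1:7001 -username weblogic -password password GET -pretty -type JDBCConnectionPoolRuntime

In the output, look for attributes such as WaitingForConnectionCurrentCount and WaitSecondsHighCount; sustained nonzero waiter counts or high wait times suggest the pool is too small for the offered load.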

Troubleshooting Erratic Response Times and CPU Utilization on the Application Server

Throughput and CPU will always vary to some extent during normal operation, but large, visible swings indicate a problem. First look at the CPU utilization, and determine if there are any patterns in the CPU variations. Two patterns are common:

CPU utilization peaks or patterns coincide with garbage collection. If your application is running on a multiple-CPU machine with only one application server, you are most likely experiencing the effects of non-parallelized garbage collection in the application server. Depending on your JVM settings, garbage collection may be causing all other threads inside the JVM to block, preventing all other processing. In addition, many garbage collectors use a single thread to do their work, so all of the work is done by a single CPU, leaving the other processors idle until the collection is complete. Try using one of the parallel collectors or deploying multiple application servers on each machine to alleviate this problem and use server resources more efficiently; example JVM options appear after this list. The threads in an application server not performing garbage collection will be scheduled on processors left idle by the server performing collection, yielding a more constant throughput and more efficient CPU utilization. Also consider tuning the JVM options to optimize heap usage and improve garbage collection using techniques described earlier in this chapter.

CPU peaks on one component coincide with valleys on an adjacent component. You should also observe a similar oscillating pattern in the application server throughput. This behavior results from a bottleneck that is either upstream or downstream from the application server. By analyzing the potential bottlenecks being monitored on the various upstream and downstream components you should be able to pinpoint the problem. Experience has shown that firewalls, database servers, and Web servers are most likely to cause this kind of oscillation in CPU and throughput. Also, make sure the file descriptor table is large enough on all Unix servers in the environment.
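For the first pattern, the parallel and low-pause collectors available in the Sun HotSpot 1.4 JVM can be enabled with options like the ones below, typically added to the JAVA_OPTIONS variable in the server start script. Flag availability and behavior vary by JVM vendor and version, so treat these as a starting point and confirm the effect with -verbose:gc.

# Low-pause option: parallel young-generation plus mostly concurrent old-generation collection
JAVA_OPTIONS="$JAVA_OPTIONS -verbose:gc -XX:+UseParNewGC -XX:+UseConcMarkSweepGC"

# Throughput option: parallel young-generation collection (Sun HotSpot 1.4.1 and later)
JAVA_OPTIONS="$JAVA_OPTIONS -verbose:gc -XX:+UseParallelGC"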

Troubleshooting Performance Degrading with High Disk I/O

If a high disk I/O rate is observed on the application server machine, the most likely culprit will be excessive logging. Make sure that WebLogic Server is set to the proper logging level, and check to see that the application is not making excessive System.out.println() or other logging method calls. System.out.println() statements make use of synchronized processing for the duration of the disk I/O and should not be used for logging purposes. Unexpected disk I/O on the server may also be a sign that your application is logging error messages. The application server logs should be viewed to determine if there is a problem with the application.
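If application code must log, route the messages through a logging API with level checks rather than System.out.println() so that production systems can dial the volume down without a code change. The class below is a minimal, illustrative sketch using java.util.logging; the class name and method are assumptions, not code from the book.

import java.util.logging.Level;
import java.util.logging.Logger;

public class OrderService {                              // illustrative class name
    private static final Logger log =
        Logger.getLogger(OrderService.class.getName());

    public void placeOrder(String orderId) {
        // Guard message construction and avoid synchronized console I/O on
        // every request; FINE messages are discarded at the default INFO level.
        if (log.isLoggable(Level.FINE)) {
            log.fine("placing order " + orderId);
        }
        // ... business logic ...
    }
}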

Java Stack Traces

This section discusses the reading and interpretation of Java stack traces in WebLogic Server. A Java stack trace displays a snapshot of the current state of all threads in a JVM (Java Virtual Machine) process. This trace represents a quick and precise way to determine bottlenecks, hung threads, and resource contention in your application.

Understanding Thread States

The snapshot produced by a Java stack trace will display threads in various states. Not all Java stack traces use the same naming convention, but typically each thread will be in one of the following states: runnable, waiting on a condition variable, or waiting on a monitor lock.

Threads in the runnable state represent threads that are either currently running on a processor or are ready to run when a processor is available. At any given time, there can be only one thread actually executing on each processor in the machine; the rest of the runnable threads will be ready to run but waiting on a processor. You can identify threads in a runnable state by the runnable keyword in the stack trace, as shown here:

"ExecuteThread: ‘1’ for queue: ‘weblogic.socket.Muxer’" daemon prio=5
tid=0x1068C2F0 nid=0xae runnable [10e8f000..10e8fd8c]
at weblogic.socket.NTSocketMuxer.getIoCompletionResult(Native Method)
...

Threads waiting on a condition variable are sleeping, waiting to be notified by their manager that work is ready for processing. The stack trace indicates this with the in Object.wait() message:

"ExecuteThread: ‘4’ for queue: ‘weblogic.kernel.System’" daemon prio=5
tid=0x007E3A00 nid=0x151 in Object.wait() [fb0f000..fb0fd8c]
at java.lang.Object.wait(Native Method)
- waiting on <0496C5C0> (a weblogic.kernel.ExecuteThread)
at java.lang.Object.wait(Object.java:426)
at weblogic.kernel.ExecuteThread.waitForRequest(ExecuteThread.java:149)
- locked <0496C5C0> (a weblogic.kernel.ExecuteThread)
at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:175) 

Applications use monitor locks, or mutexes, to synchronize access to critical sections of code that require single-threaded access. When you see a thread that has waiting for monitor entry in its stack trace, the thread is waiting to access synchronized code, such as the thread shown here:

"ExecuteThread: ‘24’ for queue: ‘weblogic.kernel.Default’" daemon 
prio=5 tid=0x1771A968 nid=0x76c waiting for monitor entry
[1896f000..1896fdc0]
at mastering.weblogic.test.MutexServlet.doGet(MutexServlet.java:18)
- waiting to lock <02C2E508> (a mastering.weblogic.test.MutexServlet)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:740)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:853)
...
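A trace like this one typically comes from requests queuing up behind a synchronized block or method. The servlet sketch below would produce it under concurrent load; the class name mirrors the trace above, but the body is purely illustrative and not the book's actual test code.

import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class MutexServlet extends HttpServlet {
    // Every request synchronizes on the single servlet instance, so concurrent
    // requests block in doGet() "waiting for monitor entry".
    protected synchronized void doGet(HttpServletRequest req,
                                      HttpServletResponse res)
            throws ServletException, IOException {
        try {
            Thread.sleep(5000);      // simulate slow work while holding the lock
        } catch (InterruptedException ignored) {
        }
        res.getWriter().println("done");
    }
}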

Two different types of thread dumps can be produced in a typical environment: system-level process dumps, otherwise known as core dumps, and Java thread dumps.

Generating System-Level Process Dumps

System-level process dumps are generated by the JVM itself in response to a system error condition; typically, this happens because some native code is trying to access an illegal memory address. The content of this dump depends on whether the JVM can call the signal handler before the process itself core dumps. If the JVM can call the signal handler, then it will typically produce a text file in the current directory containing information about the process and the thread in which the low-level error occurred. If the JVM is unable to call the signal handler, a core dump file will be produced containing information about the JVM’s native operating system process rather than the Java classes themselves. This type of dump is much less valuable and should be used only if no Java stack trace is available.

Generating Java Thread Dumps

Sending a special signal to the JVM generates a Java stack trace. On Unix platforms you send the SIGQUIT signal using the kill command. On most Unix platforms, the command kill -QUIT <PID>, where <PID> is the process identifier for the JVM process, will produce a Java thread dump that shows the call stack of every user-level thread in the JVM. On a Windows platform, you generate a thread dump by pressing the Ctrl-Break key sequence in the console window in which the Java program is executing. In addition, you can generate a stack trace either by invoking the static method Thread.dumpStack() or by invoking the printStackTrace() method on an instance of the Throwable class.

When analyzing or troubleshooting an application it is important to generate multiple thread dumps over a period of time to identify hung threads properly and better understand the application state. Start by generating 3 to 5 separate thread dumps approximately 15 to 30 seconds apart. If your servers communicate with each other using RMI it may be necessary to perform this operation on all servers in the cluster simultaneously to get a full picture. Depending on the number of servers in the cluster and the number of threads in the execute queue, this process may generate a large amount of output, but the output is invaluable in diagnosing thread-related problems. Later in this section we’ll discuss how to read and interpret these thread dumps.
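A small script makes it easy to capture the series consistently. This sketch assumes a Unix platform where the dumps are written to the server's standard output log; substitute the JVM process id for the placeholder argument.

#!/bin/sh
# Capture four thread dumps, 15 seconds apart, from the WebLogic Server JVM.
PID=$1                      # pass the JVM process id as the first argument
for i in 1 2 3 4
do
    kill -QUIT $PID
    sleep 15
done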

Reading Core Dumps

Sometimes it will be necessary to examine the core file to determine what has caused the JVM to core dump. When you are examining this core file, remember that Java itself uses a safe memory model and that any segmentation fault must have occurred in either the native code of the application or the native code of the JVM itself. On Unix systems a core file will be produced when a JVM fails. On Windows systems, a drwtsn32.log file will be produced when a segmentation fault occurs in a JVM.

There are several ways to examine these core files, usually through debuggers like gdb or dbx. On Solaris you can also use the pstack command, as shown here:

/usr/proc/bin/pstack ./core

When using dbx to examine the JVM core file, first move to the directory where the core file resides, then execute the dbx command with the binary executable as a parameter. Remember that the java command is usually a shell script and that you must specify the actual java binary in the command. Once you have started the debugger you can use the dbx where command to show the function call stack at the time of the failure, indicating the location of the segmentation fault:

dbx /usr/java/native/java ./core
(dbx) where
Segmentation fault in glink.JNU_ReleaseStringPlatformChars at 0xd074e66c
0xd074e66c (JNU_ReleaseStringPlatformChars+0x5b564) 80830004 
lwz  r4,0x4(r3)

From this information you can often determine the location of the error and take the appropriate action. For example, if the segmentation fault is the result of a just-in-time (JIT) compiler problem and you are using the HotSpot compiler you can modify the behavior of the JIT to eliminate the problem. Create a file called .hotspot_compiler in the directory used to start the application, and indicate in this file the methods to exclude from JIT processing using entries similar to the following:

exclude java/lang/String indexOf

The specified methods will be excluded from the JIT compilation process, eliminating the core dump.

Reading Java Stack Traces

Java stack traces can be very useful during the problem-resolution process to identify the root cause for an application that seems to be hanging, deadlocked, frozen, extremely busy, or corrupting data. If your data is being corrupted, you are probably experiencing a race condition. Race conditions occur when more than one thread reads and writes to the same memory without proper synchronization. Race conditions are very hard to find by looking at stack traces because you will have to get your snapshot at the proper instant to see multiple threads accessing the same non-synchronized code.
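As a minimal illustration of the kind of code that produces a race condition, the sketch below lets two threads increment a shared counter without synchronization; it usually prints a total lower than expected because updates are lost. The class is illustrative only.

public class RaceCondition {
    private static int counter = 0;                      // shared, unsynchronized state

    public static void main(String[] args) throws InterruptedException {
        Runnable work = new Runnable() {
            public void run() {
                for (int i = 0; i < 100000; i++) {
                    counter++;                           // read-modify-write is not atomic
                }
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println("counter = " + counter);      // often less than 200000
    }
}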

Thread starvation or thread exhaustion can occur when threads are not making progress because they are waiting for an external resource that is either responding slowly or not at all. One particular case of this happens when WebLogic Server A makes an RMI call to WebLogic Server B and blocks waiting for a response. WebLogic Server B then calls via RMI back into WebLogic Server A before the original call returns from WebLogic Server B. If enough threads on both servers are awaiting responses from the other server, it is possible for all threads in both servers’ execute queues to be exhausted. This exhaustion behavior will show itself initially as no idle threads available in the WebLogic Server execute queue when viewed in the administration console. You can confirm this problem by generating a stack trace and looking for threads blocked waiting for data in the weblogic.rjvm.ResponseImpl.waitForData() method. Look for entries like this:

"ExecuteThread: ‘2’ for queue: ‘weblogic.kernel.Default’" daemon prio=5
tid=0x91e720 nid=0x26 in Object.wait() [d1e7e000..d1e7fc68]
at java.lang.Object.wait(Native Method)
- waiting on <06A9FBC0> (a weblogic.kernel.ExecuteThread)
at java.lang.Object.wait(Object.java:426)
at weblogic.rjvm.ResponseImpl.waitForData(ResponseImpl.java:76)
...

If a large number of threads are in this state you need to make appropriate design changes to eliminate RMI traffic between the servers or better throttle the number of threads allowed to call out and block in this way.

Deadlock occurs when individual threads are blocked waiting for the action of other threads. A deadly embrace deadlock occurs when one thread locks resource A and then tries to lock resource B, while a different thread locks resource B and then tries to lock resource A. This concept was discussed briefly in Chapter 6 in the context of exclusive locking for entity beans. Stack traces will show blocked threads within synchronized application code or within code that accesses objects using exclusive locking in one form or another. Remember that it is also possible for the application to be deadlocked across multiple JVMs with one server’s threads in a deadly embrace with another server’s threads.
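The following sketch reproduces the deadly embrace in miniature; it is illustrative only. A thread dump of this program shows each thread blocked waiting for the monitor the other thread holds, which is exactly the signature to look for in a server-side trace.

public class DeadlyEmbrace {
    private static final Object resourceA = new Object();
    private static final Object resourceB = new Object();

    public static void main(String[] args) {
        new Thread(new Runnable() {
            public void run() {
                synchronized (resourceA) {
                    pause(100);                          // let the other thread lock B
                    synchronized (resourceB) { }         // blocks forever
                }
            }
        }).start();

        new Thread(new Runnable() {
            public void run() {
                synchronized (resourceB) {
                    pause(100);
                    synchronized (resourceA) { }         // blocks forever
                }
            }
        }).start();
    }

    private static void pause(long millis) {
        try { Thread.sleep(millis); } catch (InterruptedException ignored) { }
    }
}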

A system that is inactive and has poor application performance may, in fact, be performing normally. The problem may instead be indicative of an upstream bottleneck, as described earlier in this chapter. A Java stack trace for a system in this state will display a high percentage of threads in the default or user-defined execute queue blocking until they receive some work, having a stack trace similar to the following one:

"ExecuteThread: '2' for queue: 'weblogic.kernel.Default'" daemon prio=5
tid=0x00A8DB00 nid=0x28c in Object.wait() [d32f000..d32fdc0]
at java.lang.Object.wait(Native Method)
- waiting on <0392FBC0> (a weblogic.kernel.ExecuteThread)
at java.lang.Object.wait(Object.java:426)
at weblogic.kernel.ExecuteThread.waitForRequest(ExecuteThread.java:126)
- locked <0392FBC0> (a weblogic.kernel.ExecuteThread)
at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:145)

The stack trace indicates that this thread is idle, or waiting for a request, rather than busy in application code or waiting on an external resource.

Understanding WebLogic Server Stack Traces

Stack traces of a WebLogic Server instance will also show a number of common elements based on the internal design of the WebLogic Server product. As you know from previous chapters, all client requests enter WebLogic Server through a special thread called the listen thread. There will usually be two listen threads visible in a stack trace, one for SSL and the other for nonsecure transport. Here is an example of the WebLogic Server listen thread waiting for a connection to arrive:

"ListenThread.Default" prio=5 tid=0x1068CEA0 nid=0xf5 runnable
[1098f000..1098fd8c]
at java.net.PlainSocketImpl.socketAccept(Native Method)
at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:353)
- locked <05613F00> (a java.net.PlainSocketImpl)
at java.net.ServerSocket.implAccept(ServerSocket.java:439)
at java.net.ServerSocket.accept(ServerSocket.java:410)
...

Socket connections received by WebLogic Server are registered and maintained by the socket muxer. The socket muxer reads data from the socket and dispatches the request to the appropriate subsystem. Starting in WebLogic Server 8.1, the socket muxer has its own execute thread pool that it uses to read the requests off the socket by calling the processSockets() method, as shown here for the Windows version of the native socket muxer.

"ExecuteThread: '1' for queue: 'weblogic.socket.Muxer'" daemon prio=5
tid=0x0B3BADE0 nid=0xbd0 runnable [e19f000..e19fdc0]
at weblogic.socket.NTSocketMuxer.getIoCompletionResult(Native Method)
at weblogic.socket.NTSocketMuxer.processSockets(NTSocketMuxer.java:82)
at weblogic.socket.SocketReaderRequest.execute(SocketReaderRequest.java:32)
at weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:178)
at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:151)

As you become more familiar with your application and better understand the internal implementation of WebLogic Server itself, your ability to interpret stack traces and troubleshoot problems will increase.