Monday, January 10, 2005

WebSphere Web module or application server dies or hangs

From IBM Infocenter:

Web module or application server dies or hangs

If an application server dies (its process spontaneously closes), or freezes (its Web modules stop responding to new requests):

Isolate the problem by installing Web modules on different servers, if possible.

Read the Monitoring performance with Tivoli Performance Viewer (formerly Resource Analyzer) topic. You can use the performance viewer to determine which resources have reached their maximum capacity, such as Java heap memory (indicating a possible memory leak) and database connections. If a particular resource appears to have reached its maximum capacity, review the application code for a possible cause:

- If database connections are used and never freed, ensure that application code performs a close() on any opened Connection object within a finally{} block.
- If there is a steady increase in servlet engine threads in use, review application synchronized code blocks for possible deadlock conditions.
- If there is a steady increase in JVM heap size, review application code for memory leak opportunities, such as static (class-level) collections, which can cause objects to never be garbage-collected.

As an alternative to using the performance viewer to detect memory leak problems, enable verbose garbage collection on the application server. This feature adds detailed statements about the amount of available and in-use memory to the JVM error log file of the application server. To set up verbose garbage collection:

- Select Servers > Application Servers > server_name > Process Definition > Java Virtual Machine, and enable Verbose Garbage Collection.
- Stop and restart the application server.
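The two code-review points above can be sketched in plain Java. This is a minimal illustration, not WebSphere API: the SimpleResource class stands in for a pooled java.sql.Connection so the example runs without a database, and LeakPatterns is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

class LeakPatterns {
    // Anti-pattern: a static (class-level) collection that only grows.
    // Everything added here stays reachable forever, so the garbage
    // collector can never reclaim it and the heap climbs steadily.
    private static final List<byte[]> CACHE = new ArrayList<byte[]>();

    static int leakyAdd(byte[] data) {
        CACHE.add(data);            // added but never removed: a leak
        return CACHE.size();
    }

    // Stand-in for a pooled java.sql.Connection (an assumption made
    // so the sketch is self-contained).
    static class SimpleResource {
        boolean closed = false;
        void use() { /* work with the resource */ }
        void close() { closed = true; }
    }

    // The close()-in-finally pattern the text recommends: the finally
    // block runs on both the normal and the exception path, so the
    // resource is always released back to the pool.
    static SimpleResource useAndClose() {
        SimpleResource res = new SimpleResource();
        try {
            res.use();
        } finally {
            res.close();
        }
        return res;
    }
}
```

The same finally{} discipline applies to Statement and ResultSet objects, which hold the connection's resources until closed.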

Periodically, or after the application server stops, browse the log file for garbage collection statements. Look for statements beginning with "allocation failure". The string indicates that a need for memory allocation has triggered a JVM garbage collection (freeing of unused memory). Allocation failures themselves are normal and not necessarily indicative of a problem. The allocation failure statement is followed by statements showing how many bytes are needed and how many are allocated.

If there is a steady increase in the total amount of free and used memory (the JVM keeps allocating more memory for itself), or if the JVM becomes unable to allocate as much memory as it needs (indicated by the bytes needed statement), there might be a memory leak.
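The rising-heap trend described above can also be sampled from inside the JVM with the standard java.lang.Runtime API. This is a rough supplementary probe under my own naming (HeapTrend is hypothetical), not part of the InfoCenter procedure:

```java
// Minimal heap-trend probe using the standard Runtime API.
// A value that climbs steadily across samples, even after garbage
// collections, is the memory-leak signal described in the text.
class HeapTrend {
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        // totalMemory() is the heap currently claimed by the JVM;
        // freeMemory() is the unused portion of that claim.
        return rt.totalMemory() - rt.freeMemory();
    }
}
```

Logging this value periodically from a servlet or timer gives a crude trend line when the performance viewer is unavailable.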
If either the performance viewer or the verbose garbage collection output indicates that the application server is running out of memory, one of the following problems might be present: there is a memory leak in application code that you must address, or the default maximum heap size of the application server needs to be increased. To pinpoint the cause of a memory leak, enable the RunHProf function in the Servers > Application Servers > server_name > Process Definition > Java Virtual Machine pane of the problem application server:

- In the same JVM pane, set the HProf Arguments field to a value similar to depth=20,file=heapdmp.txt. This value shows exception stacks to a maximum of 20 levels and saves the heapdump output to the install_root/bin/heapdmp.txt file.
- Save the settings.
- Stop and restart the application server.

Re-enact the scenario or access the resource that causes the hang or crash, if possible, then stop the application server. If this is not possible, wait until the hang or crash happens again and then stop the application server.

Examine the file into which the heapdump was saved (for example, install_root/bin/heapdmp.txt):

- Search for the string "SITES BEGIN". This finds the location of a list of Java objects in memory, which shows the amount of memory allocated to the objects. The list contains an entry for each allocation site in the JVM, recording the type of object allocated and an identifier of a trace stack, listed elsewhere in the dump, that shows the Java method that made the allocation.
- The list of Java objects is in descending order by number of bytes allocated. Depending on the nature of the leak, the problem class should show up near the top of the list, but this is not always the case.
- Look throughout the list for large amounts of memory or frequent instances of the same class being instantiated. In the latter case, use the ID in the trace stack column to identify allocations occurring repeatedly in the same class and method.
- Examine the source code indicated in the related trace stacks for the possibility of memory leaks.

If no leak is found in application code, the default maximum heap size of the application server might instead need to be increased.
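Locating the allocation list by hand can be tedious in a large dump, so a small scanner helps. This sketch only relies on the "SITES BEGIN" marker mentioned above; the HeapDumpScan class and method names are my own, and it makes no assumption about the exact column layout of the HProf output:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Collects up to 'top' lines after the "SITES BEGIN" marker -- the
// head of the descending-by-bytes allocation list the text describes.
class HeapDumpScan {
    static List<String> topSites(String path, int top) throws IOException {
        List<String> out = new ArrayList<String>();
        BufferedReader in = new BufferedReader(new FileReader(path));
        try {
            String line;
            boolean inSites = false;
            while ((line = in.readLine()) != null && out.size() < top) {
                if (inSites) {
                    out.add(line);              // keep the top entries
                } else if (line.indexOf("SITES BEGIN") >= 0) {
                    inSites = true;             // list starts after marker
                }
            }
        } finally {
            in.close();
        }
        return out;
    }
}
```

Since the list is sorted by bytes allocated, the first handful of returned lines are the most likely leak suspects.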

There is a defect in the WebSphere Application Server product that you must either report, or correct by installing a fix or FixPak from a maintenance download. Contact IBM support.

If an application server spontaneously dies, look for a Java thread dump file. The JVM creates the file in the product directory structure, with a name like javacore[number].txt. You can also force the application server to create a thread dump (or javacore). The process for forcing a thread dump is different from the process in earlier releases of the product:

- Using the wsadmin command prompt, get a handle to the problem application server:
  wsadmin>set jvm [$AdminControl completeObjectName type=JVM,process=server1,*]
- Generate the thread dump:
  wsadmin>$AdminControl invoke $jvm dumpThreads

Look for an output file in the installation root directory with a name like javacore.date.time.id.txt. Browse the thread dump for clues:

- If the JVM created the thread dump as it closed (that is, the thread dump was not manually forced), there might be "error" or "exception information" strings at the beginning of the file. These strings indicate the thread that caused the application server to die.
- The thread dump contains a snapshot of each thread in the process, starting in the section labeled "Full thread dump."
- Look for threads with a description that contains "state:R". Such threads were active and running when the dump was forced or the process exited.
- Look for multiple threads in the same Java application code source location. Multiple threads from the same location might indicate a deadlock condition (multiple threads waiting on a monitor) or an infinite loop, and help identify the application code with the problem.
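A quick first pass over a javacore is counting the running threads. This helper only searches for the "state:R" marker quoted above; the ThreadDumpScan name is my own, and because the exact javacore line layout varies by JVM level, the sketch deliberately counts marker occurrences rather than parsing fields:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Counts thread-description lines containing "state:R" -- the threads
// the text flags as running at the moment the dump was taken.
class ThreadDumpScan {
    static int runnableCount(String path) throws IOException {
        int count = 0;
        BufferedReader in = new BufferedReader(new FileReader(path));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.indexOf("state:R") >= 0) {
                    count++;
                }
            }
        } finally {
            in.close();
        }
        return count;
    }
}
```

An unexpectedly large count, especially of threads at the same source location, points toward the loop or deadlock conditions described above.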

If these steps do not fix your problem, search to see if the problem is known and documented, using the methods identified in the available online support (hints and tips, technotes, and fixes) topic. If you find that your problem is not known, contact IBM support to report it.
