Slow and hung request detection

When an application takes longer than expected to complete a request, the request might be logged as slow or hung. The Request Timing feature automatically detects slow and hung requests and provides detailed information to help you find the root cause of the delay.

Enable the Request Timing feature to identify incoming requests that run longer than a slow request threshold. The Request Timing feature documents slow and hung requests as they occur, so you don’t need to re-create the problem to collect more data. In addition to the global settings for slow and hung request thresholds, you can set thresholds for specific servlets or JDBC requests. For more information and configuration examples, see the Request Timing feature.

Slow request detection

A request might be slow because its code path is long. Slow requests can occur when a resource that the request depends on is busy. To diagnose a slow request, you need to know what actions occurred during the processing of that request and the processing stage of the request.

When a request runs longer than the slow request threshold, the Request Timing feature generates a warning message in the messages log file. The message includes details about the request and the events that made up the request. The default threshold value is 10 seconds.

The following example shows the format of a log message:

TRAS0112W: Request <(Request ID)> has been running on thread <THREADID> for at least <DURATION>. The following stack trace shows that this thread is currently running.
  • You can use the Request ID value to search for log and trace messages that correspond to the request. If you use binary logging, you can search for log and trace entries with the same requestID extension by using the binaryLog command.

  • The STACK TRACE indicates that the method is running.

  • After the STACK TRACE, the DURATION AND OPERATIONS Table provides the format of the request that includes the duration and operation. The Duration column indicates the time that is taken by the corresponding operation of the request. The plus sign (+) that is shown in the Duration column indicates events within the request that are still running. The Duration column also includes duration without the plus sign to indicate that the corresponding operation finished in the specified duration. The Operation column shows the EVENT TYPE and optional CONTEXT INFO for the specified operation.

The message helps you analyze the reason behind the slow request, though at this point you might not be able to determine whether the request is hung or is still running slowly. To address this issue, three messages are displayed for any slow request at the interval that is specified by the slowRequestThreshold attribute. The stack trace data gives the information about the method that is running, while the request data gives details about the request, like the request ID and duration. The three different stack trace and request messages help you better understand where the request is going wrong by looking into what the thread is running currently. No further warnings are logged about the request unless the duration of the request crosses the hung request detection threshold.

Hung request detection

Requests that take longer than the hung request threshold are typically blocked by lack of access to a shared resource. For example, the problem might result from a section of code that allows only serial access and transfers information sequentially. Such longer requests are likely to be hung, and can be blocked for a long time or even indefinitely. One particular form of hung request, called a deadlock, occurs when two threads are blocked waiting on each other to return shared resources. For a hung request, you need to know where all the threads are at in their processing, and which locks the threads are waiting on.

If the request exceeds the hung request threshold, the Request Timing feature generates a warning message in the messages log file with the request details and events that make up the request. A series of three Java cores is recorded with a 1-minute delay among the cores. Further Java cores are created if new requests are detected.

The following log message sample shows the log messages for a request that crosses the hungRequestThreshold attribute. The default duration value is 10 min. The value that is configured in the following example is 4 min.

[WARNING ] TRAS0114W: Request AAA7WlpP7l7_AAAAAAAAAAA was running on thread 00000021 for at least 240001.015ms. The following table shows the events that have run during this request.
Duration       Operation
240001.754ms + websphere.servlet.service | TestWebApp | TestServlet?sleepTime=480000
     0.095ms       websphere.session.setAttribute | mCzBMyzMvAEnjMJJx9zQYIw | userID
     0.007ms       websphere.session.setAttribute | mCzBMyzMvAEnjMJJx9zQYIw | visitCount