Slow and hung request detection

When an application takes longer than expected to complete a request, the request can be logged as slow or hung. The Request Timing feature automatically detects slow and hung requests and provides detailed information to help you find the root cause of the delay.

The Request Timing feature identifies requests that run longer than a slow request threshold. The Request Timing feature documents slow and hung requests as they occur, so you don’t need to re-create the problem to collect more data. In addition to the global settings for slow and hung request thresholds, you can set thresholds for specific servlets or JDBC requests. For more information, see the Request Timing feature.

Slow request detection

A request can be slow because its code path length is long. Slow requests can also be caused if a resource, which the request depends on, is busy. Users need to know what actions occurred during the processing of that request, and the processing stage of the request.

When a request runs for longer than the slow request threshold, the Request Timing feature generates a warning message in the messages log file. The message includes details about the request and events that made up the request. The default threshold value is 10 seconds.

The following example shows the format of a log messages.

TRAS0112W: Request <(Request ID)> has been running on thread <THREADID> for at least <DURATION>. The following stack trace shows that this thread is currently running.

The message helps you analyze the reason behind the slow request, though it is difficult to determine whether the request is hung at that point or is still running slowly. To address this issue, three messages are displayed for any slow request at the interval specified by the slowRequestThreshold attribute. The stack trace data gives the information about the method that is running, while the request data gives details about the request, like the request ID and duration. The three different stack trace and request data help you better understand where the request is going wrong by looking into what the thread is running currently. No further warnings are logged about the request unless the duration of the request crosses the hung request detection threshold.

Hung request detection

Requests that take longer than the hung request threshold are typically blocked by lack of access to a shared resource, for example, a section of code that allows only serial access and transfers information sequentially. Such longer requests are likely to be hung, and can be blocked for a long time or even indefinitely. One particular form of hung request, called a deadlock, occurs when two threads are blocked waiting on each other to return shared resources. For a hung request, users need to know where all the threads are at in their processing, and which locks the threads are waiting on.

If the request exceeds the hung request threshold, the Request Timing feature generates a warning message in the messages log file with the request details and events that make up the request. A series of three Java cores is recorded with a 1-minute delay among the cores. Further Java cores are created if new requests are detected.

The following log message sample shows the log messages for a request that crosses the hungRequestThreshold attribute. The default duration value is 10 min. The value that is configured in the following example is 4 min.

[WARNING ] TRAS0114W: Request AAA7WlpP7l7_AAAAAAAAAAA was running on thread 00000021 for at least 240001.015ms. The following table shows the events that have run during this request.
Duration       Operation
240001.754ms + websphere.servlet.service | TestWebApp | TestServlet?sleepTime=480000
     0.095ms       websphere.session.setAttribute | mCzBMyzMvAEnjMJJx9zQYIw | userID
     0.007ms       websphere.session.setAttribute | mCzBMyzMvAEnjMJJx9zQYIw | visitCount