What contributes to APM Failure rate?


What does the failure rate in APM services stand for? I have read the documentation, which suggests that it is based on event.outcome, the status code, and the errors if there is no status code. I tried using an ingest pipeline to set event.outcome=failure, transaction.result=HTTP 5xx, and transaction.result=failure in the metadata. None of these increased the failure rate of the transactions. So how is the failure rate actually defined? And is there any way to turn a successful transaction into a failed one (without using the code editing API)?


The failure rate is calculated from span.outcome or transaction.outcome, and for HTTP calls the behavior depends on which side of the request is being observed:

  • by default, the outcome depends on whether an error was triggered during invocation, but there are protocol-specific definitions, for example for HTTP
  • when receiving an HTTP request (on the server side), the transaction is a failure only if the status code is 5xx
  • when emitting an HTTP request (on the client side), the span is a failure only if the status code is 4xx or 5xx
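
To make the HTTP rules above concrete, here is a minimal Python sketch. The function names `http_outcome` and `failure_rate` are hypothetical and not part of any agent API, and the rate formula (failures divided by failures plus successes, ignoring any other outcome value) is an assumption about how the UI aggregates outcomes rather than something stated in this thread.

```python
from typing import Iterable, Optional


def http_outcome(status_code: int, side: str) -> str:
    """Map an HTTP status code to an outcome, following the rules listed above.

    side="server": a received request (transaction), failure only for 5xx.
    side="client": an emitted request (span), failure for 4xx and 5xx.
    """
    if side == "server":
        return "failure" if status_code >= 500 else "success"
    if side == "client":
        return "failure" if status_code >= 400 else "success"
    raise ValueError(f"unknown side: {side!r}")


def failure_rate(outcomes: Iterable[str]) -> Optional[float]:
    """Assumed formula: failures / (failures + successes).

    Any other outcome value (for example "unknown") is ignored.
    Returns None when there is nothing to count.
    """
    outcomes = list(outcomes)
    failures = sum(1 for o in outcomes if o == "failure")
    successes = sum(1 for o in outcomes if o == "success")
    total = failures + successes
    return failures / total if total else None
```

Under these rules a 302 redirect is a success on both sides, and a 404 is a success for the server-side transaction but a failure for the client-side span.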

The agents rely on the following specification for HTTP requests.

From what I understand, you need some HTTP calls to appear as failed, whereas they are currently captured as successes. Is that correct?

I see. And yes, some were captured as 302 (redirect) but have a "1 Error" tag. That raises another question: how can I display this failure rate from the transaction page on a Kibana dashboard?

The "1 Error" tag that you mention might be the one shown in the trace view.

The errors shown there can simply be exceptions that are thrown during request processing.

Could you share a screenshot of how it looks in the UI?

Also, could you elaborate on where you'd like to display the error rate in the UI?

Hi, sorry for the late reply. Yes, the "1 error" tag appears in the trace view. I currently don't have access to the environment, so I can't share any pictures yet. What I remember is that the event.outcome of the transaction is success, and the transaction document's metadata does not contain any error attribute, so I can't query it as a failure in any way. I would just like to display this error as part of the failure rate, either in a custom dashboard or, even better, in the APM services/transaction failure rate graph.

In your case, it might be that an exception is thrown and captured by the agent in one part of your application (for example in a high-level framework like Spring MVC, which might use an exception to indicate a missing resource), but that exception is then caught elsewhere in the application, in a low-level framework like Servlets, where it is mapped to the 404 HTTP status code.

When this happens, the transaction might be captured as a success even though an exception was captured while the transaction was active (shown as an error in the UI).

To deal with such cases, you could rely on the related error documents, which have a transaction.id attribute, instead of relying only on the transactions that have event.outcome = failure.
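
As a rough sketch of that idea, the Python below builds an Elasticsearch query body that counts distinct transaction.id values among error documents, then combines a set of such IDs with transaction outcomes into a custom rate. The helper names are hypothetical, and the field names `processor.event` and `service.name` as well as the choice to count every transaction in the denominator are assumptions to verify against your own APM indices.

```python
from typing import Dict, Iterable, List, Optional


def error_linked_transactions_query(service_name: str) -> Dict:
    """Build a search body counting distinct transactions that have an error.

    Assumes error documents carry processor.event == "error",
    service.name, and transaction.id.
    """
    return {
        "size": 0,
        "query": {
            "bool": {
                "filter": [
                    {"term": {"processor.event": "error"}},
                    {"term": {"service.name": service_name}},
                    {"exists": {"field": "transaction.id"}},
                ]
            }
        },
        "aggs": {
            "transactions_with_errors": {
                "cardinality": {"field": "transaction.id"}
            }
        },
    }


def combined_failure_rate(
    transactions: List[Dict[str, str]],
    error_transaction_ids: Iterable[str],
) -> Optional[float]:
    """Treat a transaction as failed if its outcome is "failure"
    or if at least one error document references its id.

    Each transaction is a dict with "id" and "outcome" keys.
    The denominator is every transaction passed in (an assumption).
    """
    if not transactions:
        return None
    with_errors = set(error_transaction_ids)
    failed = sum(
        1
        for t in transactions
        if t["outcome"] == "failure" or t["id"] in with_errors
    )
    return failed / len(transactions)
```

A custom dashboard could use the same filter on error documents in a visualization. Note that this does not change what the built-in APM failure rate graph shows, since that graph is driven by the outcome.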
