APM .net response time mismatch?

Hi,

Kibana version:
7.10.2

Elasticsearch version:
7.10.2

APM Server version:
7.10.2

APM Agent language and version:

<TargetFramework>net5.0</TargetFramework>
<PackageReference Include="Elastic.Apm.NetCoreAll" Version="1.11.0" />

Browser version:
Chrome 91.0.4472

Original install method (e.g. download page, yum, deb, from source, etc.) and version:
Elk stack installed from Helm charts (K8S in AKS)

Fresh install or upgraded from other version?
Frest install

Is there anything special in your setup?
No

Description of the problem including expected versus actual behavior. Please include screenshots (if relevant):

Steps to reproduce:

  1. I note systematic differences in response time between APM and external clients such as Postman, on the following example, Postman reports 183ms while APM tells 76ms, who is right and how can we explain the difference ?

Is it about json payload download that is not computed in APM (the payload in my API is about 19.5KB) ?

Postman

APM

Here are some typical response time (in ms)
APM | Postman
76 | 183
108 | 211
75 | 231
73 | 160
82 | 103

  1. Second point is about the whole response time exposed in APM, it doesn't fit on the sum list of all actions involved, as for example the trigger of the following route (controller) tells 82ms, but when I'm summing the displayed response time of inner spans, I get 63,1ms and the top bar seems to show extra time, what is about ?

Errors in browser console (if relevant):
No error

Provide logs and/or server output (if relevant):

Hi @Nicolas_Rey,

your data seems to be ok to me. Let me try to explain.

To the 1. point:

The .NET APM Agent measures the response time between 1) the time the HTTP request hits the ASP.NET Core pipeline and 2) the pipeline returns an HTTP response back to the caller. You can imagine the .NET APM agent as an ASP.NET Core middleware which is registered as the very first middleware in the pipeline (we have 2 setups in our code base, one is actually a real middleware). Now, this naturally won't measure any network communication or serialization from the client which calls the .NET 5 application you monitor with the .NET Agent. Postman is different, since it will also include the network communication into the measurement. Postman measures the time between 1) the point when it sends the HTTP request and 2) it receives the HTTP response (I don't know if it also includes any serialization, I'm not super familiar with the inner details of Postman, but that won't change much).

So Postman measures response time from a client's point of view (your application + infrastructure), the .NET APM Agent measures the response time spent purely in the server side within the .NET code (which is your application).

Now, we also have a RUM Agent which runs in the browser. So if your application is some type of HTML/JS application, then additionally to the .NET Agent you can also enable the RUM Agent (details in the linked docs) and with that you'll have client side data including response time, measured from the client's (in this case browser) point of view. So that'll give you a more detailed measurement showing both the response time from the client and also from your .NET app. In that case I'd expect the RUM agent to show response times matching what Postman shows.

To point 2:

In that case the transaction itself is 82ms, but the sum of the spans within the transaction is only 63,1ms (I did not count it myself, I just took your numbers, but I trust your calculation :slight_smile: )

This is also normal. Spans within the transaction represent specific operations, but it's not like you always have to have an active span throughout the whole transaction.

This is a good documentation on what is a trace/transaction/span.

The transaction starts when the request hits the ASP.NET Core pipeline and finishes when a response is sent back. This is 82 ms. Within that you have your spans, so you see when specific things happen. From what I see on the screenshot the last span is a call to Elasticsearch and then there is a gap between the end of the Elasticsearch call and the end of the transaction. This makes sense to me - when your Elasticsearch call finishes, .NET won't immediately return the response (by "immediately" I mean here literally within 1ms). Typically the pipeline sets things on the HTTTP response and things like that. So within that gap that is likely what's happening.

I hope this help.

Regards,
Greg

Hi @GregKalapos

Again thank you for your time, and for the really detailed response, as usual it's awesome.
I (and I'm sure others too) really appreciate the effort put from the .Net/Elk team on your products :pray:

For your response:

  1. It absolutely makes sense, and I supposed it was the main explanation, but I'm glad to have your confirmation

  2. I understand, and I thought about another possible case : if an action not contained within a span (and not automatically monitored contrary to Elk calls which are), we could spend some time within in memory computation or whatever, it won't be individually exposed in a span in the APM reporting, but the time spent is added in the main transaction which can lead to the same kind of display : is it a possible case or am I misunderstand something ?

  1. I understand, and I thought about another possible case : if an action not contained within a span (and not automatically monitored contrary to Elk calls which are), we could spend some time within in memory computation or whatever, it won't be individually exposed in a span in the APM reporting, but the time spent is added in the main transaction which can lead to the same kind of display : is it a possible case or am I misunderstand something ?

This is a totally possible case and yes, that'd be the same scenario. So in that case you won't have a span, but the duration of such calculation will be of course included in the transaction duration.

Btw. our Java APM agent has a feature addressing it, which creates spans automatically for longer operations out of the box - docs here. This is not yet possible in .NET.

1 Like

Thanks again, it all makes sense

This topic was automatically closed 20 days after the last reply. New replies are no longer allowed.