Span duration of an HTTP GET calculated wrongly?

IngoStrauch2020 · May 16, 2024, 12:52pm

APM Server version: 8.9.1

APM Agent language and version: Java 1.49.0

Description of the problem including expected versus actual behavior. Please include screenshots (if relevant):

While debugging a long running transaction we found that one of the methods we implemented ourselves took 50s while basically all it does is one HTTP GET call to a Keycloak REST endpoint returning all users know to the identity provider which according to the Timeline view only took 200ms:

ElasticAPM_HTTP_chunked

Technically we use the Keycloak.findAllUsers() method from org.keycloak:keycloak-admin-client:23.0.7 that internally uses org.jboss.resteasy:resteasy-client:6.2.4.Final for the actual HTTP requests.

Performing the same GET /admin/realms/test/users?max=6000 in Postman takes the full 50s to return all users, not 200ms as seen in the Timeline view above.

The response from Keycloak is streamed (transfer-encoding: chunked) and my assumption now is that only the first chunk is taken into account for the duration of the HTTP GET span in the screenshot above.

Am I right with this assumption?
And if yes, is there something to be done about it as it led to quite some confusion (we actually thought changing the code to see if a different Java HTTP client implementation would be faster).

Steps to reproduce:

trigger a REST endpoint in a Spring Boot application using the APM Java agent that calls Keycloak.findAllUsers() in a method annotated with @CaptureSpan and otherwise has no other logic in it (use a significantly large number of existing users in Keycloak; e.g. 2000)
check the corresponding transaction under Observability > APM
- the HTTP span should make up basically all of the annotated Java method but doesn't

Jonas_Kunz · May 16, 2024, 1:07pm

The duration of HTTP Client spans is the time to first byte.
This has several reasons:

From instrumentation perspective we have no guarantee whether the client will actually consume the response body: If it doesn't we would end up with a started but never stopped Span
For many HTTP-clients it is very difficult to find an instrumentation hook for when the response has been consumed
The duration until the response body has been fully consumed is actually also heavily dependent on the client/consumer: Because the data is streamed, the consumer is a big factor in the pace until the response has been fully consumed. E.g. if you issue DB calls (or do something else slowly) synchronously while streaming the response data you'd also end up with a duration which has nothing todo with the speed of the server or network.

If you are interested in the total time until the response has been fully received and processed, you need to do exactly what you suggested: manually instrument a method which finished after the response has been consumed.

IngoStrauch2020 · May 17, 2024, 7:28am

Thanks for the detailed answer, Jonas!

That makes a lot of sense and we were just off the track with our interpretation.

Topic		Replies	Views
Duration field in span stacktrace APM	2	427	April 23, 2019
Understand a span detail APM rum	2	487	October 14, 2019
APM Java agent not tracking spans APM java	7	841	May 18, 2021
Span duration missing resource loading time from Chrome devtools APM rum	3	567	August 30, 2023
All Spans duration do not add up to corresponding transaction duration APM	5	441	December 24, 2018

Span duration of an HTTP GET calculated wrongly?

Related topics