Setting start timestamp on Transaction to an earlier time is always added to "app" type

Kibana version: 7.4

Elasticsearch version: 7.4

APM Server version: 7.4

APM Agent language and version: Java Programmatic API (ElasticApmAttacher) 1.10.0

Description of the problem including expected versus actual behavior:

I am responding to calls from a third-party API and would like to start my transaction at the time the information was sent from the third-party (so I can track latency across deployed sites/regions).

What's happening is that when I setStartTimestamp to be earlier than the default transaction start, that whole time is added to the "app" type in the "Time spent by span type", even when there are spans in that time frame.

A simplified timeline ("absolute" ms):

  1. 0ms (Timestamped) message is sent from third-party
  2. 20ms Message is received by my app and starts processing
  3. 30ms A service is called
  4. 40ms service responds and returns

What I expect in "Time spent by span type":
50% Message Latency, 25% app, 25% service

What is actually reported:
33% Message Latency, 50% app, 17% service

This seems to be because when the transaction is moved 20ms earlier (making it 40ms instead of the local 20ms), that's automatically added to the "app" type, even though the full 20ms span is added in that time frame.
The times that I think it's calculating:
20ms Message Latency
30ms app
10ms service

The "app" should be 10ms.

Basic code:

long now = <get micros>;
long msgTs = <get micros from message>;
Transaction trans = createTransaction.setStartTimestamp(msgTs);
Span msgLatency = trans.createSpan().setStartTimestamp(msgTs).end(now);
// some work for 10ms
Span serviceTime = trans.createSpan();
try{
   return service.doStuff();
} finally {
   serviceTime.end();
}

"Workaround":

Set the latency span's times as in the code above, but do not call tran.setStartTimestamp. This fixes the percentages and the graph, with the annoyance that the message latency is no longer visible in the Trace view, since the transaction does not cover that time period.

Hi Stephen, welcome to the forum!

Thanks for reporting. The way we start and stop our timers is not compatible with this way of setting arbitrary timestamps. I opened this PR with a proposed fix. Still not sure whether we want to handle it this way, but if you can download and test the snapshot build and see whether this gets you what you would expect, it would be very useful feedback for us.

Thanks,
Eyal.

Fantastic! That's much more sensible.

However, it now highlights/confirms that I have no idea of how the % calculations work. The trace is almost entirely linear (very small overlaps), and yet most %'s are inflated (one type is reduced, but it has 2 disconnected spans)...

Is there a specification page for the calculations, or should I just look at the code?

(At this stage, I've just done a quick drop in without looking at the code - I used the elastic-agent-attach-1.10.1-SNAPSHOT.jar)

Thanks for testing.
There is this documentation page, but not really about the internals of measurements and calculations. Things are mostly straightforward if everything in the trace is synced, however they tend to get messier when having asynchronous spans.
Take a look at the unit tests- they have nice graphics attached to each, which might help.
If you still can't make sense of what you get, please provide specific numbers (durations of spans and the breakdown result).
I hope this helps..

The tests look entirely sensible. I'll have a look at the code to see what's happening, but what I'm seeing is:

Trace sample
Transaction: 946ms

comprising of (in order):
Message Latency: 365ms / 38.5% (in span details)
"app" gap - this is the only blank in the transaction
Processing: 175 / 18.5%
Logic: 234 / 24.7%
Processing: 121 / 12.8 which have 2 sub-spans: (4ms / 0.4% and 9ms / 0.9%)

So all of these trace statistics line up perfectly.

But in the "Time spent by span type" for the transaction, it says
app: 6.1%
Message Latency: 44.6%
Logic: 27.9%
Processing: 20.9%

This does not make sense to me

... but perhaps even though I'm in a transaction, perhaps it's still reporting the "time spent" as an aggregate value at that particular time instead of just for that transaction?

Note that "Time spent by span type" is not for a single transaction, it is for all the transactions captured during the time span your view is filtered on. It refers to the chart that is collapsed by default in this same widget.

This topic was automatically closed 20 days after the last reply. New replies are no longer allowed.