APM UI Showing gap between distributed transactions even when that much gap doesn't exist

Kibana version: Kibana 7.5.1

Elasticsearch version: 7.5.1

APM Server version: 7.8.1

APM Agent language and version: .NET (1.6.1)

Browser version: Chrome(Version 78.0.3904.97 (Official Build) (64-bit))

Original install method (e.g. download page, yum, deb, from source, etc.) and version: download page

Fresh install or upgraded from other version? Fresh Install

Is there anything special in your setup? For example, are you using the Logstash or Kafka outputs? Are you using a load balancer in front of the APM Servers? Have you changed index pattern, generated custom templates, changed agent configuration etc. Nothing Special

Description of the problem including expected versus actual behavior. Please include screenshots (if relevant):

I've got two services publishing to APM and at multiple instances I could see the gap between DistributedTransactions which actually doesnot exists as i'm adding timestamps of Creating n Ending Transactions across services.

Please find attached snapshots.

btw, is there a way Agents can set the timestamps for each transactions ?

Hi @ImdotnetJunkie,

this is very strange - yeah, I see the values in the labels, which are different from what the UI shows.

Unfortunately all I can say based on what we have here is that this should not happen. Can you maybe give us more info?

  • Does this always happen - so with all your distributed traces -, or only in this specific combination?
  • How is the AgoraPublisher Latency transaction created in the MiddleTier? Is it automatically created or do you do it manually?

To your question:

btw, is there a way Agents can set the timestamps for each transactions ?

No, the timestamp is always automatically set to the timestamp when the transaction is created. So this field is not exposed and can't really be changed.

This is happening because these two services (FlowTrader n MiddleTier) running on different machines. And yes, there are lots of cases where I saw this discrepancy. 

These transactions are created when a certain event happens in the System and as you could see I'm tagging along the NTP based times in the labels e.g. CreatedOn n EndOn to identify when these Transactions n Spans are getting created n ended respectively.

Based on my further analysis, I could see the difference in timestamps of these two transactions:
1) FlowTrader - the difference between labels.Createdon n Timestamp is 1067 ms
2) MiddleTier - the difference between labels.Createdon n Timestamp is 382ms
(As per attached snapshots)

And that is the reason we could see this gap in APM UI.

Now my question is how would you make sure you have included Clock drift between these machine in case of distributed tracing as otherwise the UI would always show this gap which in actuality doesnot exist.

```![Order-New_Timestamp_Comp|690x179](upload://cKh3LlRVq2tQ3AJmE13wh6GJiuC.png) ![AgoraPublisher_Transaction_Timestamp_Comp|690x230](upload://bE6h2TnCDMKVX3ypbEMrgGBidD5.png)

@GregKalapos : Could you please provide an update on this?

Thanks for the info @ImdotnetJunkie,

Honestly I'm still unable to spot anything that could help us to find the problem.

Could you also give me the timestamps for the transactions? You can hover over the time and it'll be shown (see screenshot), or you can just find it in Discover.

Additionally, could you maybe recreate the problem by setting the agent log to Trace level? On that level we print the timestamp, so we could compare it with the real timestamp.

Now my question is how would you make sure you have included Clock drift between these machine in case of distributed tracing as otherwise the UI would always show this gap which in actuality doesnot exist.

The UI handles clock drift when it can detect it (e.g. child starts before the parent), but if it can't be detected then it won't do anything. Usually this is not a problem these days.

Please find attached snapshot's where you could see the timestamps for the transactions. Also as I mentioned this is what i noticed:

Based on my further analysis, I could see the difference in timestamps of these two transactions:

  1. FlowTrader - the difference between labels.Createdon n Timestamp is 1067 ms
  2. MiddleTier - the difference between labels.Createdon n Timestamp is 382ms
    (As per attached snapshots)

Why you said so: Usually this is not a problem these days. ???

@GregKalapos: Any update on this ?

@GregKalapos any update here ?

Hi Team, Could someone please provide an update on this ?

@ImdotnetJunkie,

we really try to help, but I personally have a very hard time to get an overview about the situation.

Why you said so: Usually this is not a problem these days. ???

There are protocols to take care of this.

Can you maybe give me a reproducer? Just a small dummy sample app where this happens on your system.

FlowTrader Service is running on a developer machine(Win 7) in London, MiddleTier Service is running on a WindowServer and APM Server is running on a different Window Server.

Now, the time you see in the metadata of above snapshots are provided by NTP server to avoid any delay in timestamps observed of these two services. The only thing which is not NTP synched is the time on machine where APMServer is running where there might be clock drift as reported.

You could try to run in this particular set up to see the problem I reported.

If I run both the services on my local machine I don't see any gap on APM UI.

@ImdotnetJunkie to me this sounds like a time issue in that particular setup.

To be honest I'm not sure we can help with this. You need to make sure that time is properly synced on machines where the agent captures data.

This topic was automatically closed 20 days after the last reply. New replies are no longer allowed.