Span data visualisation missing in kibana

,

Kibana version:
7.5.0

Elasticsearch version:
7.5.0

APM Server version:
7.5.0

APM Agent language and version:
Java 1.13.0

Browser version:
Safari Version 13.0.5 (14608.5.12)
Edge Version 80.0.361.62 (Official build) (64-bit)

Original install method (e.g. download page, yum, deb, from source, etc.) and version:
Yum

Fresh install or upgraded from other version?
Upgraded from 7.3.2 quite a while ago

Is there anything special in your setup?
We use an ALB in front of our apm-server instances, this has been working without issue until recently.

Description of the problem including expected versus actual behavior. Please include screenshots (if relevant):

We're having a problem which seems to be similar to No lines drawn in "Time spent by span time", although data is available.

No elastic-apm span data is being displayed in kibana although data is coming in. There is no indication why span information isn't being displayed. All other visualisations in apm seem ok.

We've verified the apm-7.5.0-span-* indexes are healthy and being filled, and that there is a writeable index.

We've identified by using /api/apm/services/{service_name}/transaction_groups/breakdown recent time ranges do not return any results, i.e.:

{"kpis":[],"timeseries":[]}

We manually created a visualisation to prove data exists for the time range which doesn't show data on the transaction_groups breakdown.

Screenshots demonstrating different behaviour depending on time range specified


Steps to reproduce:

  1. Go to APM services. app/apm#/services?rangeFrom=now-1h&rangeTo=now
  2. Select a service. app/apm#/services/{service_name}/transactions?rangeTo=now&refreshPaused=true&refreshInterval=0&rangeFrom=now-1h
  3. Set date range to 7 days. app/apm#/services/{service_name}/transactions?rangeTo=now&refreshPaused=true&refreshInterval=0&transactionType=request&environment=&rangeFrom=now-7d

Errors in browser console (if relevant):
No relevant browser console errors when browsing

Provide logs and/or server output (if relevant):
No relevant error logs in apm-server or kibana logs

Hi and welcome to the forum :slightly_smiling_face:

A couple of questions:

Do you see breakdown metrics in your index? See also https://www.elastic.co/guide/en/apm/agent/java/current/metrics.html#metrics-application

Are there errors in the console?

Does the time when there was a gap in the transaction duration graph correlate with the time after which the "Time spent by span type" graph disappeared? What happened at this time? Did you deploy a new version with a new Java agent version?

Are you saying this only happens when selecting a large time frame?

Hi,

We searched the last 15 minutes in the apm-* index with:

processor.event: "metric" and service.name: "bookings-bookings" and transaction.duration:* and transaction.breakdown.count:*

All results have transaction.duration and transaction.breakdown.count.

e.g.

transaction.breakdown.count	1
transaction.duration.count	1
transaction.duration.sum.us	170,162
transaction.name	CustomerCareBOWebService#bookingCancellationQuote
transaction.type	request

There are some errors but they seem to appear even when the apm span data is displayed correctly.

Refused to execute inline script because it violates the following Content Security Policy directive: "script-src 'unsafe-eval' 'self'". Either the 'unsafe-inline' keyword, a hash ('sha256-SbBSU7MfZFnVMq4PuE/jbBz7pPIfXUTYDrdHl7Ckchc='), or a nonce ('nonce-...') is required to enable inline execution.
 /bundles/app/apm/bootstrap.js:9 ^ A single error about an inline script not firing due to content security policy is expected!
 /built_assets/dlls/vendors.bundle.dll.js:559 INFO: 2020-03-10T22:14:41Z
Adding connection to https://kibana.prod.viatorsystems.com/elasticsearch
 /api/ml/anomaly_detectors/demand-contentapi-request-high_mean_response_time:1 Failed to load resource: the server responded with a status of 403 (Forbidden)
 4/api/ml/anomaly_detectors/bookings-bookings-request-high_mean_response_time:1 Failed to load resource: the server responded with a status of 403 (Forbidden)
 DevTools failed to parse SourceMap: chrome-extension://gighmmpiobklfepjocnamgkkbiglidom/include.preload.js.map
 DevTools failed to parse SourceMap: chrome-extension://hdokiejnpimakedhajhdlcegeplioahd/sourcemaps/onloadwff.js.map
 DevTools failed to parse SourceMap: chrome-extension://gighmmpiobklfepjocnamgkkbiglidom/include.postload.js.map

Our cluster was down during that period. The issue we're seeing with displaying the apm span seems to correlate with the day before our outage. I think you might be on to something there, would data corruption cause this behaviour, how would I go about resolving that?

Sorry for the confusion. 7 days ago is when the data stopped appearing, so for example if we select 7 days, we see the tail of when the data is available before cutting off. If we select a more recent range, like 1 hour there is no span visualisation even though we can see metrics coming in.

Below I provide a few examples of no span data returning for various periods, despite the fact we believe apm span metrics are flowing:

$ export start="2020-03-09T00%3A00%3A00.000Z"
$ export end="2020-03-10T01%3A00%3A00.000Z"
$ curl -XGET "http://10.40.68.86:5601/s/gpowell/api/apm/services/bookings-bookings/transaction_groups/breakdown?start=${start}&end=${end}&transactionType=request&uiFilters=%7B%22environment%22%3A%22prod%22%7D"
{"kpis":[],"timeseries":[]}

$ export start="2020-03-04T00%3A00%3A00.000Z"
$ export end="2020-03-04T01%3A00%3A00.000Z"
$ curl -XGET "http://10.40.68.86:5601/s/gpowell/api/apm/services/bookings-bookings/transaction_groups/breakdown?start=${start}&end=${end}&transactionType=request&uiFilters=%7B%22environment%22%3A%22prod%22%7D"
{"kpis":[{"name":"app","percentage":0.09371777854399276,"color":"#00b3a4"},{"name":"http","percentage":0.4967009362537613,"color":"#3185fc"},{"name":"mysql","percentage":0.4095812852022459,"color":"#db1374"}],"timeseries":[{"title":"app","color":"#00b3a4","type":"areaStacked","data":[{"x":1583280000000,"y":0.14473524164335685},{"x":1583280030000,"y":0.12784874525522436},{"x":1583280060000,"y":0.06674221129087728},{"x":1583280090000,"y":0.06882793104766011},   

[...]

{"x":1583283600000,"y":null}],"hideLegend":true}]}

Thanks for your help on this

Geoff

Can you spot any errors in the Kibana Server logs related to this?

We've identified by using /api/apm/services/{service_name}/transaction_groups/breakdown recent time ranges do not return any results, i.e.:

Interesting. Can you try appending ?_debug=true to that request like (leave all other query params untouched):

/api/apm/services/{service_name}/transaction_groups/breakdown?_debug=true&start=...

This will log the elasticsearch query to your Kibana logs. You should then be able to run the query in Kibana Dev Tools.

I'm interested to hear if the ES query returns any results.

We finally got to the bottom of this today so span data is being displayed. The field data on span.type in the metric index was turned off, not allowing it to load into memory + display graphs.

To debug this, we did the following:

  1. Appended ?_debug=true to the request (leave all other query params untouched) - this will log the elasticsearch query to your Kibana logs:
    /api/apm/services/{service_name}/transaction_groups/breakdown?_debug=true&start=...
  2. Run the query in Kibana Dev Tools app/kibana#/dev_tools/console?_g=() - this will return the error that is stopping the span data from being displayed

In our case we had to delete the metric index and recreate it as it didn't seem to be rolling over. Running the query manually gave us the index error (pointing to metric not span), some field had field data disabled.

Thanks,

Geoff

Great to hear you got it solved. We'll look into making this debug info more readily available.

This topic was automatically closed 20 days after the last reply. New replies are no longer allowed.