Kibana Apm not showing Spans which are visible in Discover(too many spans?)

Kibana version: 6.5.4

Elasticsearch version: 6.5.4

APM Server version: 6.5.4

APM Agent language and version: java 1.3.0
APM Agent configuration:

active=true
instrument=true
environment=[redacted]
service_name=[redacted]
service_version=1.2
transaction_sample_rate=1.0
transaction_max_spans=1000
disable_instrumentations=
capture_body=all
capture_headers=true
log_level=INFO
enable_log_correlation=true
secret_token=[redacted]
server_urls=[redacted]
verify_server_cert=true
flush_interval=1s
max_queue_size=1024
application_packages=[redacted]
stack_trace_limit=50
trace_methods=

Browser version: Chrome 72

Description of the problem including expected versus actual behavior. Please include screenshots (if relevant):
We are using APM Java Agent to monitor performance of our Tomcat based applications. This works fine for most usecases but in one application I instrumented the startup by annotating the init-method of the central class and Kibana does not show the Spans - only the transaction.

When I use "View transaction in discover" however, if I click on the "View Spans" link of the transaction.id I see a lot of Spans. Some Spans are from the annotation @CaptureSpan but the majority are Spans created from SQLs.

What is confusing to me: I have set transaction_max_spans to 1000 but Discovery shows me 12k of Spans. Can this be the root cause of Kibana not showing any Spans?

@Wolfram_Haussig sorry you're having issues! You said that "this works fine for most usecases" -- in the other cases where spans are showing up in the APM UI as expected, do you see a similar excess in Discover?

Can you also post as much as you can from the Discover query where you are seeing your 12K spans? The APM UI shouldn't stop showing spans just because you have "too many".

Last: it would be very helpful for us to debug if you could open your developer tools while loading the APM page where you are seeing missing spans, find the "Network" panel or whichever panel shows the requests being made, search through the request for '/traces' or '/transactions' and paste the response you receive for the request to '/traces/{id}' or '/transactions/{id}', with any sensitive data removed/redacted, if that's possible for you.

Thanks,

Jason

Hello @rhodesjason,

Thank you for your response.

The largest working example I found had over 600 Spans but that is okay - it is still under our limit of 1000. I have not found a working example similar to this one.

Here is a screenshot of the dashboard showing all spans:

Your last tip was very helpful! I can see a call being made to https://host:5601/api/apm/traces/86094778933b859d678c460dc2678285?start=2019-03-11T06%3A21%3A03.197Z&end=2019-03-12T06%3A2 which returns 1000 as expected while the console does not show any javascript errors:
com_5601_app_apm
So it seems the spans are available but are not being shown in Kibana.

It is hard to redact all sensitive data but from what I see is:
The spans have a parent field but the file does not contain a span with this ID:

{
		"@timestamp": "2019-03-11T07:25:31.005Z",
		"timestamp": {
			"us": 1552289131005325
		},
		"beat": {
			"name": "[redacted]",
			"hostname": "[redacted]",
			"version": "6.5.4"
		},
		"processor": {
			"name": "transaction",
			"event": "span"
		},
		"span": {
			"parent": -7950759144807201000,
			"hex_id": "52bd174b8a47695f",
			"name": "SELECT",
			"type": "db.oracle.query",
			"duration": {
				"us": 698
			},
			"id": -3261424941936449000
		},
		"transaction": {
			"id": "9a68528c81c2ec54"
		},
		"trace": {
			"id": "86094778933b859d678c460dc2678285"
		},
		"context": {
			"service": {
				"name": "[redacted]",
				"agent": {
					"version": "1.3.0",
					"name": "java"
				}
			},
			"db": {
				"statement": "[redacted]",
				"type": "sql",
				"user": "[redacted]"
			}
		},
		"parent": {
			"id": "11a93aaec71409a0"
		},
		"host": {
			"name": "[redacted]"
		}
	}

For me it seems that Kibana selects only the first 1000 spans and the parents are not included in the response and therefore no parent child can be visualized.

Why does the setting transaction_max_spans in the agent not restrict the amount of spans being generated?
Is it possible to force Kibana loading all spans to see if the visualization would work then?

Best regards
Wolfram

I instrumented the startup by annotating the init-method of the central class

I am not sure I understand this correctly, can you add a sample code (or the real code) to demonstrate? If you only start a transaction on the Main method and never end it- this is a problem and the expected result would be spans continuously being sent to the APM server, but cannot be shown because their parent transaction is never ended and never sent.

I have set transaction_max_spans to 1000 but Discovery shows me 12k of Spans

This indeed shouldn't be the case, are you sure these are 12K for a single transaction?

We have Applications based on Spring so we subclassed the XmlWebApplicationContext to instrument the startup:
@Slf4j
public class StartupAwareApplicationContext extends XmlWebApplicationContext {

   @Override
   @CaptureTransaction(type="Lifecycle", value="close")
   public void close() {
      log.info("ApplicationContext.close called");
      super.close();
   }
   @Override
   @CaptureTransaction(type="Lifecycle", value="start")
   public void start() {
      log.info("ApplicationContext.start called");
      super.start();
   }
   @Override
   @CaptureTransaction(type="Lifecycle", value="stop")
   public void stop() {
      log.info("ApplicationContext.stop called");
      super.stop();
   }
   @Override
   @CaptureTransaction(type="Lifecycle", value="refresh")
   public void refresh() throws BeansException, IllegalStateException {
      log.info("ApplicationContext.refresh called");
      super.refresh();
   }
   
}

This is the transaction scope for the initialization of all beans. There are mutiple beans with init-Methods which have @CaptureSpan annotations and a lot of SQLs which are called.

According to Kibana(see screenshot in my last message) there are indeed over 12k of Spans...

Thanks for the info!

If these lifecycle events invoke async processes and then terminate, this may explain why we report more spans than configured and also why we may have problem showing them properly. However, since you are using 1.3.0, I don't know how this may be.

For further analysis, please run the application with log level set to TRACE and upload the log, but please make sure it contains the application startup, or at least the creation of the transaction we are interested at (by its ID).

Thanks,
Eyal.

The logfile is a bit... huge :smiley: It is nearly 200MB large. Do you have anything in mind which I can cut from it so I do not have to upload the complete 200MB?

:grinning:
If possible, look for a transaction ID that reproduces this behaviour and search for its first occurrence. Cut a few lines before and as much as makes sense after (I'd say up to ~20MB should be ok) and upload.
Just let me know which transaction ID is the one we are investigating.

Thanks!

Hi @Eyal_Koren ,

If it is still too large tell me and I will try to make it smaller but I didn't want to remove something that might be relevant to you. The transaction is fac40907f6bef9cc and starts in line 90. I have removed a lot of lines(mostly the stacktraces) and marked that with "...". The file is now 33MB

The file I have zipped and uploaded here: https://www.file-upload.net/download-13535372/elastic-apm.zip.html

Best regards
Wolfram

@Wolfram_Haussig Thanks a lot for preparing the log, it was very useful!

I did a small fix that I hope would enable the actual limitation of span creation, please try out this snapshot and let me know if it works. It still won't cover async scenarios, but from the log it seems that it is not the case here.

We opened a separate issue to address the UI side of this.

Thanks,
Eyal.

@Eyal_Koren Thanks for the snapshot - it is working fine now!

With the snapshot from you i see only 1000 spans in Kibana and the APM view now correctly displays them.

Two minor questions come to my mind here:
Will the agent stop collecting spans after 1000 or will he only collect the most time expensive?

With your snapshot i got a new transaction type called "custom" which seems to contain duplicates of a custom transaction type. I have the following code:

   Transaction transaction = ElasticApm.startTransaction();
    transaction.setType("Job");
    transaction.setName(getProcessorType());
    try(final Scope scope = transaction.activate()) {
				...
    }finally {
       transaction.end();
    }

Do I have to modify my code to be compatible with version 1.4?

Best regards
Wolfram

That's great to hear!

Will the agent stop collecting spans after 1000 or will he only collect the most time expensive?

It will stop after 1000, otherwise it defeats the purpose of not collecting too much to avoid too much of an overhead.
Once the UI issue I referred above will be addressed, you would be able to opt to see all spans.

Do I have to modify my code to be compatible with version 1.4?

We did make some changes around the type setting in 1.4.0, but we made efforts it will not require API adaptations. So did Job transactions stopped showing and instead you get custom transactions now?

It is difficult to say but I still get the same transaction name in both transaction types: custom and Job.

What I could see on first glance: The results of setResult and addTag are only visible on the custom transactions. I add the code here again including the setResult and addTag:

Transaction transaction = ElasticApm.startTransaction();
transaction.setType("Job");
transaction.setName(getProcessorType());
try(final Scope scope = transaction.activate()) {
   jobID = ...;
   if(jobID>=0) {
      transaction.setResult("WORK");
      transaction.addTag("JOB_ID", Long.toString(jobID));
   }else {
      transaction.setResult("NO_WORK");
   }
   ...
}finally {
   transaction.end();
}

Wolfram,

Unfortunately I can't reproduce this problem...
I am testing your sample code and also the annotation API with agent API jar 1.3.0 and with newest agent build and it works as expected. Please try the followings:

  1. See if you can identify whether an expected transaction is missing, and whether you can somehow identify which code your "custom" transactions are related to
  2. Try upgrading the agent API jar to version 1.4.0 and see if there's any difference

In general, transactions with type custom are transactions for which type was not set at all. Try to see if there is anywhere your code is creating transactions without setting a type.

Sorry for not being very helpful, hopefully you can provide some additional info to work with.
Eyal.

My bad, cleaning the workspace can be a wonderful thing.

Thanks for all the help!

:smile:

This topic was automatically closed 20 days after the last reply. New replies are no longer allowed.