Improving analysis of gaps between spans

Hi All,

Elastic APM Stack 7.1.0 (APM Server, Logstash, Kibana, Elasticsearch), Java Agent 1.7.0
Application: Spring-Boot with JDBC

After being live with Elastic APM for several weeks and getting lots of good insights, one of our recurring issues is getting better visibility into the gaps between spans, like in the graph below:

The gap between the two JDBC calls cannot be explained from the Kibana tools alone; there is not enough information to say what caused it. Our investigation with other tools showed that the connection pool had been exhausted, so the request had to wait until a connection became available before continuing. The application had two consecutive JDBC calls inside a method.

Is there something we could have done to improve visibility of that gap? Or do we need to explicitly code for similar scenarios? (It's not clear to us how to embed a span manually in the middle of an auto-instrumented class, though.)

Any other advice?

Thanks!

Hi and thanks for the excellent question!

For this specific issue, it probably makes sense for us to instrument javax.sql.DataSource#getConnection() (I assume this is the method that took so long in your case).

Which tools did you use, and how did you find out about that?

In case you got that from JMX metrics, you can now include those via the capture_jmx_metrics config option.
We don't currently overlay metrics in the transaction view, but doing so would make it easier to correlate the captured metrics with traces.
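
For reference, here is a minimal sketch of a capture_jmx_metrics setting for a connection pool, assuming HikariCP with MBean registration enabled; the pool name and attribute names are placeholders, so adjust them to whatever your pool actually exposes over JMX:

```
capture_jmx_metrics=object_name[com.zaxxer.hikari:type=Pool (HikariPool-1)] attribute[ActiveConnections] attribute[ThreadsAwaitingConnection:metric_name=threads_awaiting_connection]
```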

You basically have three options here (there's a short sketch of each after the list):

  • Programmatically: Get the current span (possibly created by auto instrumentation) and create a child.
    Advantage: most flexible way, you can add custom labels to the span
  • Declaratively: Annotate an arbitrary method with @CaptureSpan.
    Advantage: Easier, more robust (there's nothing you can do wrong like forgetting to end a span or close a scope) and more performant than the programmatic way
  • Via configuration: Use trace_methods to specify additional methods to instrument.
    Advantage: you don't need to modify the source code
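
To make those options concrete, here is a minimal sketch against the agent's public API (the co.elastic.apm:apm-agent-api dependency). The repository class, method names, and label are made up for illustration; also check which label method your agent version offers (addLabel superseded the older addTag):

```java
import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

import co.elastic.apm.api.CaptureSpan;
import co.elastic.apm.api.ElasticApm;
import co.elastic.apm.api.Span;

// Hypothetical repository class, purely for illustration.
public class CustomerRepository {

    private final DataSource dataSource;

    public CustomerRepository(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    // Option 1 (programmatic): create an explicit child of whatever span the
    // auto-instrumentation is currently recording, so the pool wait gets its own span.
    public Connection acquireConnection() throws SQLException {
        Span span = ElasticApm.currentSpan().startSpan("db", "pool", "getConnection");
        span.setName("DataSource#getConnection");
        span.addLabel("pool", "primary"); // custom label (older agent versions used addTag)
        try {
            return dataSource.getConnection();
        } catch (SQLException e) {
            span.captureException(e);
            throw e;
        } finally {
            span.end(); // always end the span, even when getConnection() fails
        }
    }

    // Option 2 (declarative): the agent creates and ends a span around this method.
    @CaptureSpan("CustomerRepository#findNameById")
    public String findNameById(long id) throws SQLException {
        try (Connection connection = acquireConnection()) {
            // ... run the two JDBC statements here; the time spent between them now
            // shows up inside this span instead of as an unexplained gap.
            return "customer-" + id;
        }
    }
}
```

For option 3, something like trace_methods=com.example.CustomerRepository#findNameById (set in elasticapm.properties or via -Delastic.apm.trace_methods) would create a span for that method without touching the source code; the class and method names here are just the hypothetical ones from the sketch above.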

Hi Felix,

Thanks for these insightful answers. They have set us on a new direction, and we'll start trying the approaches you've outlined. That list actually gave me an idea to try on a different -- although similar -- case involving some open-source Netflix components we use (Feign/Ribbon).

I have to get back to you on how the team found out that the connection pool issue was causing that wait time between JDBC calls. I do know we have a Prometheus/Grafana stack for metrics (it's been there for years), so it's possible there was a JMX exporter involved. I'll have to confirm with them, though.

The scenario was that we suddenly had a spike of around 55K tpm for a short period, so I'm not surprised it overwhelmed our available connections.

Best regards,
Ronald
