Java agent warns of improper deactivation of span

Good(?) news:

  • Using elastic.apm.disable_instrumentations=concurrent,executor partially fixes the issue (config sketch below).

Spans are no longer deactivated with a delay and the warning is gone, but I still have some _render_ spans that are not being registered.
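For reference, this is roughly how I'm passing that option; I attach the agent via -javaagent and set the config as JVM system properties (the paths and service name below are just placeholders):

    java -javaagent:/path/to/elastic-apm-agent.jar \
         -Delastic.apm.service_name=my-grails-app \
         -Delastic.apm.disable_instrumentations=concurrent,executor \
         -jar my-grails-app.jar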

Maybe these are two different issues?

  • some render spans are not being captured properly
  • the concurrent/executor instrumentations are leaving span references around that get manipulated on other threads/transactions

Something else I noticed: every time the issue (delayed deactivation) happens, the first part of the internal id (which I can see when I .dump() the span/transaction object in Groovy):

co.elastic.apm.api.SpanImpl@39eec638 span='SimpleGrailsController' 00-6e5df9c349ec25a7cb1f13e3f538def9-ca62deb08bf83a8a-01

(which in this example is 6e5df9c349ec25a7cb1f13e3f538def9 and corresponds to the trace.id on the raw document in Elasticsearch) is all zeroes, like in:

co.elastic.apm.api.SpanImpl@433b89cc span='' 00-00000000000000000000000000000000-f57c7c7a7f92ad90-00

and the span name is always an empty string too.
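For context, this is roughly the Groovy helper I use to dump/log the current span (ElasticApm.currentSpan() is the agent's public API, dump() is Groovy's built-in object dump; the logging wrapper here is a simplified placeholder, not my exact code):

    import co.elastic.apm.api.ElasticApm
    import co.elastic.apm.api.Span
    import groovy.util.logging.Slf4j

    @Slf4j
    class SpanDebugHelper {
        // Called at arbitrary points in the request-handling code to see which
        // span the agent considers "current" on this thread.
        static void logCurrentSpan() {
            Span span = ElasticApm.currentSpan()
            // dump() prints the object's fields, including the traceparent-style
            // id (00-<trace.id>-<span.id>-<flags>) shown in the log lines below.
            log.info("currentSpan=${span}::<${span.dump()}>")
        }
    }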

I can reproduce this specific pattern when I press F5 on the front-end service twice (without waiting for the first request to finish). Whenever I do this, a new issue comes into play: pretty much all future transactions with the same name (or some other shared characteristic) get stuck on this ghost transaction and don't appear in Elasticsearch/Kibana at all (based on the logs they do seem to reach the apm-server, but that could be just metrics).

/* around here I did a quick double-F5, and the second request went into ghost mode */
/* note that the transaction.id leaks from one to the other */
[2019-03-15T21:29:59.856-0300] currentSpan=co.elastic.apm.api.SpanImpl@6d2c961d::<co.elastic.apm.api.SpanImpl@321e736f span='SimpleGrailsController' 00-8b67581e24992dc27b0e0471a5e5c5ca-2db20472856b83d8-01>
[2019-03-15T21:30:00.399-0300] currentSpan=co.elastic.apm.api.SpanImpl@2947804c::<co.elastic.apm.api.SpanImpl@76e0ee75 span='' 00-00000000000000000000000000000000-2db20472856b83d8-00>

/* each of these log lines is one F5, captured right after it was triggered; 12 of 15 are ghosts with no trace.id, plus a lot of duplicated/leaked transaction.ids */
[2019-03-15T21:30:04.004-0300] currentSpan=co.elastic.apm.api.SpanImpl@3de838ea::<co.elastic.apm.api.SpanImpl@3462bb14 span='SimpleGrailsController' 00-d13d0064f47715d97ab4ce737d71253e-d9c2c8baa610f97a-01>
[2019-03-15T21:30:09.480-0300] currentSpan=co.elastic.apm.api.SpanImpl@40555a5e::<co.elastic.apm.api.SpanImpl@493cf282 span='' 00-00000000000000000000000000000000-d9c2c8baa610f97a-00>
[2019-03-15T21:30:13.715-0300] currentSpan=co.elastic.apm.api.SpanImpl@61417412::<co.elastic.apm.api.SpanImpl@5216f548 span='' 00-00000000000000000000000000000000-d9c2c8baa610f97a-00>
[2019-03-15T21:30:16.085-0300] currentSpan=co.elastic.apm.api.SpanImpl@4a706651::<co.elastic.apm.api.SpanImpl@440f546f span='' 00-00000000000000000000000000000000-d9c2c8baa610f97a-00>
[2019-03-15T21:30:18.613-0300] currentSpan=co.elastic.apm.api.SpanImpl@2674300::<co.elastic.apm.api.SpanImpl@1f7425f2 span='' 00-00000000000000000000000000000000-d9c2c8baa610f97a-00>
[2019-03-15T21:30:20.714-0300] currentSpan=co.elastic.apm.api.SpanImpl@3e84a30c::<co.elastic.apm.api.SpanImpl@52b84049 span='' 00-00000000000000000000000000000000-d9c2c8baa610f97a-00>
[2019-03-15T21:30:22.301-0300] currentSpan=co.elastic.apm.api.SpanImpl@3fc080ca::<co.elastic.apm.api.SpanImpl@58e46a7 span='' 00-00000000000000000000000000000000-d9c2c8baa610f97a-00>
[2019-03-15T21:30:22.756-0300] currentSpan=co.elastic.apm.api.SpanImpl@36bb420b::<co.elastic.apm.api.SpanImpl@14e0dcba span='' 00-00000000000000000000000000000000-d9c2c8baa610f97a-00>
[2019-03-15T21:30:26.279-0300] currentSpan=co.elastic.apm.api.SpanImpl@60f0f959::<co.elastic.apm.api.SpanImpl@2caa4e27 span='' 00-00000000000000000000000000000000-d9c2c8baa610f97a-00>
[2019-03-15T21:30:26.438-0300] currentSpan=co.elastic.apm.api.SpanImpl@5a777ed5::<co.elastic.apm.api.SpanImpl@4f14e445 span='SimpleGrailsController' 00-a1da5e823e3be5b560fbe29d1e5c1a10-e2eccdd0f465fea5-01>
[2019-03-15T21:30:26.641-0300] currentSpan=co.elastic.apm.api.SpanImpl@234c8568::<co.elastic.apm.api.SpanImpl@30e9475e span='' 00-00000000000000000000000000000000-e2eccdd0f465fea5-00>
[2019-03-15T21:30:27.017-0300] currentSpan=co.elastic.apm.api.SpanImpl@126c400a::<co.elastic.apm.api.SpanImpl@760ff824 span='' 00-00000000000000000000000000000000-e2eccdd0f465fea5-00>
[2019-03-15T21:30:27.108-0300] currentSpan=co.elastic.apm.api.SpanImpl@1761cba2::<co.elastic.apm.api.SpanImpl@74b11af8 span='SimpleGrailsController' 00-21f701730133903e69d13316d8aac1e0-9af48d079a2699e1-01>
[2019-03-15T21:32:15.840-0300] currentSpan=co.elastic.apm.api.SpanImpl@28d01e71::<co.elastic.apm.api.SpanImpl@5dcea18c span='' 00-00000000000000000000000000000000-9af48d079a2699e1-00>
[2019-03-15T21:34:24.377-0300] currentSpan=co.elastic.apm.api.SpanImpl@79dffda9::<co.elastic.apm.api.SpanImpl@4573710 span='' 00-00000000000000000000000000000000-9af48d079a2699e1-00>

I have reproduced this behavior (ghost transactions stuck "forever") regardless of my custom Apache HTTP async instrumentation, custom span creation, or render instrumentation. Leaving only public-api and servlet-api enabled (the simplest shell of a transaction plus custom spans) still gave me a messed-up scenario from time to time.
Running with log_level=TRACE, I could see that a correct transaction has log entries marking when it starts and ends, but a ghost one doesn't (it only has span starts/ends).
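In case it's useful, the trace logging mentioned above was enabled the same way as the other options, as a JVM system property (the elasticapm.properties equivalent would be log_level=TRACE):

    -Delastic.apm.log_level=TRACE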

Please let me know if there's anything else I can provide to help understand (and hopefully fix) this.