Kibana/Elasticsearch/APMServer: 6.6.1
APM Agent: Java@1.4.0
Install/setup: javaagent loaded via VMOptions:
-javaagent:/home/<>/elastic-apm-agent-1.4.0.jar
-Delastic.apm.service_name=front-end
-Delastic.apm.server_url=http://localhost:8200
-Delastic.apm.application_packages=br.com
compile 'co.elastic.apm:apm-agent-api:1.4.0'
Fresh or upgraded? Fresh
**Anything special?**Nop, all defaults.
[Bonus point] OS info
Linux 4.4.0-142-generic #168-Ubuntu SMP x86_64 x86_64 x86_64 GNU/Linux
First possible issue
The applications run without docker, and I get these INFO logs when I run anything:
INFO ApmServerHealthChecker - Elastic APM server is available: {"ok":{"build_date":"...","build_sha":"...","version":"6.6.1"}}
INFO SystemInfo - Could not parse container ID from '/proc/self/cgroup' line: 11:pids:/user.slice/user-1000.slice
INFO SystemInfo - || '/proc/self/cgroup' line: 10:blkio:/user.slice
INFO SystemInfo - || '/proc/self/cgroup' line: 9:memory:/user.slice
INFO SystemInfo - || '/proc/self/cgroup' line: 8:cpuset:/
INFO SystemInfo - || '/proc/self/cgroup' line: 7:cpu,cpuacct:/user.slice
INFO SystemInfo - || '/proc/self/cgroup' line: 6:net_cls,net_prio:/
INFO SystemInfo - || '/proc/self/cgroup' line: 5:freezer:/
INFO SystemInfo - || '/proc/self/cgroup' line: 4:perf_event:/
INFO SystemInfo - || '/proc/self/cgroup' line: 3:devices:/user.slice
INFO SystemInfo - || '/proc/self/cgroup' line: 2:hugetlb:/
INFO SystemInfo - || '/proc/self/cgroup' line: 1:name=systemd:/user.slice/user-1000.slice/session-c2.scope
INFO StartupInfo - Starting Elastic APM 1.4.0 as front-end on Java 1.7.0_141 (Azul Systems, Inc.) Linux 4.4.0-142-generic
Scenario
- 2 grails@2.2.4 apps (simple java servlet for all practical effects): front-end and jobs
- they talk to each other using a custom rest/client lib that uses Apache HTTP async
Custom Apache HTTP Async
- I had to implement custom span creation and trace propagation (async isn't supported yet):
final Span spanForHttpCall = ElasticApm.currentSpan().startSpan("ext", "", "")
spanForHttpCall.activate()
try {
/*... builds an async request using Apache HTTP, and runs it on an executorService, effectivelly making it sync/blocking ...*/
return result
} catch (final Throwable errorFromTask) {
spanForHttpCall.captureException(errorFromTask)
throw errorFromTask
} finally {
spanForHttpCall.end()
}
- inside the creation/setup of the request, I do the following:
final Span span = ElasticApm.currentSpan()
span.setName("method+path, basically")
/*...*/
final HeaderInjector headerInjector = { final String headerName, final String headerValue ->
req.addHeader(headerName, headerValue)
} as HeaderInjector
span.injectTraceHeaders(headerInjector)
Custom transaction name
- I also had to create a servlet filter to make the transaction properly named
Current apparent situation
Everything seems to be working if I don't check logs:
- apm-server/elastic/kibana are fine
This is an example of "fine" transaction/trace:
Problem
The agent is spitting warnings very often (~0.5-1.0 warnings per request), telling me that:
WARN co.elastic.apm.agent.impl.ElasticApmTracer - Deactivating a span (...) which is not the currently active span (...). This can happen when not properly deactivating a previous span.
I setup a few prints and I got the following, which seems like a pattern:
/*This was a request #1 I did*/
[t8] BEFORE span creation+activation
currentSpan=SpanImpl@3afd1618::<SpanImpl@4e4c4550 span='SimpleGrailsController' 00-906...-aac...-01>
[t8] AFTER span creation+activation
currentSpan=SpanImpl@4bff69db::<SpanImpl@35b23fda span='' 00-906...-3de...-01>
spanForHttpCall=SpanImpl@497e1cc0::<SpanImpl@497e1cc0 span='' 00-906...-3de...-01>
[t8] BEFORE span.end()
currentSpan=SpanImpl@14b1c74a::<SpanImpl@5c15b359 span='Tarefa[ GET :: tarefa/summary ]' 00-906...-3de...-01>
[t8] AFTER span.end()
currentSpan=SpanImpl@6f51fdcc::<SpanImpl@4220ad4e span='Tarefa[ GET :: tarefa/summary ]' 00-906...-3de...-01>
/*This was a request #2 I did, ~6 seconds after request #1*/
[t3] BEFORE span creation+activation
currentSpan=SpanImpl@15fb8e5d::<SpanImpl@5f795213 span='SimpleGrailsController' 00-774...-e43...-01>
[t3] AFTER span creation+activation
currentSpan=SpanImpl@5af95eef::<SpanImpl@363f86f1 span='' 00-774...-f93...-01>
spanForHttpCall=SpanImpl@4b4540a::<SpanImpl@4b4540a span='' 00-774...-f93...-01>
[t3] BEFORE span.end()
currentSpan=SpanImpl@c117380::<SpanImpl@6ac09f29 span='Tarefa[ GET :: tarefa/summary ]' 00-774...-f93...-01>
[t3] AFTER span.end()
currentSpan=SpanImpl@5f2f1cab::<SpanImpl@4db5deb0 span='Tarefa[ GET :: tarefa/summary ]' 00-774...-f93...-01>
2019-03-14 19:18:03.151 [t3] WARN co.elastic.apm.agent.impl.ElasticApmTracer - Deactivating a span (00-0*32-aac...-00) which is not the currently active span ('' 00-0*32-e43...-00). This can happen when not properly deactivating a previous span.
Interesting detail and a question: why does the ElasticApm.currentSpan()
returns a different instance for every invocation, even though it's still the same?
Note that, the warning mentioned trying to deactivate/end a span aac...
(which was the parent span for the first request) when the active span was instead e43...
(which was the parent span for the second request).
This seems like a delayed deactivation, that was even able to happen between different threads.
The warning/issue only happens if I have the custom span for Apache HTTP async.
The warning/issue also happens if I create the custom span with ElasticApm.currentTransaction().startSpan
instead of ElasticApm.currentSpan().startSpan
.
The effect of this delayed deactivation is the following::
Request number 1:
Request number 2:
Note that, the first transaction doesn't know about it's render span at all.
I could always disable instrumentation and create the transaction myself: I think I would be able to setup all spans I want properly. However, I'd like to use as much automatic instrumentation as possible, and I'm trying to make the supported technologies and my frameworks coexist well.
I could provide more information about a scenario like that (like the raw docs), or more testing, if needed.
I have to say that I'm still amazed at how much quality and benefit I was able to achieve with such little effort. You guys are all amazing, and I'm glad I'm having the chance to work with Elastic APM.
Also, I had to remove a lot of stuff from the original thread description, as I found that there's a limit of 7k chars for it (logs and code were simplified a lot, but it should still be readable). Let me know if anything seems out of place, I can provide the full version for each section.