I noticed there is a detach method in co.elastic.apm.agent.premain.AgentMain which seems to be used for detaching the agent, although the comment says it's only for demonstration purposes.
I tried to use this method to detach my Java agent and found that when the target application is using Java versions below 11, the class loader cannot be unloaded properly. In Java 8, this results in Metaspace constantly expanding, even when full GC occurs.
After repeated attach/detach cycles, Metaspace usage looks like this.
My question is whether the detach method in elastic-apm-agent has ever been validated. Does it work properly?
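For anyone who wants to observe the same growth without a profiler, here is a minimal sketch that reads Metaspace usage and the unloaded-class count through the standard `java.lang.management` beans. The class name `MetaspaceUsage` is made up for illustration, and the pool name "Metaspace" is what HotSpot reports on Java 8+; other JVMs may name it differently, in which case the method returns -1.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

public class MetaspaceUsage {

    /** Returns the bytes currently used by the pool named "Metaspace", or -1 if no such pool exists. */
    static long metaspaceUsedBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                return pool.getUsage().getUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        ClassLoadingMXBean classLoading = ManagementFactory.getClassLoadingMXBean();
        // Call this before and after each attach/detach cycle and compare the numbers.
        System.out.println("Metaspace used (bytes): " + metaspaceUsedBytes());
        System.out.println("Classes loaded now:     " + classLoading.getLoadedClassCount());
        System.out.println("Classes unloaded total: " + classLoading.getUnloadedClassCount());
    }
}
```

If the unloaded-class count never increases after a detach followed by a full GC, the agent's classes are most likely still reachable.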
I have written an example that reproduces the problem I described, and my current conclusion is that once the IndyBootstrap#bootstrap method is executed at runtime, the custom class loader cannot be unloaded. However, it's worth noting that if our application is running on Java 11 or higher, class unloading can be completed successfully. Does anyone have any insight into this issue?
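One way to check whether a class loader actually becomes unreachable is to hold it only through a `WeakReference` and see whether the reference is cleared after a few GC requests. The sketch below is not taken from the reproduction repository; `LoaderUnloadCheck` and `isCollected` are hypothetical names, and an empty `URLClassLoader` stands in for the agent's class loader. Note that `System.gc()` is only a hint to the JVM, so a `false` result is a strong signal rather than proof.

```java
import java.lang.ref.WeakReference;
import java.net.URL;
import java.net.URLClassLoader;

public class LoaderUnloadCheck {

    /** Requests GC up to 'attempts' times and reports whether the referent was cleared. */
    static boolean isCollected(WeakReference<?> ref, int attempts) throws InterruptedException {
        for (int i = 0; i < attempts && ref.get() != null; i++) {
            System.gc();          // only a hint; the JVM may ignore it
            Thread.sleep(50);     // give the collector a moment to run
        }
        return ref.get() == null;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the agent class loader (assumption: the real one is also closeable).
        URLClassLoader loader = new URLClassLoader(new URL[0], null);
        WeakReference<ClassLoader> ref = new WeakReference<ClassLoader>(loader);

        loader.close();
        loader = null;            // drop the only strong reference we hold

        System.out.println("class loader collected: " + isCollected(ref, 20));
    }
}
```

In the real scenario, the weak reference would be created for the agent class loader before detaching, and the check would run after detach has finished. If it stays uncollected, a heap dump showing the path to GC roots is the next step.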
Per the comment, it's not even partially complete, so I'm not surprised by your results. This isn't currently on our roadmap, and we don't plan to put resources into it. We're happy for others to work on it; the comment has several suggestions as to what more needs doing.
Hi, @Jack_Shirazi:
I noticed the comment from Elastic on this method, and as far as I understand, we need to do the following:
Separate the agent core as a standalone jar, instead of having it in the agent/ folder within the javaagent.
Make sure that ClassFileTransformer is removed and classes are retransformed.
Stop all threads within the javaagent.
Ensure that we use DetachThreadLocal to replace all ThreadLocals and clean them up.
Close custom class loaders.
In addition, I also set bootstrap to null in InvokeBootstrapDispatcher. Even after doing all of the above, as I mentioned earlier, the class loader still can't be unloaded on Java 8, 9, and 10. On Java 11 and higher, unloading completes without problems. I've been troubleshooting this issue for days and I'm almost stuck. I created a minimal example repository, agent-detach-issue, to demonstrate and reproduce the issue, and uploaded a heap dump there. If you could spare some time, I hope you could take a look at it. I'm starting to suspect it's a bug in versions before Java 11.
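For reference, the generic part of the steps listed above can be sketched with the standard `java.lang.instrument` API. This is only an illustration under assumptions, not the Elastic agent's implementation: `DetachSketch` and its parameters are hypothetical, the `Instrumentation` instance has to come from `premain`/`agentmain`, and a real agent would retransform only the classes it actually instrumented rather than every modifiable class. Agent-specific cleanup (thread locals, clearing the bootstrap dispatcher) is not shown.

```java
import java.io.Closeable;
import java.io.IOException;
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.lang.instrument.UnmodifiableClassException;
import java.util.ArrayList;
import java.util.List;

public class DetachSketch {

    static void detach(Instrumentation inst,
                       ClassFileTransformer transformer,
                       List<Thread> agentThreads,
                       Closeable agentLoader)
            throws UnmodifiableClassException, IOException, InterruptedException {

        // 1. Unregister the transformer so no further classes get instrumented.
        inst.removeTransformer(transformer);

        // 2. Retransform already-loaded classes so they revert to their original bytecode.
        //    Simplification: this touches every modifiable class, which is expensive.
        List<Class<?>> toRestore = new ArrayList<Class<?>>();
        for (Class<?> c : inst.getAllLoadedClasses()) {
            if (inst.isModifiableClass(c)) {
                toRestore.add(c);
            }
        }
        if (!toRestore.isEmpty()) {
            inst.retransformClasses(toRestore.toArray(new Class<?>[0]));
        }

        // 3. Stop the agent's own threads; a live thread keeps its context class loader reachable.
        for (Thread t : agentThreads) {
            t.interrupt();
            t.join(1000);
        }

        // 4. Close the agent class loader (e.g. a URLClassLoader) to release its jar handles.
        agentLoader.close();
    }
}
```

Even when all four steps succeed, any remaining strong reference from a class loaded by another loader to an agent class will keep the agent class loader alive.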