Use javaagent happened deadlock

We have one problem when we use javaagent with Tomcat. It will caused deadlock and make some thread blocked , but it seems like a occasional problem,because it happened just one time.

JDK Version:1.8.0_192-b12
Tomcat Version: Apache Tomcat/8.5.38
Linux Version:4.19.113-300.el7.x86_64
es-apm-agent Version: 1.28.5

Here is the thread dump info:

Java stack information for the threads listed above:
===================================================
"elastic-apm-remote-config-poller":
	at sun.net.www.protocol.file.Handler.openConnection(Handler.java:72)
	- waiting to lock <0x000000074a025ae8> (a sun.net.www.protocol.file.Handler)
	at java.net.URL.openConnection(URL.java:979)
	at sun.net.www.protocol.jar.JarURLConnection.<init>(JarURLConnection.java:84)
	at sun.net.www.protocol.jar.Handler.openConnection(Handler.java:41)
	at java.net.URL.openConnection(URL.java:979)
	at java.net.URLClassLoader.getResourceAsStream(URLClassLoader.java:238)
	at co.elastic.apm.agent.premain.ShadedClassLoader.getClassBytes(ShadedClassLoader.java:110)
	at co.elastic.apm.agent.premain.ShadedClassLoader.findClass(ShadedClassLoader.java:73)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	- locked <0x0000000731e2be40> (a java.lang.Object)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at co.elastic.apm.agent.configuration.ApmServerConfigurationSource.fetchConfig(ApmServerConfigurationSource.java:155)
	at co.elastic.apm.agent.configuration.ApmServerConfigurationSource.pollConfig(ApmServerConfigurationSource.java:122)
	at co.elastic.apm.agent.configuration.ApmServerConfigurationSource.access$000(ApmServerConfigurationSource.java:46)
	at co.elastic.apm.agent.configuration.ApmServerConfigurationSource$1.run(ApmServerConfigurationSource.java:108)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
"main":
	at sun.net.www.protocol.jar.JarFileFactory.get(JarFileFactory.java:80)
	- waiting to lock <0x000000074a027ac8> (a sun.net.www.protocol.jar.JarFileFactory)
	at sun.net.www.protocol.jar.JarURLConnection.connect(JarURLConnection.java:122)
	at sun.net.www.protocol.jar.JarURLConnection.getInputStream(JarURLConnection.java:152)
	at java.net.URLClassLoader.getResourceAsStream(URLClassLoader.java:239)
	at net.bytebuddy.dynamic.ClassFileLocator$ForClassLoader.locate(ClassFileLocator.java:453)
	at net.bytebuddy.dynamic.ClassFileLocator$ForClassLoader.locate(ClassFileLocator.java:434)
	at net.bytebuddy.dynamic.ClassFileLocator$Compound.locate(ClassFileLocator.java:1891)
	at net.bytebuddy.pool.TypePool$Default.doDescribe(TypePool.java:839)
	at net.bytebuddy.pool.TypePool$Default$WithLazyResolution.access$001(TypePool.java:921)
	at net.bytebuddy.pool.TypePool$Default$WithLazyResolution.doResolve(TypePool.java:1019)
	at net.bytebuddy.pool.TypePool$Default$WithLazyResolution$LazyTypeDescription.delegate(TypePool.java:1088)
	at net.bytebuddy.description.type.TypeDescription$AbstractBase$OfSimpleType$WithDelegation.getSuperClass(TypeDescription.java:8349)
	at net.bytebuddy.description.type.TypeDescription$Generic$OfNonGenericType.getSuperClass(TypeDescription.java:3600)
	at net.bytebuddy.description.type.TypeDefinition$SuperClassIterator.next(TypeDefinition.java:391)
	at net.bytebuddy.description.type.TypeDefinition$SuperClassIterator.next(TypeDefinition.java:357)
	at net.bytebuddy.matcher.HasSuperTypeMatcher.doMatch(HasSuperTypeMatcher.java:53)
	at net.bytebuddy.matcher.HasSuperTypeMatcher.doMatch(HasSuperTypeMatcher.java:31)
	at net.bytebuddy.matcher.ElementMatcher$Junction$ForNonNullValues.matches(ElementMatcher.java:249)
	at net.bytebuddy.matcher.ElementMatcher$Junction$Conjunction.matches(ElementMatcher.java:146)
	at co.elastic.apm.agent.bci.ElasticApmAgent$6.matches(ElasticApmAgent.java:380)
	at net.bytebuddy.agent.builder.AgentBuilder$Default$ExecutingTransformer.doTransform(AgentBuilder.java:11740)
	at net.bytebuddy.agent.builder.AgentBuilder$Default$ExecutingTransformer.transform(AgentBuilder.java:11698)
	at net.bytebuddy.agent.builder.AgentBuilder$Default$ExecutingTransformer.access$1700(AgentBuilder.java:11415)
	at net.bytebuddy.agent.builder.AgentBuilder$Default$ExecutingTransformer$LegacyVmDispatcher.run(AgentBuilder.java:12098)
	at net.bytebuddy.agent.builder.AgentBuilder$Default$ExecutingTransformer$LegacyVmDispatcher.run(AgentBuilder.java:12038)
	at java.security.AccessController.doPrivileged(Native Method)
	at net.bytebuddy.agent.builder.AgentBuilder$Default$ExecutingTransformer.doPrivileged(AgentBuilder.java)
	at net.bytebuddy.agent.builder.AgentBuilder$Default$ExecutingTransformer.transform(AgentBuilder.java:11607)
	at sun.instrument.TransformerManager.transform(TransformerManager.java:188)
	at sun.instrument.InstrumentationImpl.transform(InstrumentationImpl.java:428)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:264)
	at java.net.URL.getURLStreamHandler(URL.java:1175)
	at java.net.URL.<init>(URL.java:420)
	at java.net.URL.<init>(URL.java:310)
	at java.net.URL.<init>(URL.java:333)
	at sun.net.www.protocol.file.Handler.openConnection(Handler.java:92)
	- locked <0x000000074a025ae8> (a sun.net.www.protocol.file.Handler)
	at sun.net.www.protocol.file.Handler.openConnection(Handler.java:72)
	- locked <0x000000074a025ae8> (a sun.net.www.protocol.file.Handler)
	at java.net.URL.openConnection(URL.java:979)
	at sun.net.www.protocol.jar.JarURLConnection.<init>(JarURLConnection.java:84)
	at sun.net.www.protocol.jar.Handler.openConnection(Handler.java:41)
	at java.net.URL.openConnection(URL.java:979)
	at org.apache.tomcat.util.compat.JreCompat.disableCachingForJarUrlConnections(JreCompat.java:152)
	at org.apache.catalina.core.JreMemoryLeakPreventionListener.lifecycleEvent(JreMemoryLeakPreventionListener.java:410)
	at org.apache.catalina.util.LifecycleBase.fireLifecycleEvent(LifecycleBase.java:94)
	at org.apache.catalina.util.LifecycleBase.setStateInternal(LifecycleBase.java:395)
	- locked <0x0000000747bde480> (a org.apache.catalina.core.StandardServer)
	at org.apache.catalina.util.LifecycleBase.init(LifecycleBase.java:106)
	- locked <0x0000000747bde480> (a org.apache.catalina.core.StandardServer)
	at org.apache.catalina.startup.Catalina.load(Catalina.java:639)
	at org.apache.catalina.startup.Catalina.load(Catalina.java:662)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.catalina.startup.Bootstrap.load(Bootstrap.java:309)
	at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:492)
"elastic-apm-tracer-initializer":
	at sun.net.www.protocol.file.Handler.openConnection(Handler.java:72)
	- waiting to lock <0x000000074a025ae8> (a sun.net.www.protocol.file.Handler)
	at java.net.URL.openConnection(URL.java:979)
	at sun.net.www.protocol.jar.JarFileFactory.getConnection(JarFileFactory.java:65)
	at sun.net.www.protocol.jar.JarFileFactory.getPermission(JarFileFactory.java:154)
	at sun.net.www.protocol.jar.JarFileFactory.getCachedJarFile(JarFileFactory.java:126)
	at sun.net.www.protocol.jar.JarFileFactory.get(JarFileFactory.java:81)
	- locked <0x000000074a027ac8> (a sun.net.www.protocol.jar.JarFileFactory)
	at sun.net.www.protocol.jar.JarURLConnection.connect(JarURLConnection.java:122)
	at sun.net.www.protocol.jar.JarURLConnection.getInputStream(JarURLConnection.java:152)
	at java.net.URLClassLoader.getResourceAsStream(URLClassLoader.java:239)
	at co.elastic.apm.agent.premain.ShadedClassLoader.getClassBytes(ShadedClassLoader.java:110)
	at co.elastic.apm.agent.premain.ShadedClassLoader.findClass(ShadedClassLoader.java:73)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	- locked <0x0000000731ac07a0> (a java.lang.Object)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at co.elastic.apm.agent.metrics.builtin.JvmMemoryMetrics.bindTo(JvmMemoryMetrics.java:63)
	at co.elastic.apm.agent.metrics.builtin.JvmMemoryMetrics.start(JvmMemoryMetrics.java:40)
	at co.elastic.apm.agent.impl.ElasticApmTracer.startSync(ElasticApmTracer.java:582)
	- locked <0x000000074a48de88> (a co.elastic.apm.agent.impl.ElasticApmTracer)
	at co.elastic.apm.agent.impl.ElasticApmTracer.access$300(ElasticApmTracer.java:72)
	at co.elastic.apm.agent.impl.ElasticApmTracer$4.run(ElasticApmTracer.java:566)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Found 1 deadlock.

In addition to the above, we doesn't have debug log ,and it go well when we reboot the container. Please help ~ Thank you

Hi @fido ,

Thanks for reporting this, we'll have a look at it and evaluate our options to fix it.

I assume you are either using the elastic.apm.start_async or elastic.apm.delay_agent_premain_ms config options, correct?
If this is the case, removing those options or increasing the delay should prevent the error from occuring.

The reason for the deadlock is that when a class is loaded, the agent "intercepts" this loading and checks whether it needs to instrument this class. This also involves checking the supertypes of the class, which involves loading the bytecode of those from JAR files.

If this happens while the current thread already holds a Jar/file loading related lock, this can cause a deadlock as in your case.

It would be great if you could reproduce the issue and provide debug logs. This would show us the class, during which the instrumentation causes the deadlock. We could then maybe add an option to delay the instrumentation of this class.

This topic was automatically closed 20 days after the last reply. New replies are no longer allowed.