java.lang.NullPointerException
at co.elastic.apm.agent.impl.transaction.Span.afterEnd(Span.java:289)
at co.elastic.apm.agent.impl.transaction.AbstractSpan.end(AbstractSpan.java:601)
at co.elastic.apm.agent.impl.transaction.AbstractSpan.end(AbstractSpan.java:580)
at co.elastic.apm.agent.httpclient.ApacheHttpClientInstrumentation$ApacheHttpClientAdvice.onAfterExecute(ApacheHttpClientInstrumentation.java:99)
It seems your fix solved the issue. I deployed the snapshot yesterday afternoon and the error haven't happened since but I've got an another one:
java.lang.NullPointerException
at co.elastic.apm.agent.impl.transaction.Span.tryToCompressRegular(Span.java:369)
at co.elastic.apm.agent.impl.transaction.Span.tryToCompress(Span.java:317)
at co.elastic.apm.agent.impl.transaction.Span.afterEnd(Span.java:286)
at co.elastic.apm.agent.impl.transaction.AbstractSpan.end(AbstractSpan.java:601)
at co.elastic.apm.agent.impl.transaction.AbstractSpan.end(AbstractSpan.java:580)
at co.elastic.apm.agent.httpclient.ApacheHttpClientInstrumentation$ApacheHttpClientAdvice.onAfterExecute(ApacheHttpClientInstrumentation.java:99)
Thanks for checking the snapshot and reporting this other NPE, I'll take a look.
Even if it is not triggered that often, it still means something is off here.
I've deployed your snapshot but unfortunately the exception is still thrown:
java.lang.NullPointerException
at co.elastic.apm.agent.impl.transaction.Span.afterEnd(Span.java:289)
at co.elastic.apm.agent.impl.transaction.AbstractSpan.end(AbstractSpan.java:600)
at co.elastic.apm.agent.impl.transaction.AbstractSpan.end(AbstractSpan.java:579)
at co.elastic.apm.agent.httpclient.ApacheHttpClientInstrumentation$ApacheHttpClientAdvice.onAfterExecute(ApacheHttpClientInstrumentation.java:99)
OK, thanks for reporting.
I actually recognized there was still a race condition in the former implementation, but didn't get to that so far.
I now changed this part of the code to be completely thread-safe, so please try out this build. If there is still an issue, it means I didn't identify the exact issue and will have to look elsewhere.
Looking forward to get your feedback.
@Eyal_Koren Yes, I've deployed the latest snapshot on Tuesday last week. I waited a few days to see what happen. Unfortunately NPE has been thrown twice (on Saturday and today) since Tuesday.
java.lang.NullPointerException
at co.elastic.apm.agent.impl.transaction.Span.afterEnd(Span.java:289)
at co.elastic.apm.agent.impl.transaction.AbstractSpan.end(AbstractSpan.java:604)
at co.elastic.apm.agent.impl.transaction.AbstractSpan.end(AbstractSpan.java:579)
at co.elastic.apm.agent.httpclient.ApacheHttpClientInstrumentation$ApacheHttpClientAdvice.onAfterExecute(ApacheHttpClientInstrumentation.java:99)
@barath I had a chance to look at this again this morning.
What seems weird is that you get this stack trace consistently.
If the latest snapshot is used, this means that the Span#parent field is null.
While I cannot exclude the possibility that some race condition causes this, it should have the same chances of occurring in prior lines, like Span.java:288, Span.java:268 or Span.java:266.
If you always get it at Span.java:289, I would ask you to verify that this is indeed coming from a JVM that indeed uses the latest snapshot, and not some other snapshot/version.
Please describe your environment- do you have only one JVM? If you have multiple JVMs, are they logging to the same place? Is there a possibility that this error doesn't come from an agent of the latest snapshot? If not, did you see similar stack traces, but where the top frame in the stack trace is from a different line?
@Eyal_Koren We have 3 Wildfly instances (Java 11.0.5) running in docker containers on three different hosts. They aren't logging to the same place so stack traces aren't mixed up for sure. The top frame is always the same.
Elastic APM agent version is 1.34.2-SNAPSHOT.5079263. Can you confirm this is the latest one?
Would it be possible for you to provide us with a full INFO-level agent log from the startup until the error occurs in the logs? This might help our investigations.
Thanks for reporting back! We are confident that we solved the issue. We also added a test which was able to reproduce your specific exception when run on a build without the fix.
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.