Instrumentation of Jenkins fails: a thread dies unexpectedly due to an uncaught exception

Hi,

I tried to instrument a Jenkins instance, but one of the Elastic APM agent's threads dies.

Kibana version: 7.2

Elasticsearch version: 7.2

APM Server version: 7.2

APM Agent language and version: Java, 1.7.0

Steps to reproduce:

  1. Install Jenkins
  2. Add the following to JENKINS_JAVA_OPTIONS:
    -javaagent:/opt/elastic-apm-agent-1.7.0.jar -Delastic.apm.disable_instrumentation='' -Delastic.apm.application_packages=hudson,jenkins,org.eclipse -Delastic.apm.trace_methods=hudson.*,jenkins.*,org.eclipse.* -Delastic.apm.service_name=jenkins -Delastic.apm.server_url=http://apm-server:8200
  3. Restart Jenkins

Provide logs and/or server output (if relevant):
2019-07-05 12:25:55.744+0000 [id=13] SEVERE h.i.i.InstallUncaughtExceptionHandler$DefaultUncaughtExceptionHandler#uncaughtException: A thread (apm-request-timeout-timer/13) died unexpectedly due to an uncaught exception, this may leave your Jenkins in a bad way and is usually indicative of a bug in the code.
java.lang.IllegalStateException: Ring buffer has no available slots
at co.elastic.apm.agent.report.ApmServerReporter.flush(ApmServerReporter.java:173)
at co.elastic.apm.agent.report.IntakeV2ReportingEventHandler$FlushOnTimeoutTimerTask.run(IntakeV2ReportingEventHandler.java:412)
at java.base/java.util.TimerThread.mainLoop(Timer.java:556)
at java.base/java.util.TimerThread.run(Timer.java:506)
2019-07-05 14:26:21.329 [apm-reporter] INFO co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - Backing off for 0 seconds (+/-10%)
2019-07-05 14:26:21.329 [apm-reporter] ERROR co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - Error sending data to APM server: Error writing request body to server, response code is -1
2019-07-05 14:26:21.330 [apm-reporter] WARN co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - null
2019-07-05 14:26:21.332 [apm-reporter] ERROR co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - Failed to handle event of type SPAN with this error: Timer already cancelled.
2019-07-05 14:26:21.333 [apm-reporter] INFO co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - Backing off for 1 seconds (+/-10%)

Any ideas?

Best regards,
Robert

Hi and thanks for reporting!

Looks like you are right: the Timer behaves as if it was cancelled when the main loop throws an exception. We will look into that.
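
To make the Timer behaviour concrete, here is a minimal standalone sketch using only `java.util.Timer` from the JDK. It does not touch the agent; the class and method names (`TimerDeathDemo`, `scheduleAfterTaskFailure`) are made up for illustration. A task throws an unchecked exception, the timer thread exits, and the next attempt to schedule fails with the same message seen in the log above.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class, not part of the APM agent.
class TimerDeathDemo {

    /**
     * Schedules a task that throws, waits for the timer thread to exit,
     * then tries to schedule again. Returns the message of the resulting
     * IllegalStateException, or null if scheduling unexpectedly succeeded.
     */
    static String scheduleAfterTaskFailure() {
        Timer timer = new Timer("demo-timer", true);
        AtomicReference<Thread> timerThread = new AtomicReference<>();
        CountDownLatch started = new CountDownLatch(1);

        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                Thread current = Thread.currentThread();
                // Keep the demo quiet; in Jenkins, its own handler logs the SEVERE line.
                current.setUncaughtExceptionHandler((t, e) -> { });
                timerThread.set(current);
                started.countDown();
                throw new IllegalStateException("Ring buffer has no available slots");
            }
        }, 0);

        try {
            if (!started.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("task never ran");
            }
            // Once the thread has exited, the Timer no longer accepts tasks.
            timerThread.get().join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        try {
            timer.schedule(new TimerTask() {
                @Override
                public void run() { }
            }, 0);
            return null;
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(scheduleAfterTaskFailure()); // prints "Timer already cancelled."
    }
}
```

This matches the sequence in the report: the uncaught exception in the timer thread comes first, and "Timer already cancelled." shows up later when something tries to schedule on the dead Timer.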

Does this reproduce every time?
Was there a proper connection with the APM server prior to that? You can see that at the top of the agent log.

Hi,

This happened once shortly after a restart of Jenkins. Today it ran fine for several hours, but it has crashed now.

Shortly before the crash, lines like the following appeared in the log:

2019-07-10 15:10:30.924 [apm-reporter] ERROR co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - Error sending data to APM server: Server returned HTTP response code: 503 for URL: http://192.168.122.150:8200/intake/v2/events, response code is 503
2019-07-10 15:10:30.924 [apm-reporter] WARN co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - {
"accepted": 890,
"errors": [
{
"message": "queue is full"
}
]
}
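
For context on the "Backing off for N seconds (+/-10%)" lines that accompany these errors: the 0 and 1 second delays in the first log are consistent with a quadratic backoff. The sketch below assumes a schedule of min(consecutive errors, 6) squared, in seconds, with 10% jitter. That formula is an assumption and is not confirmed anywhere in this thread; the class and method names are hypothetical as well.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch of a capped quadratic backoff; not the agent's code.
class BackoffSketch {

    /** Base delay in seconds: min(errors, 6)^2, so 0, 1, 4, 9, 16, 25, 36, 36, ... */
    static long baseDelaySeconds(int consecutiveErrors) {
        long capped = Math.min(Math.max(consecutiveErrors, 0), 6);
        return capped * capped;
    }

    /** Base delay with a random adjustment of up to +/-10%. */
    static double jitteredDelaySeconds(int consecutiveErrors) {
        double base = baseDelaySeconds(consecutiveErrors);
        double jitter = ThreadLocalRandom.current().nextDouble(-0.1, 0.1);
        return base * (1.0 + jitter);
    }

    public static void main(String[] args) {
        for (int errors = 0; errors <= 7; errors++) {
            System.out.println(errors + " errors -> " + baseDelaySeconds(errors) + "s base delay");
        }
    }
}
```

Under this assumption, a server that keeps answering 503 "queue is full" is retried quickly at first and then at most roughly every 36 seconds, while events keep accumulating on the agent side in the meantime.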

Oh no!
Please send the log from this error message through to the end, plus any other server log, if there is one, that may contain information about the crash.

Based on what I saw earlier, I made a pull request. Try using its product: this snapshot build.
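
The contents of the pull request are not shown in this thread, so the following is not the actual fix. It is only a generic sketch of one common way to stop a failing task from killing a `java.util.Timer`: catch the exception inside the task. All names here (`SafeTimerTaskSketch`, `guarded`, `laterTaskStillRuns`) are hypothetical.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not taken from the agent or the pull request.
class SafeTimerTaskSketch {

    /** Wraps a Runnable so a RuntimeException cannot escape into the timer thread. */
    static TimerTask guarded(Runnable body) {
        return new TimerTask() {
            @Override
            public void run() {
                try {
                    body.run();
                } catch (RuntimeException e) {
                    // Report and carry on; the timer thread stays alive.
                    System.err.println("timer task failed: " + e);
                }
            }
        };
    }

    /** Returns true if a task scheduled after a failing one still gets to run. */
    static boolean laterTaskStillRuns() {
        Timer timer = new Timer("guarded-timer", true);
        CountDownLatch secondRan = new CountDownLatch(1);
        try {
            timer.schedule(guarded(() -> {
                throw new IllegalStateException("Ring buffer has no available slots");
            }), 0);
            timer.schedule(guarded(secondRan::countDown), 10);
            return secondRan.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            timer.cancel();
        }
    }

    public static void main(String[] args) {
        System.out.println("later task ran: " + laterTaskStillRuns());
    }
}
```

The trade-off is that the failure is only logged, so whatever the task was meant to do (here, a flush) is skipped for that round and has to be retried on a later tick.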

I'll have the new version in use and report back later.

Thanks, though note that it is not an official version; it's a snapshot build.

So, it seems that has fixed the issue.

Great! Thanks for the update.
Please let us know if something changes.
