Hi!
I have APM Server instance, it works fine, getting metrics from some java and python agents, showing perfect graphs.
Now I'm trying to add agent to another tomcat application and here is problem.
Agent successfully sends several events to APM Server after tomcat start, but later it drops connections to intake/v2/events in the middle of http request, so server can not get events.
Here is access logs from http balancer (nginx) in front of apm server.
10.99.1.9 - agent's machine, rt
is request time, ut
- upstream response time, ua
- upstream address, ul
- upstream response length
First events request goes well, then almost all requests fails:
10.99.1.9 - - [22/Jun/2020:19:15:34 +0300] "POST /config/v1/agents HTTP/1.1" 200 59 "-" "elasticapm-java/1.17.0" "-" rt=0.039 ua="127.0.0.1:8200" us="200" ut="0.036" ul="59" cs=-
10.99.1.9 - - [22/Jun/2020:19:15:44 +0300] "POST /intake/v2/events HTTP/1.1" 202 0 "-" "elasticapm-java/1.17.0" "-" rt=10.223 ua="127.0.0.1:8200" us="202" ut="0.216" ul="0" cs=-
10.99.1.9 - - [22/Jun/2020:19:15:45 +0300] "POST /intake/v2/events HTTP/1.1" 400 0 "-" "elasticapm-java/1.17.0" "-" rt=0.476 ua="-" us="-" ut="-" ul="-" cs=-
10.99.1.9 - - [22/Jun/2020:19:15:46 +0300] "POST /intake/v2/events HTTP/1.1" 400 0 "-" "elasticapm-java/1.17.0" "-" rt=0.883 ua="-" us="-" ut="-" ul="-" cs=-
10.99.1.9 - - [22/Jun/2020:19:15:50 +0300] "POST /intake/v2/events HTTP/1.1" 400 0 "-" "elasticapm-java/1.17.0" "-" rt=1.556 ua="-" us="-" ut="-" ul="-" cs=-
10.99.1.9 - - [22/Jun/2020:19:15:52 +0300] "POST /intake/v2/events HTTP/1.1" 400 0 "-" "elasticapm-java/1.17.0" "-" rt=1.423 ua="-" us="-" ut="-" ul="-" cs=-
10.99.1.9 - - [22/Jun/2020:19:15:55 +0300] "POST /intake/v2/events HTTP/1.1" 400 0 "-" "elasticapm-java/1.17.0" "-" rt=1.761 ua="-" us="-" ut="-" ul="-" cs=-
Here is cause of error 400 in balancer log:
2020/06/22 19:15:52 [info] 34546#34546: *760 recv() failed (104: Connection reset by peer), client: 10.99.1.9, server: sv-apm01, request: "POST /intake/v2/events HTTP/1.1", host: "eapm-api.lavr.me"
2020/06/22 19:15:55 [info] 34546#34546: *761 recv() failed (104: Connection reset by peer), client: 10.99.1.9, server: sv-apm01, request: "POST /intake/v2/events HTTP/1.1", host: "eapm-api.lavr.me"
It is not timeout issue.
As balancer log shows, agent successfully posts first events in 10 seconds (actually in 0.216sec), but later it breaks connection after 0.476 - 1.7 seconds.
I've checked https://www.elastic.co/guide/en/apm//server/current/common-problems.html#io-timeout - tried to increase all timeouts, no success.
It's not request size problem, described in https://www.elastic.co/guide/en/apm//server/current/common-problems.html#event-too-large
As balancer log shows, it receives about 7Kb from agent before agent resets connection.
Tried send events without balancer, directly to port 8200 - same error in agent logs, no events in apm server. So, it is not balancer issue. And I'm quite sure it is not apm server issue.
In agent log:
2020-06-22 18:01:10,173 [elastic-apm-server-reporter] ERROR co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - Error sending data to APM server: Error writing request body to server, response code is -1
2020-06-22 18:01:10,173 [elastic-apm-server-reporter] WARN co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - null
2020-06-22 18:01:10,216 [elastic-apm-server-reporter] ERROR co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - Failed to handle event of type SPAN with this error: Unable to write to target stream.
2020-06-22 18:01:10,217 [elastic-apm-server-reporter] INFO co.elastic.apm.agent.shaded.lmax.disruptor.IgnoreExceptionHandler - Exception processing: 4375 co.elastic.apm.agent.report.ReportingEvent@7fefd1e0
co.elastic.apm.agent.shaded.dslplatform.json.SerializationException: Unable to write to target stream.
at co.elastic.apm.agent.shaded.dslplatform.json.JsonWriter.flush(JsonWriter.java:581) ~[?:?]
at co.elastic.apm.agent.report.serialize.DslJsonSerializer.flush(DslJsonSerializer.java:139) ~[?:?]
at co.elastic.apm.agent.report.AbstractIntakeApiHandler.endRequest(AbstractIntakeApiHandler.java:122) ~[?:?]
at co.elastic.apm.agent.report.IntakeV2ReportingEventHandler.endRequest(IntakeV2ReportingEventHandler.java:162) ~[?:?]
at co.elastic.apm.agent.report.IntakeV2ReportingEventHandler.handleEvent(IntakeV2ReportingEventHandler.java:101) ~[?:?]
at co.elastic.apm.agent.report.IntakeV2ReportingEventHandler.onEvent(IntakeV2ReportingEventHandler.java:73) ~[?:?]
at co.elastic.apm.agent.report.IntakeV2ReportingEventHandler.onEvent(IntakeV2ReportingEventHandler.java:44) ~[?:?]
at co.elastic.apm.agent.shaded.lmax.disruptor.BatchEventProcessor.processEvents(BatchEventProcessor.java:168) [?:?]
at co.elastic.apm.agent.shaded.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:125) [?:?]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
Caused by: java.io.IOException: Error writing request body to server
at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3536) ~[?:1.8.0_151]
at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3519) ~[?:1.8.0_151]
at java.util.zip.DeflaterOutputStream.deflate(DeflaterOutputStream.java:253) ~[?:1.8.0_151]
at java.util.zip.DeflaterOutputStream.write(DeflaterOutputStream.java:211) ~[?:1.8.0_151]
at co.elastic.apm.agent.shaded.dslplatform.json.JsonWriter.flush(JsonWriter.java:579) ~[?:?]
... 9 more
Tomcat startup config:
/opt/jre1.8.0_151/bin/java ... -javaagent:/usr/share/tomcat8/lib/elastic-apm-agent-1.17.0.jar -Delastic.apm.config_file=/srv/sp/conf/elasticapm.properties ... start
Content of /srv/sp/conf/elasticapm.properties:
log_level=INFO
environment=production
server_timeout=5s
api_request_time=10s
api_request_size=1mb
metrics_interval=5s
server_urls=http://sv-apm01.lavr.me/
service_name=sp
log_file=/srv/sp/logs/elasticapm.log
Versions:
Apache Tomcat/8.0.32
Ubuntu 16.04
Java(TM) SE Runtime Environment (build 1.8.0_151-b12)
What can it be?
Appreciate any help.