APM Java agnet return 400 sending report to elastic and create holes - co.elastic.apm.agent.report.IntakeV2ReportingEventHandler

hi all,
testing the java agent on our application we see that with the application under load the agent is not able to send the report to APM server and get HTTP 400 error
this is the specific error:
co.elastic.apm.agent.report.IntakeV2ReportingEventHandler

APM Server version: 7.9.2
APM Agent language and version: 1.18.1 EN
Tomcat version: 8.0.52
Java version: 1.8.161

this is the full trace:

    2020-11-24 14:52:03,756 [elastic-apm-server-reporter] INFO  co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - Backing off for 0 seconds (+/-10%)
    --
    2020-11-24 14:52:03,756 [elastic-apm-server-reporter] WARN  co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - {
      "accepted": 7323,
      "errors": [
        {
          "message": "failed to validate error: error validating JSON: I[#] S[#] doesn't validate with \"error#\"\n  I[#] S[#/allOf/1] allOf failed\n    I[#] S[#/allOf/1/allOf/3] allOf failed\n      I[#] S[#/allOf/1/allOf/3/then] if-then failed\n        I[#] S[#/allOf/1/allOf/3/then/required] missing properties: \"trace_id\"",
          "document": "{\"error\":{\"timestamp\":28297279401,\"transaction\":{\"sampled\":false},\"id\":\"7e07d61a687b28373635ddf414039e9e\",\"parent_id\":\"cd08dc2f11f77e86\",\"context\":{\"tags\":{}},\"culprit\":\"com.mymoney.myfin.platform.backend.service.external.EdsRemoteCardProvider.createCard(EdsRemoteCardProvider.java:121)\",\"exception\":{\"message\":\"Connect to px-dev-11.px.iefcloud.com:8443 [px-dev-11.px.iefcloud.com/172.31.17.162] failed: Connessione rifiutata (Connection refused)\",\"stacktrace\":[{\"filename\":\"DefaultHttpClientConnectionOperator.java\",\"classname\":\"org.apache.http.impl.conn.DefaultHttpClientConnectionOperator\",\"function\":\"connect\",\"library_frame\":true,\"lineno\":151,\"module\":\"org.apache.http.impl.conn\"},{\"filename\":\"PoolingHttpClientConnectionManager.java\",\"classname\":\"org.apache.http.impl.conn.PoolingHttpClientConnectionManager\",\"function\":\"connect\",\"library_frame\":true,\"lineno\":353,\"module\":\"org.apache.http.impl.conn\"},{\"filename\":\"MainClientExec.java\",\"classname\":\"org.apache.http.impl.execchain.MainClientExec\",\"function\":\"establishRoute\",\"library_frame\":true,\"lineno\":380,\"module\":\"org.apache.http.impl.execchain\"},{\"filename\":\"MainClientExec.java\",\"classname\":\"org.apache.http.impl.execchain.MainClientExec\",\"function\":\"execute\",\"library_frame\":true,\"lineno\":236,\"module\":\"org.apache.http.impl.execchain\"},{\"filename\":\"ProtocolExec.java\",\"classname\":\"org.apache.http.impl.execchain.ProtocolExec\",\"function\":\"execute\",\"library_frame\":true,\"lineno\":184,\"module\":\"org.apache.http.impl.execchain\"},{\"filename\":\"RetryExec.java\",\"classname\":\"org.apache.http.impl.execchain.RetryExec\",\"function\":\"execute\",\"library_frame\":true,\"lineno\":88,\"module\":\"org.apache.http.impl.execchain\"},{\"filename\":\"RedirectExec.java\",\"classname\":\"org.apache.http.impl.execchain.RedirectExec\",\"function\":\"execute\",\"library_frame\":true,\"lineno\":110,\"module\":\"org.apache.http.impl.execchain\"},{\"filename\":\"InternalHttpClient.java\",\"classname\":\"org.apache.http.impl.client.InternalHttpClient\",\"function\":\"doExecute\",\"library_frame\":true,\"lineno\":184,\"module\":\"org.apache.http.impl.client\"},{\"filename\":\"CloseableHttpClient.java\",\"classname\":\"org.apache.http.impl.client.CloseableHttpClient\",\"function\":\"execute\",\"library_frame\":true,\"lineno\":82,\"module\":\"org.apache.http.impl.client\"},{\"filename\":\"CloseableHttpClient.java\",\"classname\":\"org.apache.http.impl.client.CloseableHttpClient\",\"function\":\"execute\",\"library_frame\":true,\"lineno\":55,\"module\":\"org.apache.http.impl.client\"},{\"filename\":\"HttpComponentsClientHttpRequest.java\",\"classname\":\"org.springframework.http.client.HttpComponentsClientHttpRequest\",\"function\":\"executeInternal\",\"library_frame\":true,\"lineno\":87,\"module\":\"org.springframework.http.client\"},{\"filename\":\"AbstractBufferingClientHttpRequest.java\",\"classname\":\"org.springframework.http.client.AbstractBufferingClientHttpRequest\",\"function\":\"executeInternal\",\"library_frame\":true,\"lineno\":48,\"module\":\"org.springframework.http.client\"},{\"filename\":\"AbstractClientHttpRequest.java\",\"classname\":\"org.springframework.http.client.AbstractClientHttpRequest\",\"function\":\"execute\",\"library_frame\":true,\"lineno\":53,\"module\":\"org.springframework.http.client\"},work.ReflectiveMethodInvocation\",\"function\":\"proceed\",\"library_frame\":true,\"lineno\":174,\"module\":\"org.springframework.aop.framework\"},{\"filename\":\"MethodSecurityInterceptor.java\",\"classname\":\"org.springframework.security.access.intercept.aopalliance.MethodSecurityInterceptor\",\"function\":\"invoke\",\"library_frame\":true,\"lineno\":69,\"module\":\"org.springframework.security.access.intercept.aopalliance\"},{\"filename\":\"ReflectiveMethodInvocation.java\",\"classname\":\"org.springframework.aop.framework.ReflectiveMethodInvocation\",\"function\":\"proceed\",\"library_frame\":true,\"lineno\":185,\"module\":\"org.springframework.aop.framework\"}],\"type\":\"java.net.ConnectException\"}]}}}"
        }
      ]
    }

    2020-11-24 14:55:58,703 [commons-pool-EvictionTimer] WARN  co.elastic.apm.agent.impl.ElasticApmTracer - Tried to start a transaction with a non-zero reference count 3 '' 00-00000000000000000000000000000000-0000000000000000-00 (6068d2b7)

sorry the full trace do not enter here, i can provide it later, this is a sample
this is generated a lot of times....
this is the outcome on the APM UI:
apm-buco

apm-buco-02

i also see there are missing methods not shown. Is the tomcat too busy ?

@ldomenella thanks for reporting!

There are several issues reported here that may be related or not.
We will do our best to assist, but it will require some effort on your side in order to get the full info.

  1. failed to validate error - the APM Server rejects Error events that contain transaction_id and/or parent_id but do not contain trace_id. I did a PR to make sure we don't send such error events in the future. However, since this is not expected, I also added some logs to trace where that occurs. Please try this snapshot with log_level set to TRACE and send the related DEBUG log (would start with: Creating an Error as child of...) and the TRACE log with the stack trace that follows.
  2. WARN... Tried to start a transaction with a non-zero reference count 3 ... - this indicates we have some kind of failure with regard to transaction lifecycle management. You could see at the end of this warning the identity hash code of this transaction instance - (6068d2b7). We would like to follow reference/dereference of a transaction in order to figure out why this happens. In order to get this info, please set log_level to DEBUG or TRACE, look for the same warning in the log and collect all log entries that contain the related transaction identity hash code.
  3. Missing spans - please elaborate: what are you expecting and is not there? Do you see it in some cases but not in others? Whenever you get a transaction with suspected missing spans, you can fetch that transaction document from Elasticsearch and look in the span_count field. It may indicate that there are some dropped spans. If this is due to too much load, you should find an indication in the agent log.

In addition, any additional info you can provide would be great, for example:

  • Do you rely on automatic tracing only, or do you produce manual transactions/spans/errors as well?
  • Which technologies are you using within your application?

answering your question:

  1. yes we rely on automatic tracing only
  2. tomcat8.0.52/spring 5.0/java 1.8.161

we set the latest snapshot today and enabled the trace log, will let you know

@ldomenella the logs requested in 1 and 2 will be very useful for analysis. If you could provide them, that would be great.
Thanks.

Reiterating my former request for answers and logs- I know it requires some effort, but may be useful for us in case there is a real issue here.
Thanks :pray:

will provide today, sorry were busy those days. How to supply them ?
im trying to reproduce the issue

we tried to produce some loads but we are not able to reproduce the issue, we are using the latest snapshot: 1.19.1

let me know if you want to close the thread, if we see it happend again, i know what to do.

@ldomenella thanks for the update!
The failed to validate error warning is expected to disappear, as we ensure we are not sending improper errors. However, we are not sure what could have caused this (an error without trace_id. Did you look for a DEBUG message starting with Creating an Error as child of in the logs?
Similarly, we did not do anything to handle the Tried to start a transaction with a non-zero reference count warning...
So, if you find any of these two, please DO report back.
Thanks! :pray:

This topic was automatically closed 20 days after the last reply. New replies are no longer allowed.