Elastic APM (Java Agent) getting 400 error

Kibana version: 7.12.1

Elasticsearch version: 7.12.1

APM Server version: 7.12.1

APM Agent language and version: Java, elastic-apm-agent-1.23.0.jar

Browser version:

Original install method (e.g. download page, yum, deb, from source, etc.) and version: source

Fresh install or upgraded from other version?: upgraded from 7.7.1

Is there anything special in your setup? For example, are you using the Logstash or Kafka outputs? Are you using a load balancer in front of the APM Servers? Have you changed index pattern, generated custom templates, changed agent configuration etc.: Tried both Elastic and Logstash as outputs. Index pattern is apm-all-*

Description of the problem including expected versus actual behavior. Please include screenshots (if relevant):

Steps to reproduce:

  1. Download Elasticsearch 7.12.1.
  2. Unzip and configure.
  3. Run Elasticsearch and Kibana.
  4. Restore all snapshots from 7.7.1.
  5. Run Logstash and APM Server.
  6. Run Tomcat 8 pointing to APM Server.

Elastic APM Server error logs

{"log.level":"info","@timestamp":"2021-05-09T22:40:59.140-0400","log.logger":"monitoring","log.origin":{"file.name":"elasticsearch/elasticsearch.go","file.line":272},"message":"Start monitoring stats metrics snapshot loop with period 10s.","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-05-09T22:40:59.140-0400","log.logger":"monitoring","log.origin":{"file.name":"elasticsearch/elasticsearch.go","file.line":272},"message":"Start monitoring state metrics snapshot loop with period 1m0s.","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-05-09T22:41:00.031-0400","log.logger":"publisher_pipeline_output","log.origin":{"file.name":"pipeline/output.go","file.line":143},"message":"Connecting to backoff(async(tcp://logstash-apm-1-p2:5044))","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-05-09T22:41:00.031-0400","log.logger":"publisher","log.origin":{"file.name":"pipeline/retry.go","file.line":219},"message":"retryer: send unwait signal to consumer","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-05-09T22:41:00.031-0400","log.logger":"publisher","log.origin":{"file.name":"pipeline/retry.go","file.line":223},"message":"  done","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-05-09T22:41:00.032-0400","log.logger":"publisher_pipeline_output","log.origin":{"file.name":"pipeline/output.go","file.line":143},"message":"Connecting to backoff(async(tcp://logstash-apm-2-p2:5044))","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-05-09T22:41:00.032-0400","log.logger":"publisher","log.origin":{"file.name":"pipeline/retry.go","file.line":219},"message":"retryer: send unwait signal to consumer","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-05-09T22:41:00.032-0400","log.logger":"publisher","log.origin":{"file.name":"pipeline/retry.go","file.line":223},"message":"  done","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-05-09T22:41:00.032-0400","log.logger":"publisher_pipeline_output","log.origin":{"file.name":"pipeline/output.go","file.line":143},"message":"Connecting to backoff(async(tcp://logstash-apm-dc2:5044))","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-05-09T22:41:00.032-0400","log.logger":"publisher","log.origin":{"file.name":"pipeline/retry.go","file.line":219},"message":"retryer: send unwait signal to consumer","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-05-09T22:41:00.032-0400","log.logger":"publisher","log.origin":{"file.name":"pipeline/retry.go","file.line":223},"message":"  done","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-05-09T22:41:00.032-0400","log.logger":"publisher_pipeline_output","log.origin":{"file.name":"pipeline/output.go","file.line":151},"message":"Connection to backoff(async(tcp://logstash-apm-1-p2:5044)) established","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-05-09T22:41:00.035-0400","log.logger":"publisher_pipeline_output","log.origin":{"file.name":"pipeline/output.go","file.line":151},"message":"Connection to backoff(async(tcp://logstash-apm-2-p2:5044)) established","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-05-09T22:41:00.035-0400","log.logger":"publisher_pipeline_output","log.origin":{"file.name":"pipeline/output.go","file.line":151},"message":"Connection to backoff(async(tcp://logstash-apm-dc2:5044)) established","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-05-09T22:41:09.146-0400","log.logger":"publisher_pipeline_output","log.origin":{"file.name":"pipeline/output.go","file.line":143},"message":"Connecting to backoff(monitoring(https://elasticsearch-master-1-p2:9200))","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-05-09T22:41:09.146-0400","log.logger":"monitoring","log.origin":{"file.name":"pipeline/retry.go","file.line":219},"message":"retryer: send unwait signal to consumer","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-05-09T22:41:09.146-0400","log.logger":"monitoring","log.origin":{"file.name":"pipeline/retry.go","file.line":223},"message":"  done","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-05-09T22:41:09.147-0400","log.logger":"esclientleg","log.origin":{"file.name":"eslegclient/connection.go","file.line":314},"message":"Attempting to connect to Elasticsearch version 7.12.1","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-05-09T22:41:09.173-0400","log.logger":"publisher_pipeline_output","log.origin":{"file.name":"pipeline/output.go","file.line":151},"message":"Connection to backoff(monitoring(https://elasticsearch-master-1-p2:9200)) established","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2021-05-09T22:41:51.454-0400","log.logger":"request","log.origin":{"file.name":"middleware/log_middleware.go","file.line":60},"message":"data validation error","url.original":"/intake/v2/events","http.request.method":"POST","user_agent.original":"elasticapm-java/1.23.0","source.address":"138.12.80.27","http.request.id":"e2b91678-2f1e-4b01-8ec3-a6974dd13dc2","event.duration":10001633770,"http.response.status_code":400,"error.message":"decode error: data read error: v2.metricsetRoot.Metricset: v2.metricset.Timestamp: readUint32: overflow, error found in #10 byte of ...|\":1620614473187000,\"|..., bigger context ...|{\"metricset\":{\"timestamp\":1620614473187000,\"samples\":{\"system.process.cpu.total.norm|..., decode error: data read error: v2.metricsetRoot.Metricset: v2.metricset.Timestamp: readUint32: overflow, error found in #10 byte of ...|\":1620614473187000,\"|..., bigger context ...|{\"metricset\":{\"timestamp\":1620614473187000,\"tags\":{\"name\":\"Par Survivor Space\"},\"sam|..., decode error: data read error: v2.metricsetRoot.Metricset: v2.metricset.Timestamp: readUint32: overflow, error found in #10 byte of ...|\":1620614473187000,\"|..., bigger context ...|{\"metricset\":{\"timestamp\":1620614473187000,\"tags\":{\"name\":\"ParNew\"},\"samples\":{\"jvm.|..., decode error: data read error: v2.metricsetRoot.Metricset: v2.metricset.Timestamp: readUint32: overflow, error found in #10 byte of ...|\":1620614473187000,\"|..., bigger context ...|{\"metricset\":{\"timestamp\":1620614473187000,\"tags\":{\"name\":\"CMS Old Gen\"},\"samples\":{|..., decode error: data read error: v2.metricsetRoot.Metricset: v2.metricset.Timestamp: readUint32: overflow, error found in #10 byte of ...|\":1620614473187000,\"|..., bigger context ...|{\"metricset\":{\"timestamp\":1620614473187000,\"tags\":{\"name\":\"ConcurrentMarkSweep\"},\"sa|...","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2021-05-09T22:42:38.490-0400","log.logger":"request","log.origin":{"file.name":"middleware/log_middleware.go","file.line":60},"message":"data validation error","url.original":"/intake/v2/events","http.request.method":"POST","user_agent.original":"elasticapm-java/1.23.0","source.address":"138.12.80.27","http.request.id":"eb431d7c-a1e4-4d13-942e-f99b7c8673cd","event.duration":10008827523,"http.response.status_code":400,"error.message":"decode error: data read error: v2.metricsetRoot.Metricset: v2.metricset.Timestamp: readUint32: overflow, error found in #10 byte of ...|\":1620614518187000,\"|..., bigger context ...|{\"metricset\":{\"timestamp\":1620614518187000,\"samples\":{\"system.process.cpu.total.norm|..., decode error: data read error: v2.metricsetRoot.Metricset: v2.metricset.Timestamp: readUint32: overflow, error found in #10 byte of ...|\":1620614518187000,\"|..., bigger context ...|{\"metricset\":{\"timestamp\":1620614518187000,\"tags\":{\"name\":\"Par Survivor Space\"},\"sam|..., decode error: data read error: v2.metricsetRoot.Metricset: v2.metricset.Timestamp: readUint32: overflow, error found in #10 byte of ...|\":1620614518187000,\"|..., bigger context ...|{\"metricset\":{\"timestamp\":1620614518187000,\"tags\":{\"name\":\"ParNew\"},\"samples\":{\"jvm.|..., decode error: data read error: v2.metricsetRoot.Metricset: v2.metricset.Timestamp: readUint32: overflow, error found in #10 byte of ...|\":1620614518187000,\"|..., bigger context ...|{\"metricset\":{\"timestamp\":1620614518187000,\"tags\":{\"name\":\"CMS Old Gen\"},\"samples\":{|..., decode error: data read error: v2.metricsetRoot.Metricset: v2.metricset.Timestamp: readUint32: overflow, error found in #10 byte of ...|\":1620614518187000,\"|..., bigger context ...|{\"metricset\":{\"timestamp\":1620614518187000,\"tags\":{\"name\":\"ConcurrentMarkSweep\"},\"sa|...","ecs.version":"1.6.0"}

Java Agent Screenshot

Elastic APM Server Screenshot

EDITED: Added more info
Tomcat 8 Java Parameters

-javaagent:/nfs/tomcat/elasticsearch/elastic-apm-agent-1.23.0.jar \
       -Delastic.apm.service_name=search_app \
       -Delastic.apm.environment=dev \
       -Delastic.apm.application_packages=* \
       -Delastic.apm.active=true \
       -Delastic.apm.central_config=false \
       -Delastic.apm.instrument=true \
       -Delastic.apm.metrics_interval=15s \
       -Delastic.apm.server_urls=http://elastic-apm-dc2:80

apm-server.yml

apm-server:
  host: "${HOSTNAME}:8200"
  idle_timeout: 45s
  read_timeout: 60s
  write_timeout: 30s
  shutdown_timeout: 5s
  max_event_size: 307200
  max_connections: 0
  rum:
    enabled: true
    event_rate:
      limit: 1000
      lru_size: 5000
    allow_origins: ['*']
    source_mapping.enabled: true
    source_mapping.cache.expiration: 5m
    source_mapping.index_pattern: "apm-all-*"
  ilm.enabled: "true"
  kibana:
    enabled: true
    host: "kibana-dc2:80"
    protocol: "http"
    username: "${KB_USR}"
    password: "${KB_PWD}"
queue:
  mem:
    events: 32000
    flush.min_events: 1000
    flush.timeout: 1s
setup.template.enabled: true
setup.template.name: "apm-all"
setup.template.pattern: "apm-all-*"
setup.template.fields: "${path.config}/fields.yml"
setup.template.settings:
  index:
    number_of_shards: 2
    codec: best_compression
    number_of_routing_shards: 30
    mapping.total_fields.limit: 2000
output.logstash:
  enabled: true
  hosts: ["logstash-apm-dc2:5044"]
  #worker: 10
  loadbalance: true
  pipelining: 2
  slow_start: false
  backoff.init: 1s
  backoff.max: 60s
  index: "apm-all"
logging.level: debug
logging.to_files: true
logging.files:
  name: apm-server.log
  rotateeverybytes: 10485760 # = 10MB
  keepfiles: 7
  permissions: 0600
  interval: 24h
http.enabled: true
http.port: 5066
monitoring.enabled: false
monitoring.cluster_uuid: "AN5TQRSZBPWaVYTBrKfy-A"

Tomcat 8 catalina.out log

2021-05-10 05:18:45,422 [elastic-apm-server-reporter] ERROR co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - Error sending data to APM server: Server returned HTTP response code: 400 for URL: http://elastic-apm-dc2:80/intake/v2/events, response code is 400
2021-05-10 05:18:45,423 [elastic-apm-server-reporter] WARN  co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - {
  "accepted": 0,
  "errors": [
    {
      "message": "decode error: data read error: v2.metricsetRoot.Metricset: v2.metricset.Timestamp: readUint32: overflow, error found in #10 byte of ...|\":1620638244507000,\"|..., bigger context ...|{\"metricset\":{\"timestamp\":1620638244507000,\"samples\":{\"system.process.cpu.total.norm|...",
      "document": "{\"metricset\":{\"timestamp\":1620638244507000,\"samples\":{\"system.process.cpu.total.norm.pct\":{\"value\":0.002345058626465662},\"jvm.memory.heap.used\":{\"value\":73905032.0},\"jvm.memory.non_heap.used\":{\"value\":71294696.0},\"jvm.memory.heap.max\":{\"value\":1056309248.0},\"jvm.gc.alloc\":{\"value\":2576456776.0},\"jvm.memory.non_heap.committed\":{\"value\":72810496.0},\"system.cpu.total.norm.pct\":{\"value\":0.03484087102177554},\"system.process.cgroup.memory.mem.usage.bytes\":{\"value\":3388575744.0},\"system.process.memory.size\":{\"value\":3688296448.0},\"jvm.memory.heap.committed\":{\"value\":138665984.0},\"jvm.memory.non_heap.max\":{\"value\":-1.0},\"system.memory.actual.free\":{\"value\":1370861568.0},\"system.memory.total\":{\"value\":3973640192.0},\"jvm.thread.count\":{\"value\":78.0}}}}"
    },
    {
      "message": "decode error: data read error: v2.metricsetRoot.Metricset: v2.metricset.Timestamp: readUint32: overflow, error found in #10 byte of ...|\":1620638244507000,\"|..., bigger context ...|{\"metricset\":{\"timestamp\":1620638244507000,\"tags\":{\"name\":\"Par Survivor Space\"},\"sam|...",
      "document": "{\"metricset\":{\"timestamp\":1620638244507000,\"tags\":{\"name\":\"Par Survivor Space\"},\"samples\":{\"jvm.memory.heap.pool.committed\":{\"value\":2228224.0},\"jvm.memory.heap.pool.max\":{\"value\":17432576.0},\"jvm.memory.heap.pool.used\":{\"value\":2227120.0}}}}"
    },
    {
      "message": "decode error: data read error: v2.metricsetRoot.Metricset: v2.metricset.Timestamp: readUint32: overflow, error found in #10 byte of ...|\":1620638244507000,\"|..., bigger context ...|{\"metricset\":{\"timestamp\":1620638244507000,\"tags\":{\"name\":\"ParNew\"},\"samples\":{\"jvm.|...",
      "document": "{\"metricset\":{\"timestamp\":1620638244507000,\"tags\":{\"name\":\"ParNew\"},\"samples\":{\"jvm.gc.time\":{\"value\":2721.0},\"jvm.gc.count\":{\"value\":222.0}}}}"
    },
    {
      "message": "decode error: data read error: v2.metricsetRoot.Metricset: v2.metricset.Timestamp: readUint32: overflow, error found in #10 byte of ...|\":1620638244507000,\"|..., bigger context ...|{\"metricset\":{\"timestamp\":1620638244507000,\"tags\":{\"name\":\"CMS Old Gen\"},\"samples\":{|...",
      "document": "{\"metricset\":{\"timestamp\":1620638244507000,\"tags\":{\"name\":\"CMS Old Gen\"},\"samples\":{\"jvm.memory.heap.pool.committed\":{\"value\":118546432.0},\"jvm.memory.heap.pool.max\":{\"value\":899284992.0},\"jvm.memory.heap.pool.used\":{\"value\":71507304.0}}}}"
    },
    {
      "message": "decode error: data read error: v2.metricsetRoot.Metricset: v2.metricset.Timestamp: readUint32: overflow, error found in #10 byte of ...|\":1620638244507000,\"|..., bigger context ...|{\"metricset\":{\"timestamp\":1620638244507000,\"tags\":{\"name\":\"ConcurrentMarkSweep\"},\"sa|...",
      "document": "{\"metricset\":{\"timestamp\":1620638244507000,\"tags\":{\"name\":\"ConcurrentMarkSweep\"},\"samples\":{\"jvm.gc.time\":{\"value\":708.0},\"jvm.gc.count\":{\"value\":20.0}}}}"
    }
  ]
}
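For context on the overflow itself: the agent sends the metricset timestamp as microseconds since the Unix epoch, which is far larger than the maximum value a 32-bit unsigned integer can hold. A minimal sketch of the arithmetic, using the timestamp from the error response above:

```shell
# uint32 can hold at most 2^32 - 1 = 4294967295.
# The agent's metricset timestamp is microseconds since the epoch,
# so any modern timestamp overflows a 32-bit read.
ts=1620638244507000     # timestamp from the 400 response above
uint32_max=4294967295   # 2^32 - 1

if [ "$ts" -gt "$uint32_max" ]; then
  echo "timestamp does not fit in uint32"
fi
```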

I'm now using the following versions:

  • Elasticsearch 7.12.1
  • Kibana 7.12.1
  • Logstash 7.12.1
  • Metricbeat 7.12.1
  • APM-Server 7.11.2
    Using APM-Server 7.12.1 resulted in the error above.

I upgraded APM-Server from 7.11.2 to 7.12.0 and it still works. When I upgraded from 7.12.0 to 7.12.1, the error returned. Something in 7.12.1 is causing errors in our APM-Server setup.

Hi @adsandie ,

The error message tends to indicate a JSON formatting error, but I don't see anything wrong in the payloads you posted.

I tried to reproduce the issue locally with an agent that sends the same "timestamp" field value, and it was parsed correctly by server 7.12.1.

Only a few things changed between the 7.12.0 and 7.12.1 server versions (see the diff on GitHub), but some dependencies were also updated, and I'm not familiar with the details of that code base.

What helps here is that you have already narrowed the issue down to apm-server and two patch versions, which should speed up the investigation.

The next step is to try to reproduce with the agent at TRACE log level: set log_level=trace and capture the log lines that contain the following string (the log will be very verbose):

TRACE co.elastic.apm.agent.report.serialize.DslJsonSerializer

That should give us a copy of the raw JSON generated by the agent before it is sent to the server. You can correlate those lines with the ones that produce errors on the server by matching the timestamp field value.
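As a concrete sketch of that correlation step (the log path is an assumption; adjust it to wherever your Tomcat writes its logs), the relevant lines can be pulled out with grep and filtered by the failing timestamp:

```shell
# Hypothetical log location -- adjust to your Tomcat setup.
CATALINA_OUT=/nfs/tomcat/logs/catalina.out

# Pull out the serialized intake payloads written at TRACE level...
grep "TRACE co.elastic.apm.agent.report.serialize.DslJsonSerializer" "$CATALINA_OUT" \
  | grep "1620638244507000"   # ...keeping only those matching the failing timestamp
```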

@adsandie did you mistakenly install the 32-bit version of apm-server 7.12.1? It looks like there's a bug that affects the 32-bit version only, though it wasn't introduced in 7.12.1. So my guess is that you were running 64-bit 7.11.2 and 7.12.0, and 32-bit 7.12.1.
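One quick way to verify which build is installed (the install path below is an assumption; use wherever your apm-server binary actually lives) is to inspect the binary with file:

```shell
# Hypothetical install path -- adjust to your deployment.
file /usr/share/apm-server/bin/apm-server
# On Linux, a 32-bit build reports "ELF 32-bit ..." and a 64-bit build "ELF 64-bit ...".
```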


Sorry for the late response; I was out. I'll check whether the error was caused by the 32-bit build.

It turns out we had installed the 32-bit version. We can now migrate APM-Server to 7.12.1. Thank you for your support.


This topic was automatically closed 20 days after the last reply. New replies are no longer allowed.