Elastic Java APM Agent Fails with 400 Error Due to Traceparent Header in Kafka Messages

APM Agent language and version:
Java 1.51.0
Original install method (e.g. download page, yum, deb, from source, etc.) and version:
docker image

Description of the problem including expected versus actual behavior. Please include screenshots (if relevant):
We have encountered an issue with the Elastic Java APM agent when dealing with trace context propagation in Kafka messages. It seems that the traceparent header of Kafka messages, or the message headers in general, are not properly encoded. When the APM agent tries to extract or inject this header, it leads to errors and warnings in the logs.

Provide logs and/or server output (if relevant):
Below are some example error logs that we observed in our application's logs:

2024-08-08 09:20:35,994 [elastic-apm-server-reporter] WARN  co.elastic.apm.agent.report.AbstractIntakeApiHandler - Response body: {
  "accepted": 12,
  "errors": [
    {
      "message": "decode error: data read error: v2.transactionRoot.Transaction: v2.transaction.Context: v2.context.Message: v2.contextMessage.Age: v2.contextMessageAge.Headers: invalid input for HTTPHeader: \u003cnil\u003e",
      "document": "{\"transaction\":{\"timestamp\":1723134029527000,\"name\":\"Kafka record from t_rp_prod_dogfood_resp\",\"id\":\"1395c37fe49a8e24\",\"trace_id\":\"042e9f8c9645fd42d0e52add559d5f6b\",\"type\":\"messaging\",\"duration\":0.085,\"outcome\":\"success\",\"context\":{\"service\":{\"framework\":{\"name\":\"Kafka\"},\"version\":null},\"message\":{\"headers\":{\"x-conversion-id\":null},\"age\":{\"ms\":79},\"queue\":{\"name\":\"t_rp_prod_dogfood_resp\"}},\"tags\":{}},\"span_count\":{\"dropped\":0,\"started\":0},\"dropped_spans_stats\":[],\"sample_rate\":1.0,\"sampled\":true}}"
    }
  ]
}

2024-08-08 09:20:35,994 [elastic-apm-server-reporter] INFO  co.elastic.apm.agent.report.AbstractIntakeApiHandler - Backing off for 0 seconds (+/-10%)
2024-08-08 09:20:35,994 [elastic-apm-server-reporter] ERROR co.elastic.apm.agent.report.AbstractIntakeApiHandler - Error sending data to APM server: Server returned HTTP response code: 400 for URL: https://oitces-apm.us-west-2.tencent-ces.com/intake/v2/events, response code is 400

This issue causes APM reporting to fail with 400 Bad Request errors, likely due to an incorrect or unexpected header format, and the trace ID cannot be found in the Kibana APM trace UI. How can we solve this issue? Thanks in advance!

Hi,

Can you provide the apm-server logs? From memory, they might contain more details about the error.

Also, if you reproduce it with log_level=TRACE, the agent should log the JSON sent to the server (the logs will be verbose, but this should be close to the error you reported).

From what we see here, it appears that we get <nil> encoded with Unicode escapes (\u003cnil\u003e).

On the Kafka side, can you inspect the transmitted header values? For example, is the value already Unicode-escaped, or can you read it as <nil>? I'm wondering whether double encoding could be involved if the value is already encoded at the Kafka level.
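
If it helps with that inspection, here is a minimal sketch of how the header values could be dumped. It is only an illustration: HeaderInspector and describe are hypothetical names, and the map in main stands in for real record headers. In an actual consumer you would pass header.key() and header.value() for each entry of record.headers(). The x-conversion-id entry mirrors the rejected document above, which shows that header as null; x-other-header is a made-up sample.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the APM agent or the Kafka client.
class HeaderInspector {

    // Describes one header so that a null value (no bytes at all) can be told
    // apart from a value whose bytes spell out literal text such as "<nil>".
    static String describe(String key, byte[] value) {
        if (value == null) {
            return key + " = null (no value bytes)";
        }
        String text = new String(value, StandardCharsets.UTF_8);
        return key + " = \"" + text + "\" (" + value.length + " bytes)";
    }

    public static void main(String[] args) {
        // Stand-in for the headers of a consumed record (assumption: sample data only).
        Map<String, byte[]> headers = new LinkedHashMap<>();
        headers.put("x-conversion-id", null);
        headers.put("x-other-header", "abc".getBytes(StandardCharsets.UTF_8));

        for (Map.Entry<String, byte[]> entry : headers.entrySet()) {
            System.out.println(describe(entry.getKey(), entry.getValue()));
        }
    }
}
```

If the output reports a null value for a header, the value was never set on the producer side; if it reports the text <nil> or an already escaped string, the encoding happened before the agent read the header.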