Kibana version: 7.16.1
Elasticsearch version: 7.16.1
APM Server version: 7.16.1
APM Agent language and version: 1.30.0
Browser version:
Original install method (e.g. download page, yum, deb, from source, etc.) and version:
Fresh install or upgraded from other version?
Is there anything special in your setup? For example, are you using the Logstash or Kafka outputs? Are you using a load balancer in front of the APM Servers? Have you changed index pattern, generated custom templates, changed agent configuration etc.
Description of the problem including expected versus actual behavior. Please include screenshots (if relevant):
Steps to reproduce:
The parent service still uses Java agent 1.25.0. The child service was upgraded from 1.25.0 to 1.30.0, and since then it rarely collects any sampled spans.
Reverting the child service to 1.25.0 fixes the issue.
Errors in browser console (if relevant):
Provide logs and/or server output (if relevant):
The following error started showing up after upgrading the child service agent to version 1.30.0:
2022-04-13 22:38:14,585 [elastic-apm-server-reporter] WARN co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - {
  "accepted": 104,
  "errors": [
    {
      "message": "decode error: data read error: v2.transactionRoot.Transaction: v2.transaction.DroppedSpanStats: []v2.transactionDroppedSpanStats: decode slice: expect ],"
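In the document, I also noticed that the elements of the dropped_spans_stats array are not separated by commas. The decoder message above ("expect ],") suggests it expected either a ] or a , after an element. Could the missing commas be what makes the decode fail?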
"dropped_spans_stats": [
  {
    "destination_service_resource": "sqs.us-east-1.amazonaws.com:443",
    "outcome": "success",
    "duration": {
      "count": 2,
      "sum": {
        "us": 27780
      }
    }
  }
  {
    "destination_service_resource": "example.com:443",
    "outcome": "success",
    "duration": {
      "count": 4,
      "sum": {
        "us": 385481
      }
    }
  }
  {
    "destination_service_resource": "mysql",
    "outcome": "success",
    "duration": {
      "count": 872,
      "sum": {
        "us": 5669348
      }
    }
  }
  {
    "destination_service_resource": "example2.com:443",
    "outcome": "success",
    "duration": {
      "count": 1,
      "sum": {
        "us": 962256
      }
    }
  }
]
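For comparison, a valid JSON array needs a comma between consecutive elements (and none before the first or after the last). The sketch below shows that separator logic for entries shaped like the payload above. It is a hypothetical illustration only, not the agent's actual serializer; the names DroppedSpanStat and DroppedSpanStatsWriter are made up for this example.

```java
import java.util.List;

// Hypothetical value type mirroring one dropped_spans_stats entry from the payload above.
record DroppedSpanStat(String destinationServiceResource, String outcome, int count, long sumUs) {}

// Hypothetical writer, NOT the agent's real serializer: it only illustrates
// where the element separators belong in the array.
final class DroppedSpanStatsWriter {
    private DroppedSpanStatsWriter() {}

    // Writes the entries as a JSON array, emitting a comma *between* elements
    // (never before the first element or after the last one).
    static String write(List<DroppedSpanStat> stats) {
        StringBuilder sb = new StringBuilder("\"dropped_spans_stats\":[");
        for (int i = 0; i < stats.size(); i++) {
            if (i > 0) {
                sb.append(','); // the separator that is missing in the reported payload
            }
            DroppedSpanStat s = stats.get(i);
            sb.append('{')
              .append("\"destination_service_resource\":\"").append(escape(s.destinationServiceResource())).append("\",")
              .append("\"outcome\":\"").append(escape(s.outcome())).append("\",")
              .append("\"duration\":{\"count\":").append(s.count())
              .append(",\"sum\":{\"us\":").append(s.sumUs()).append("}}") // closes "sum" and "duration"
              .append('}');                                                // closes the element
        }
        return sb.append(']').toString();
    }

    // Minimal escaping for quotes and backslashes; enough for the host:port values shown above.
    private static String escape(String v) {
        return v.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

With two entries the output contains "}}},{" between them; in the payload above, consecutive elements instead appear as "} {" with no comma, which is what the decoder rejects.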
Is this what prevents sampled spans from being collected on the child service? Is it related to the older 1.25.0 agent on the parent service, or is it an issue in the 1.30.0 agent itself?
The endpoint where we see the problem has many spans and always exceeds the max span limit. We don't see the problem on endpoints that don't produce that many spans.
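One plausible link between span volume and the error: once a transaction exceeds the span limit (the Java agent's transaction_max_spans setting, as far as I know), the 1.30.0 agent appears to summarize the excess spans into dropped_spans_stats entries keyed by destination resource and outcome, with a count and a summed duration, as in the payload above. Endpoints that stay under the limit would then never send that array. The sketch below models that aggregation under those assumptions; DroppedSpanAggregator and onSpanEnd are hypothetical names, not agent APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of how spans dropped past a per-transaction limit could be
// folded into dropped_spans_stats-style entries. Not the agent's implementation.
final class DroppedSpanAggregator {
    // Composite key: one stats entry per destination resource and outcome.
    record Key(String destinationServiceResource, String outcome) {}

    // Running totals for one key: number of dropped spans and summed duration in microseconds.
    static final class Totals {
        int count;
        long sumUs;
    }

    private final int maxSpans;   // stands in for the transaction_max_spans setting (assumption)
    private int keptSpans;        // spans kept so far in this transaction
    private final Map<Key, Totals> stats = new LinkedHashMap<>();

    DroppedSpanAggregator(int maxSpans) {
        this.maxSpans = maxSpans;
    }

    // Returns true if the span is kept; otherwise adds it to the aggregate and returns false.
    boolean onSpanEnd(String resource, String outcome, long durationUs) {
        if (keptSpans < maxSpans) {
            keptSpans++;
            return true;
        }
        Totals t = stats.computeIfAbsent(new Key(resource, outcome), k -> new Totals());
        t.count++;
        t.sumUs += durationUs;
        return false;
    }

    // Aggregated entries, in first-seen order; empty if the limit was never exceeded.
    Map<Key, Totals> droppedStats() {
        return stats;
    }
}
```

With a limit of 2, three further mysql spans of 300, 400 and 500 us would produce a single mysql/success entry with count 3 and sum 1200, similar in shape to the "mysql" entry (count 872) above. Under this model, only span-heavy endpoints emit multi-element arrays, which would be consistent with the problem appearing only on those endpoints.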
Again, reverting the child service agent to version 1.25.0 seems to fix the issue, and sampled spans are collected properly again.