Waiting 30s... Wait time is taken from max-age directive in Cache-Control header in APM Server's response. dbgIterationsCount: 37436.
Waiting 30s... Wait time is taken from max-age directive in Cache-Control header in APM Server's response. dbgIterationsCount: 36.
Waiting 30s... Wait time is taken from max-age directive in Cache-Control header in APM Server's response. dbgIterationsCount: 27.
Today, when I checked the logs collected by Filebeat, I found that many service logs contained these "Waiting" messages. Why? I did not find a solution in the "Troubleshoot common problems" documentation.
It may be, but we can't know for sure without proper debugging. If you can, please run the APM Server with the options -e -d "*" to increase the verbosity of its logs. Also, check whether there is a load balancer between the agent and the APM Server; sometimes these timeouts are caused by mismatched timeout settings.
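For reference, those debug flags are passed directly on the command line (-e logs to stderr, -d "*" enables all debug selectors); the exact invocation will differ in a Kubernetes deployment, where they would typically go into the container's args:

```
apm-server -e -d "*"
```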
Debugging may be a little troublesome: the apm-server is running on a Kubernetes cluster.
The agent reaches the apm-server through a domain name (NGINX forwarding). I checked the NGINX logs and found no errors.
Ah, that's interesting. Check the timeouts configured for NGINX and the APM Server. Ideally, your load balancer timeout should sit between the agent timeout and the APM Server timeout, and it should certainly be smaller than the APM Server's. For example:
APM agent (10s) → Load Balancer (15s) → APM Server (30s)
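In NGINX, the relevant proxy timeouts could be set along these lines (a sketch only; the upstream address and the 15s values are assumptions taken from the example above, not your actual configuration):

```
# nginx.conf (fragment): keep the proxy timeouts between the agent's (10s)
# and the APM Server's (30s) timeouts
location / {
    proxy_pass            http://apm-server:8200;
    proxy_connect_timeout 5s;
    proxy_send_timeout    15s;
    proxy_read_timeout    15s;
}
```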
According to your service log, this message represents the .NET agent trying to poll data from the APM Server and timing out.
dbgIterationsCount represents the number of polling attempts that have failed so far. Each failed attempt increments the counter and then schedules another attempt. This means that something is going on between the agent and the APM Server at the network level. If I were you, I would start capturing network packets from this communication to understand further what is happening. As you can see in the agent's code, it is a simple request-reply HTTP interaction.
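The retry behavior described above can be sketched roughly like this. This is a minimal illustration, not the agent's actual code: the `fetch_config` callback, the default wait, and the counter handling are assumptions; only the "wait time comes from the max-age directive of the Cache-Control header" behavior is taken from the log messages in this thread.

```python
import re


def parse_max_age(cache_control: str) -> int:
    """Extract the max-age value (in seconds) from a Cache-Control header."""
    match = re.search(r"max-age=(\d+)", cache_control)
    return int(match.group(1)) if match else 30  # assumed fallback


def poll_loop(fetch_config, max_iterations: int = 3) -> int:
    """Simulate the agent's polling loop: each failed attempt increments
    dbgIterationsCount and schedules the next attempt after max-age seconds.
    Returns the final counter value."""
    dbg_iterations_count = 0
    while dbg_iterations_count < max_iterations:
        ok, headers = fetch_config()
        if ok:
            return dbg_iterations_count
        dbg_iterations_count += 1
        wait = parse_max_age(headers.get("Cache-Control", ""))
        print(f"Waiting {wait}s... dbgIterationsCount: {dbg_iterations_count}")
        # time.sleep(wait)  # the real agent would wait here before retrying
    return dbg_iterations_count
```

If the server never replies successfully (a network problem, a misbehaving load balancer), the counter grows without bound, which would explain values like 37436.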
My guess is that this problem is caused by an incorrect mapping. I added a field to the template, and because this field has multiple formats (such as string and JSON), Elasticsearch did not map it correctly. The events apm-server sent to Elasticsearch probably never received a success response, so it has been waiting in a loop.
The above is only my guess.