Why are there so many "Waiting 30s" messages in the service log?

Waiting 30s... Wait time is taken from max-age directive in Cache-Control header in APM Server's response. dbgIterationsCount: 37436.
Waiting 30s... Wait time is taken from max-age directive in Cache-Control header in APM Server's response. dbgIterationsCount: 36.
Waiting 30s... Wait time is taken from max-age directive in Cache-Control header in APM Server's response. dbgIterationsCount: 27.

Today, when I checked the logs collected by Filebeat, I found that many service logs contained these "Waiting" messages. Why? I could not find an answer under "Troubleshoot Common problems".

I think you should read this discussion here from last year:

@riferrei

Thank you for your reply. I read the discussion, but I don't see how it relates to my problem. What I want to confirm is what "Waiting 30s" means. Is the apm-server queue full?

It may be, but we can't know for sure without proper debugging. If you can, please run the APM Server with the options -e -d "*" to increase log verbosity. Also, check whether there is a load balancer between the agent and the APM Server; sometimes these timeouts are caused by mismatched timeout settings along the path.

Debugging may be a little troublesome because the apm-server is running on a Kubernetes cluster.
The agent reaches the apm-server through a domain name (forwarded by nginx). I checked the nginx log and found no errors.

Next, I will spend a little time observing.

Ah, that's interesting. Check the timeouts configured for NGINX and the APM Server. Ideally, your load balancer timeout should sit between the agent timeout and the APM Server timeout, and it should certainly be smaller than the APM Server timeout. For example:

APM agent (10s) → Load Balancer (15s) → APM Server (30s)
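
The ordering above can be checked mechanically. Here is a minimal Python sketch; the function name and the values passed in are illustrative only and are not part of any Elastic or NGINX tooling. You would substitute the timeouts actually configured in your environment.

```python
def check_timeout_chain(agent_s, load_balancer_s, apm_server_s):
    """Return a list of problems with the ordering agent < load balancer < APM Server."""
    problems = []
    if not agent_s < load_balancer_s:
        problems.append(
            f"agent timeout ({agent_s}s) should be smaller than "
            f"load balancer timeout ({load_balancer_s}s)"
        )
    if not load_balancer_s < apm_server_s:
        problems.append(
            f"load balancer timeout ({load_balancer_s}s) should be smaller than "
            f"APM Server timeout ({apm_server_s}s)"
        )
    return problems


# The example values from the post above satisfy the ordering.
print(check_timeout_chain(10, 15, 30))

# A load balancer timeout larger than the APM Server timeout is reported.
print(check_timeout_chain(10, 60, 30))
```

An empty list means the three timeouts are ordered as recommended.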

@riferrei

I captured a piece of the debug log. It contains "done send ack", so the data has evidently been sent to Elasticsearch, but no new data appeared in Elasticsearch.

https://paste.ubuntu.com/p/kNt9BWVdvV/

Paste the service log

https://paste.ubuntu.com/p/yhXJBJTdwM/

According to your service log, this message represents the .NET agent trying to poll data from the APM Server and timing out.

dbgIterationsCount represents the number of attempts to complete the data polling that failed. Each attempt increments the counter and then schedules another attempt. This means that something is going on between the agent and the APM Server at the network level. If I were you, I would start capturing some network packets from this communication to understand what is happening. As you can see in the agent's code, it is a simple request-reply HTTP interaction.
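
To make the loop described above concrete, here is a rough Python sketch of a poll-then-wait cycle that takes its wait time from the max-age directive of a Cache-Control header, as the log message says. This is not the .NET agent's code: `poll_loop`, `fetch`, and `DEFAULT_WAIT_S` are hypothetical names, and the fallback value is an assumption.

```python
import re
import time

DEFAULT_WAIT_S = 300  # assumed fallback when no max-age is present (hypothetical value)


def wait_seconds_from_cache_control(header_value, default=DEFAULT_WAIT_S):
    """Extract the max-age value (in seconds) from a Cache-Control header value."""
    if header_value:
        match = re.search(r"\bmax-age\s*=\s*(\d+)", header_value, re.IGNORECASE)
        if match:
            return int(match.group(1))
    return default


def poll_loop(fetch, iterations, sleep=time.sleep):
    """Call fetch() repeatedly, waiting between calls as the response headers dictate.

    fetch is a hypothetical callable returning a dict of response headers.
    """
    count = 0
    for _ in range(iterations):
        count += 1  # each attempt increments the counter
        headers = fetch()
        wait_s = wait_seconds_from_cache_control(headers.get("Cache-Control"))
        print(f"Waiting {wait_s}s... dbgIterationsCount: {count}")
        sleep(wait_s)  # then the next attempt is scheduled
    return count


# Usage with a fake response and a no-op sleep, so nothing actually blocks.
poll_loop(lambda: {"Cache-Control": "max-age=30, must-revalidate"}, 3, sleep=lambda s: None)
```

The sleep function is injected so the loop can be exercised without real delays; in a packet capture you would expect to see one request and one reply per iteration.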

@riferrei

Okay, I will try to capture and analyze.

I checked the captured data, but I don't know how to analyze it. Can you take a look?
https://drive.google.com/file/d/1ulOd19IN4a5YE3IbeSz82zaIAb0Cqkqo

I suspect this problem is caused by an incorrect mapping. I added a field to the template. Because this field arrives in multiple formats (for example, a plain string or JSON), Elasticsearch could not map it correctly. The apm-server sent the event to Elasticsearch but probably never received a success response, so it kept waiting in a loop.
This is only my guess.
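
If the guess above is right, one way to confirm it would be to look for a field whose values arrive with different JSON types, since Elasticsearch rejects a document whose value does not fit the field's existing mapping. Here is a small Python sketch; the field names and sample documents are made up for illustration and do not come from the logs in this thread.

```python
import json
from collections import defaultdict


def field_types(documents):
    """Map each top-level field to the set of Python type names seen for its values."""
    seen = defaultdict(set)
    for doc in documents:
        for key, value in doc.items():
            seen[key].add(type(value).__name__)
    return dict(seen)


def conflicting_fields(documents):
    """Return the fields that appear with more than one value type, sorted by name."""
    return sorted(k for k, types in field_types(documents).items() if len(types) > 1)


# Hypothetical sample events: "payload" is a string in one and an object in another.
samples = [
    json.loads(s)
    for s in (
        '{"payload": "plain text", "status": "ok"}',
        '{"payload": {"id": 1}, "status": "ok"}',
    )
]
print(conflicting_fields(samples))  # prints ['payload']
```

A field reported here is a candidate for the mapping conflict; the check only looks at top-level fields, so nested objects would need a recursive variant.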
