Intake very slow (more than 10 seconds)

ebuildy · April 8, 2021, 7:13am

Elastic Stack version 7.12.0 with Platinum license

APM Agent language and version: Node.js - 3.X (latest)

Runs on k8s.

Our Node.js application complains about timeouts when sending APM events to the intake API. I set up instrumentation on apm-server and see that POST /intake/v2/events can take more than 10s!

I don't see any trouble on the ES cluster; thread pools are fine.

Our apm-server.yml config file:

https://gist.github.com/ebuildy/119beca1063f3915201bd62e5b821276

    http:
      enabled: true
      port: 5066
    monitoring:
      enabled: false
    logging:
      level: warning
      to_files: false
      to_stderr: true

(The file is truncated here; the full version is in the gist.)

In the apm-server logs, I sometimes see "queue is full".

We use "stack monitoring" on a separate ES cluster:

I am not an expert on apm-server (yet!), but I thought the intake API should be very fast, with a queue between input and output?

I suspect long GC:

trentm · April 8, 2021, 3:15pm

@ebuildy Hi! I don't know for sure from your data, but it is possible this is due to an issue in the Node.js APM agent: Blocking Behavior under Benchmarking Load · Issue #136 · elastic/apm-nodejs-http-client · GitHub
That issue is "fixed", but is not yet in a released APM agent. The work to get it into an APM agent release is here: fix: blocking behaviour under load by trentm · Pull Request #2024 · elastic/apm-agent-nodejs · GitHub

This Node.js APM agent issue can happen when the app using the agent is under fairly high load and/or the APM server is being slow or non-responsive. If possible, you could try this branch of the APM agent to see if that helps: GitHub - elastic/apm-agent-nodejs at trentm/blocking-behavior

I don't know apm-server that well, so I don't know if there could also be a server-side issue here. Seeing "queue is full" in the apm-server logs suggests that yes, there might be.
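
As a rough illustration of what a "queue is full" message implies, here is a toy model of a bounded buffer sitting between intake and output. This is not apm-server's actual implementation: the `BoundedQueue` class, the `simulate` helper, and the rates of 5 events in and 2 events out per tick are all hypothetical. It only shows that when the output drains more slowly than events arrive, new events start being rejected.

```javascript
// Toy model of a bounded queue between intake and output.
// All names and numbers are hypothetical; this is not apm-server code.
class BoundedQueue {
  constructor(capacity) {
    this.capacity = capacity;
    this.items = [];
  }

  // Returns true if the event was accepted, false if the queue is full.
  offer(event) {
    if (this.items.length >= this.capacity) {
      return false;
    }
    this.items.push(event);
    return true;
  }

  // Removes and returns up to `n` events, as a slow output would per flush.
  drain(n) {
    return this.items.splice(0, n);
  }
}

// Intake offers 5 events per tick while the output drains only 2 per tick.
function simulate(ticks, capacity) {
  const queue = new BoundedQueue(capacity);
  let rejected = 0;
  for (let t = 0; t < ticks; t++) {
    for (let i = 0; i < 5; i++) {
      if (!queue.offer({ tick: t, i })) rejected++;
    }
    queue.drain(2);
  }
  return { rejected, queued: queue.items.length };
}

console.log(simulate(3, 6));
```

In this model a larger capacity only delays the first rejection; the rejections stop only once the drain rate catches up with the intake rate.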

trentm · April 8, 2021, 3:17pm

Those requests taking 10s might be normal behaviour. They are long-running requests that an APM agent can keep open while sending up data with a chunked-encoding. The agent's apiRequestTime (Configuration options | APM Node.js Agent Reference [4.x] | Elastic) config var defaults to 10s. That is the time after which it will close an intake request to the APM server and start a new one.
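
To make that concrete, here is a minimal sketch of time-based request rotation. It is not the agent's real HTTP client: `planRequests` is a hypothetical helper that ignores any size-based limits and simply assumes a request opens on the first event and is closed `apiRequestTime` later. Only the option name and its 10s default come from the agent docs.

```javascript
// Hypothetical helper: groups event timestamps (ms, ascending) into
// intake requests that each stay open for `apiRequestTimeMs`.
function planRequests(eventTimesMs, apiRequestTimeMs = 10000) {
  const requests = [];
  let current = null;
  for (const t of eventTimesMs) {
    // Open a new request if none is open or the current one has expired.
    if (current === null || t >= current.closesAt) {
      current = { opensAt: t, closesAt: t + apiRequestTimeMs, events: 0 };
      requests.push(current);
    }
    current.events++;
  }
  return requests;
}

// Events at 0s, 1s, 9s, 12s and 25s.
for (const req of planRequests([0, 1000, 9000, 12000, 25000])) {
  const seconds = (req.closesAt - req.opensAt) / 1000;
  console.log(`request open for ${seconds}s carrying ${req.events} event(s)`);
}
```

Under this model every request is open for the full `apiRequestTime`, however few events it carries, so a roughly 10s duration for POST /intake/v2/events on the server side does not by itself indicate slowness.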

ebuildy · April 8, 2021, 3:42pm

Oh, OK, I didn't know about long-lived connections.

So the Node.js agent setting serverTimeout must be > apiRequestTime. OK, I got it!

trentm · April 8, 2021, 4:21pm

That is the best practice yes. It is slightly more subtle: the serverTimeout is a timeout on socket inactivity, so it will get reset if there is any data being sent through (like APM transactions or spans, or the agent's regular metricsets reporting). Also the agent is gzip'ing the data it sends so there is some buffering happening there as well.
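
A small sanity check for that rule of thumb could look like the sketch below. `parseDurationMs` and `checkTimeouts` are hypothetical helpers, not part of the agent, and they only accept values with an explicit `ms`, `s`, or `m` unit. The `'30s'` used for `serverTimeout` is an assumed value for illustration; check the agent docs for the default in your version.

```javascript
// Hypothetical helper: parse '500ms', '10s' or '1m' into milliseconds.
function parseDurationMs(value) {
  const match = /^(\d+)(ms|s|m)$/.exec(String(value).trim());
  if (!match) throw new Error(`Unsupported duration: ${value}`);
  const factor = { ms: 1, s: 1000, m: 60000 }[match[2]];
  return Number(match[1]) * factor;
}

// Hypothetical helper: checks that serverTimeout exceeds apiRequestTime.
function checkTimeouts({ apiRequestTime = '10s', serverTimeout = '30s' } = {}) {
  const requestMs = parseDurationMs(apiRequestTime);
  const timeoutMs = parseDurationMs(serverTimeout);
  return { ok: timeoutMs > requestMs, requestMs, timeoutMs };
}

// Options as they would be passed to the agent, e.g.
// require('elastic-apm-node').start(agentOptions)
const agentOptions = { apiRequestTime: '10s', serverTimeout: '30s' };

const result = checkTimeouts(agentOptions);
if (!result.ok) {
  console.warn('serverTimeout should be greater than apiRequestTime');
}
```

Because serverTimeout is an inactivity timeout, this check is conservative: a config that fails it may still work while data keeps flowing on the socket.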

system · April 29, 2021, 12:22pm

This topic was automatically closed 20 days after the last reply. New replies are no longer allowed.
