Random connectivity error to APM

Kibana version: 8.8.1

Elasticsearch version: 8.8.1

APM Server version: 8.8.1

APM Agent language and version: Node.js agent, "elastic-apm-node": "^3.47.0"

Configuration:

const apm = require('elastic-apm-node').start({
  serviceName: 'name',
  secretToken: config.app.apmSecretToken,

  // Set the custom APM Server URL (default: http://localhost:8200)
  serverUrl: config.app.elasticApmUrl,
  ignoreUrls: ['/', '/metrics'],
  captureBody: 'transactions',
  logLevel: 'debug',
  apiRequestSize: '1mb',
  environment: config.config_id
});

We have integrated elastic-apm-node into our Express app, and we are facing a couple of issues with transactions and logging.

We continuously get random connectivity errors in the application logs, like the ones below:

- APM Server transport error (503): Unexpected APM Server response when polling config\n{\"ok\":false,\"message\":\"The requested resource is currently unavailable.
- APM Server transport error (502): Unexpected APM Server response\n{\"ok\":false,\"message\":\"The instance rejected the connection.
- APM Server transport error (502): Unexpected APM Server response\nPost \"https://172.22.9.247:18033/intake/v2/events\": use of closed network connection
- APM Server transport error (503): Unexpected APM Server response when polling config\n{\"ok\":false,\"message\":\"The requested resource is currently unavailable

We are getting these errors for only a few apps, while the others keep working at the same time.

Hi @lalitprasanth,

Welcome to the community! How many applications do you have configured to send to Elastic at the same time?

I'm aware the error messages don't exactly match but I wonder if you are either hitting the APM server internal queue limit or are exceeding the number of requests that can be processed concurrently.
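If load on the APM Server is the suspect, one thing to try on the agent side is sending less. Below is a minimal sketch, not a tuned configuration: `withReducedLoad` is a hypothetical helper and the values are illustrative. `transactionSampleRate` and `centralConfig` are elastic-apm-node options. Note that disabling `centralConfig` only stops the agent from polling the server for remote configuration (the request behind the "when polling config" errors); it would not fix an overloaded server.

```javascript
// Hypothetical helper: layers load-reducing options on top of an existing
// elastic-apm-node options object. The values are illustrative assumptions.
function withReducedLoad(baseOptions) {
  return {
    ...baseOptions,
    transactionSampleRate: 0.2, // sample 20% of transactions (agent default is 1.0)
    centralConfig: false,       // stop polling APM Server for remote agent config
  };
}

const options = withReducedLoad({
  serviceName: 'name',
  serverUrl: 'http://localhost:8200', // placeholder URL
  captureBody: 'transactions',
  logLevel: 'debug',
});

// In the application, this object would be passed to the agent:
// require('elastic-apm-node').start(options);
console.log(options);
```

If the errors disappear with a lower sample rate, that would point towards capacity rather than networking.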

Hi @carly.richmond ,

Thanks for the quick reply.

We have around 15-20 Lambdas using Elastic APM. We have started sending APM traces from our ECS container applications; we are getting metrics for 2 apps, and from the 3rd we are facing these issues.

That count is not exact: sometimes even the 2nd ECS application gets these errors, and restarts or redeployments sometimes help.

So I also assume it is a resource issue. Where exactly can I check resource utilisation on my Elastic Cloud deployment? The health of my cluster shows as healthy, though.

You could check the data volumes on the APM index using the _cat API. You could also estimate your storage needs for the application set based on your current sampling using the approximations here.
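As a rough sketch of the _cat check from Node.js: the data stream patterns below are an assumption about how APM data is named in an 8.x deployment, and `buildCatIndicesUrl` and `fetchApmIndexSizes` are hypothetical helper names. It uses the `_cat/indices` API with the `format`, `bytes`, `h`, and `s` query parameters.

```javascript
// Assumed data stream patterns for APM data in 8.x; adjust to your deployment.
const APM_PATTERNS = ['traces-apm*', 'metrics-apm*', 'logs-apm*'];

// Builds a _cat/indices URL listing doc counts and store size, largest first.
function buildCatIndicesUrl(esUrl, patterns = APM_PATTERNS) {
  const url = new URL(`/_cat/indices/${patterns.join(',')}`, esUrl);
  url.searchParams.set('format', 'json');
  url.searchParams.set('bytes', 'mb');
  url.searchParams.set('h', 'index,docs.count,store.size');
  url.searchParams.set('s', 'store.size:desc');
  return url.toString();
}

// Fetches the listing. Needs Node.js 18+ for the global fetch; apiKey is an
// Elasticsearch API key in its base64-encoded form.
async function fetchApmIndexSizes(esUrl, apiKey) {
  const res = await fetch(buildCatIndicesUrl(esUrl), {
    headers: { Authorization: `ApiKey ${apiKey}` },
  });
  if (!res.ok) throw new Error(`_cat/indices failed with HTTP ${res.status}`);
  return res.json();
}
```

Calling `fetchApmIndexSizes` with your Elasticsearch endpoint and an API key should return one entry per backing index, which gives a quick view of which data streams are growing fastest.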

Otherwise I would try scaling up your APM Server or adding another to your configuration to see if that solves the issue. Because the problem is intermittent, it takes a bit of guesswork and trial and error with the suggestions in the prior error links.

Hi @carly.richmond ,

As per your suggestion, I have scaled up the APM Server from 1GB to 2GB, and I am still seeing these logs:
- APM Server transport error (400): Unexpected APM Server response\nAPM Server accepted 0 events in the last request\nError: read tcp 172.17.0.11:8200->172.22.4.114:52620: i/o timeout

The last error that you mentioned seems different, as it indicates a 400 error rather than a 5xx error. Could you check the apm-server logs for any errors and share them here (redacting any sensitive info)?
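To see how the 4xx and 5xx errors are split over time, the agent log lines can be grouped by status class. This is only a sketch: `countByStatusClass` is a hypothetical helper, and the pattern assumes the "APM Server transport error (NNN)" prefix shown in the logs earlier in this thread.

```javascript
// Matches the "APM Server transport error (NNN)" prefix from the agent logs.
const TRANSPORT_ERROR = /APM Server transport error \((\d{3})\)/;

// Counts transport errors per status class ("4xx", "5xx", ...).
function countByStatusClass(logLines) {
  const counts = {};
  for (const line of logLines) {
    const match = TRANSPORT_ERROR.exec(line);
    if (!match) continue; // skip lines that are not transport errors
    const statusClass = `${match[1][0]}xx`;
    counts[statusClass] = (counts[statusClass] || 0) + 1;
  }
  return counts;
}

const sample = [
  'APM Server transport error (503): Unexpected APM Server response when polling config',
  'APM Server transport error (502): Unexpected APM Server response',
  'APM Server transport error (400): Unexpected APM Server response',
  'some unrelated log line',
];
console.log(countByStatusClass(sample));
```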

Hi @simitt ,

Apart from the 400 error, I am still getting 5xx errors as well:
- APM Server transport error (502): Unexpected APM Server response\nPost \"https://172.22.9.118:18757/intake/v2/events\": use of closed network connection

Also, I don't see any logs for the agent; it says logs are not enabled in the agent policy. How do I enable them?

Is the APM Server running on Elastic Cloud or self-hosted?

For Elastic Cloud, you can navigate to the cloud console > Monitoring > Enable Logs & Metrics; more information is in the docs.
For self-hosted, see monitoring the elastic-agent.

Hope this helps!
