We have integrated elastic-apm-node into our Express app, and we are facing a couple of issues with transactions and logging.
We continuously get random connectivity errors in the application logs, like the ones below:
- APM Server transport error (503): Unexpected APM Server response when polling config\n{\"ok\":false,\"message\":\"The requested resource is currently unavailable.
- APM Server transport error (502): Unexpected APM Server response\n{\"ok\":false,\"message\":\"The instance rejected the connection.
- APM Server transport error (502): Unexpected APM Server response\nPost \"https://172.22.9.247:18033/intake/v2/events\": use of closed network connection
- APM Server transport error (503): Unexpected APM Server response when polling config\n{\"ok\":false,\"message\":\"The requested resource is currently unavailable
We get these errors only for a few apps, while others work fine at the same time.
We have around 15-20 Lambdas using Elastic APM. We have started sending APM traces from our ECS container applications: we get metrics for 2 apps, and from the 3rd app onwards we run into these issues.
The count above is not exact; sometimes even the 2nd ECS application gets these errors. Restarting or redeploying helps sometimes.
So I assume it is a resource issue. Where exactly can I check resource utilisation on my Elastic Cloud deployment? The cluster health does show as healthy, though.
You could check the data volumes on the APM indices using the _cat API. You could also estimate your storage needs for the application set based on your current sampling, using the approximations here.
Otherwise, I would try scaling up your APM Server or adding another instance to your configuration to see if that solves the issue. Because the problem is intermittent, it is a bit of guesswork in this case, with some trial and error of the suggestions in the prior error links.
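If you pull the _cat indices output as JSON, a few lines of TypeScript can total it up. This is a minimal sketch that assumes a request such as `GET _cat/indices/*apm*?format=json&bytes=b&h=index,docs.count,store.size`; the `*apm*` pattern is an assumption (APM index and data stream names differ between stack versions), and `summarizeApmIndices` is a hypothetical helper, not part of any Elastic client.

```typescript
// Shape of one row from the _cat indices API when called with
// format=json&bytes=b&h=index,docs.count,store.size (values arrive as strings).
interface CatIndexRow {
  index: string;
  "docs.count": string | null;
  "store.size": string | null;
}

interface ApmStorageSummary {
  indices: number;
  docs: number;
  bytes: number;
  largestIndex: string | null;
}

// Hypothetical helper: totals document counts and on-disk size across the
// rows returned for an (assumed) *apm* index pattern.
function summarizeApmIndices(rows: CatIndexRow[]): ApmStorageSummary {
  let docs = 0;
  let bytes = 0;
  let largestIndex: string | null = null;
  let largestBytes = -1;
  for (const row of rows) {
    // Missing values (e.g. closed indices) are counted as zero.
    const size = Number(row["store.size"] ?? 0);
    docs += Number(row["docs.count"] ?? 0);
    bytes += size;
    if (size > largestBytes) {
      largestBytes = size;
      largestIndex = row.index;
    }
  }
  return { indices: rows.length, docs, bytes, largestIndex };
}
```

Comparing these totals over a few days gives a rough ingest rate to set against the storage approximations.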
As you suggested, I have scaled up the APM Server from 1GB to 2GB, and I am still seeing these logs:
- APM Server transport error (400): Unexpected APM Server response\nAPM Server accepted 0 events in the last request\nError: read tcp 172.17.0.11:8200->172.22.4.114:52620: i/o timeout
The last error that you mentioned seems different, as it indicates a 400 error rather than a 5xx error. Could you check the apm-server logs for any errors and share them here (redacting any sensitive info)?
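Before digging into the server logs, it can help to count which kind of failure dominates on the agent side. Below is a minimal TypeScript sketch that splits the "APM Server transport error" lines quoted in this thread by HTTP status and by whether the failed request was a central-config poll or an event intake. The regex is based only on the messages shown here, so treat the message format as an assumption rather than a documented API; `classifyTransportError` and `tallyTransportErrors` are hypothetical helpers.

```typescript
interface TransportErrorInfo {
  status: number;
  configPoll: boolean; // true when the failure happened while polling central config
  kind: "client" | "server" | "other";
}

// Matches the prefix of the agent log lines quoted in this thread (assumed format).
const TRANSPORT_ERROR =
  /APM Server transport error \((\d{3})\): Unexpected APM Server response( when polling config)?/;

function classifyTransportError(line: string): TransportErrorInfo | null {
  const m = TRANSPORT_ERROR.exec(line);
  if (m === null) return null;
  const status = Number(m[1]);
  const kind = status >= 500 ? "server" : status >= 400 ? "client" : "other";
  // The optional group is undefined when the line is not a config poll.
  return { status, configPoll: m[2] !== undefined, kind };
}

// Counts matching lines per "<status> config poll" / "<status> intake" key.
function tallyTransportErrors(lines: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of lines) {
    const info = classifyTransportError(line);
    if (info === null) continue;
    const key = `${info.status} ${info.configPoll ? "config poll" : "intake"}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}
```

Running it over an exported slice of the application logs before and after a change (such as scaling the APM Server) shows whether the mix of 400s and 5xx errors actually moved.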
Apart from the 400 error, I am still getting 5xx errors as well:
- APM Server transport error (502): Unexpected APM Server response\nPost "https://172.22.9.118:18757/intake/v2/events\": use of closed network connection
Also, I don't see any logs for the agent; it says logs are not enabled in the agent policy. How do I enable them?
Is the APM Server running on Elastic Cloud or self-hosted?
For Elastic Cloud, you can go to Monitoring > Enable Logs & Metrics in the Cloud console; there is more information in the docs.
For a self-hosted setup, see the documentation on monitoring the Elastic Agent.