Hi Team,
Although the queue-full issue on the ELK server has been resolved, the APM service continues to fail intermittently. Our ELK server is hosted on a DigitalOcean droplet and monitors logs from our AWS environment.
The problem first surfaced as a queue-full error caused by the file system reaching capacity, which also made the Kibana console unavailable. To mitigate this, I manually deleted some indices on the ELK server and then managed the remaining ones through the Kibana console.
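To confirm that disk space has actually recovered, something like the following could be run against the cluster. This is only a minimal sketch: the URL `http://localhost:9200`, the helper names, and the credentials passed in are assumptions for illustration, not our actual setup. It uses the `_cat/allocation` and `_cat/indices` endpoints with `format=json&bytes=b` so sizes come back as plain byte counts.

```python
import base64
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: replace with the real host/port


def fetch_json(path, user, password, base_url=ES_URL, timeout=10):
    """GET a JSON endpoint from Elasticsearch using HTTP basic auth."""
    req = urllib.request.Request(base_url + path)
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def largest_indices(rows, n=10):
    """Return the n largest (index, size_in_bytes) pairs from _cat/indices rows."""
    # Closed indices report no store.size, so they are skipped.
    sized = [(r["index"], int(r["store.size"])) for r in rows if r.get("store.size")]
    return sorted(sized, key=lambda pair: pair[1], reverse=True)[:n]


def report(user, password):
    """Print disk usage per node, then the largest indices."""
    for node in fetch_json("/_cat/allocation?format=json&bytes=b", user, password):
        print(node.get("node"), "disk.percent =", node.get("disk.percent"))
    rows = fetch_json("/_cat/indices?format=json&bytes=b", user, password)
    for name, size in largest_indices(rows):
        print(f"{size:>15} {name}")
```

Calling `report("elastic", "<password>")` (hypothetical credentials) would list each node's disk usage and the ten largest indices, which should show whether the file system is still close to full.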
I then restarted both the ELK and APM services. However, the APM service still fails after running for a short time. I also noticed that the APM logs had already stopped updating before this incident.
On inspecting the Elasticsearch logs, I found the following entry:
[2024-03-06T22:08:27,023][INFO ][o.e.x.s.a.AuthenticationService] [node-1] Authentication of [apm_system] was terminated by realm [reserved] - failed to authenticate user [apm_system]
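The `reserved` realm is the one that handles Elasticsearch's built-in users, so this entry suggests the `apm_system` credentials being presented are not accepted. As a rough sketch of how that could be checked, the snippet below extracts the user and realm from such a log line and tests a username/password pair against the `_security/_authenticate` endpoint. The function names and the base URL passed in are hypothetical, and the password would have to be whichever one is configured on the client side.

```python
import base64
import json
import re
import urllib.error
import urllib.request

# Matches the "Authentication of [user] was terminated by realm [realm]" message.
AUTH_FAILURE = re.compile(
    r"Authentication of \[(?P<user>[^\]]+)\] "
    r"was terminated by realm \[(?P<realm>[^\]]+)\]"
)


def parse_auth_failure(line):
    """Return (user, realm) from an authentication-failure log line, or None."""
    match = AUTH_FAILURE.search(line)
    return (match.group("user"), match.group("realm")) if match else None


def basic_auth_header(user, password):
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


def check_credentials(base_url, user, password, timeout=10):
    """Return True if Elasticsearch accepts the credentials, False on HTTP 401."""
    req = urllib.request.Request(base_url + "/_security/_authenticate")
    req.add_header("Authorization", basic_auth_header(user, password))
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.load(resp).get("username") == user
    except urllib.error.HTTPError as err:
        if err.code == 401:  # credentials rejected
            return False
        raise  # any other HTTP error is a different problem
```

If `check_credentials("http://localhost:9200", "apm_system", "<password>")` returns `False`, the stored password and the one in Elasticsearch presumably no longer match.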
Your prompt attention to this matter is greatly appreciated. Please let me know if you require any further details.
Kindly suggest how to debug and fix this.