Hello,
We are doing a PoC with the Elastic Agent, and one of our agent hosts became UNHEALTHY after an upgrade.
We have the following ingestion flow:
Elastic Agent -> HAProxy (passthrough) -> Logstash -> Elasticsearch
We currently have 3 different policies: one for Linux workstations, one for Linux servers, and one for Windows workstations. It is this last one that is not working correctly.
I requested the diagnostics.zip file for this agent. Looking at the endpoint service log, it says that it cannot connect to the Logstash server, which does not make much sense because no change was made on the network.
The error is not helpful at all:
{"@timestamp":"2023-10-06T14:47:30.6465521Z","agent":{"id":"03ef0b8d-2d54-4d72-94a7-70189dae65d0","type":"endpoint"},"ecs":{"version":"1.11.0"},"log":{"level":"error","origin":{"file":{"line":662,"name":"LogstashClient.cpp"}}},"message":"LogstashClient.cpp:662 SSL handshake with Logstash server at HAPROXY-IP:5046 encountered an error: (null)","process":{"pid":5172,"thread":{"id":7088}}}
It is complaining about the SSL handshake with the Logstash server, and the error is (null), so I am not sure what is happening.
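Since the log gives no reason for the handshake failure, one way to take the Endpoint client out of the picture would be to attempt a bare TLS handshake from the Windows host against the same address. Below is a minimal sketch in Python, under some assumptions: `probe_tls` is a hypothetical helper name, Python is available on the host, and certificate verification is deliberately disabled so that only the handshake itself is exercised. It does not present a client certificate, so if the Logstash input requires one, a failure here would not be conclusive on its own.

```python
import socket
import ssl


def probe_tls(host, port, timeout=5.0):
    """Try a TCP connect followed by a TLS handshake and report the outcome.

    Hypothetical helper for illustration; not part of any Elastic tooling.
    """
    # Verification is turned off on purpose: the goal is only to see whether
    # a handshake completes at all, not to validate the certificate chain.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE

    # Step 1: plain TCP connection (would fail on firewall or routing issues).
    try:
        raw = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        return {"ok": False, "stage": "tcp", "error": str(exc)}

    # Step 2: TLS handshake on top of the established connection.
    try:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return {
                "ok": True,
                "stage": "tls",
                "protocol": tls.version(),
                "cipher": tls.cipher()[0],
            }
    except OSError as exc:  # ssl.SSLError is a subclass of OSError
        return {"ok": False, "stage": "tls", "error": str(exc)}
    finally:
        raw.close()
```

Calling `probe_tls("HAPROXY-IP", 5046)` with the real proxy address in place of the placeholder from the log should show whether the failure happens at the TCP stage or during the TLS handshake, which would help tell a network problem apart from a certificate or protocol problem.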
This started after we upgraded the Agent from Fleet UI.
This same ingestion flow works for all the Linux machines; the only difference between the policies is the Logstash port.
On the Endpoint screen in Kibana it says that the Windows agent has an out-of-date policy, so I'm assuming something did not work as expected during the upgrade.
How should I approach troubleshooting this?