Troubleshoot Elastic Endpoint Unhealthy

leandrojmp · October 6, 2023, 3:45pm

Hello,

We are doing a PoC with the Elastic Agent, and one of our agent hosts became UNHEALTHY after an upgrade.

We have the following ingestion flow:

Elastic Agent -> HAProxy (passthrough) -> Logstash -> Elasticsearch

We currently have 3 different policies: one for Linux workstations, one for Linux servers, and one for Windows workstations. It is this last one that is not working correctly.

I requested the diagnostics.zip file for this agent, and the endpoint service log says that it cannot connect to the Logstash server, which does not make much sense because no change was made to the network.

The error is not helpful at all:

{"@timestamp":"2023-10-06T14:47:30.6465521Z","agent":{"id":"03ef0b8d-2d54-4d72-94a7-70189dae65d0","type":"endpoint"},"ecs":{"version":"1.11.0"},"log":{"level":"error","origin":{"file":{"line":662,"name":"LogstashClient.cpp"}}},"message":"LogstashClient.cpp:662 SSL handshake with Logstash server at HAPROXY-IP:5046 encountered an error: (null)","process":{"pid":5172,"thread":{"id":7088}}}

It is complaining about the SSL handshake with the Logstash server, but the error is (null), so I'm not sure what is happening.
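For anyone going through the same log from a diagnostics bundle, here is a minimal sketch for pulling only the error-level entries out of it. It assumes the endpoint log is newline-delimited JSON with the same fields as the line quoted above; `error_events` is a hypothetical helper name, not part of any Elastic tooling.

```python
import json


def error_events(lines):
    """Yield (timestamp, origin, message) for every error-level log entry."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log line
        if not isinstance(event, dict):
            continue
        log = event.get("log", {})
        if log.get("level") != "error":
            continue
        # Field layout assumed from the sample line: log.origin.file.{name,line}
        origin = log.get("origin", {}).get("file", {})
        yield (
            event.get("@timestamp"),
            f'{origin.get("name")}:{origin.get("line")}',
            event.get("message"),
        )
```

To use it, open the log file extracted from the diagnostics archive and pass the file object to `error_events`, printing each tuple. Seeing how often the handshake error repeats, and at what times, can help tell a permanent failure from an intermittent one.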

This started after we upgraded the Agent from Fleet UI.

This same ingestion flow works for all the Linux machines; the only difference between the policies is the Logstash port.

The Endpoint screen in Kibana says that the Windows agent has an out-of-date policy, so I'm assuming something didn't work as expected during the upgrade.

How should I approach troubleshooting this?

wsouza · October 9, 2023, 1:39pm

Why don't you ingest the data directly into Elasticsearch instead of sending it to Logstash first? Are you using a self-signed certificate? You could try setting the option that disables certificate validation in the Elastic Agent. Another thing to check, on the Fleet Server, is whether any integration in your policy is incompatible.

leandrojmp · October 9, 2023, 2:03pm

We need to use Logstash, because only the Logstash servers are allowed to connect to the Elasticsearch servers. This is not the issue.

Everything worked fine before. The issue only happens for a single agent, the only one on Windows, after the upgrade to version 8.10.2.

Since we have a license, we opened a ticket with Elastic. It looks like some conflict with our VPN application, as it is intermittent.

wsouza · October 9, 2023, 2:34pm

Another possibility is to use Wireshark to analyze the traffic and try to understand how this communication behaves. When you run the telnet iplogstash port command, is the connection closed normally?
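Note that a telnet check only shows that the TCP port is reachable, while the error in the log is about the SSL handshake. A minimal sketch for testing both steps separately, assuming Python is available on the affected host; `probe_tls` is a hypothetical helper, and the host and port you pass in are your own values. If the Logstash input requires a client certificate, a handshake without one may be rejected even when the network path is fine.

```python
import socket
import ssl


def probe_tls(host, port, timeout=5.0, verify=True):
    """Return (tcp_ok, tls_ok, detail) for a TCP connect followed by a TLS handshake."""
    try:
        raw = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        return False, False, f"TCP connect failed: {exc!r}"

    context = ssl.create_default_context()
    if not verify:
        # Diagnostic use only: skip certificate validation to separate
        # certificate problems from network-level handshake problems.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE

    try:
        # The handshake runs inside wrap_socket on an already connected socket.
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return True, True, f"TLS handshake OK: {tls.version()}"
    except (ssl.SSLError, OSError) as exc:
        return True, False, f"TLS handshake failed: {exc!r}"
    finally:
        raw.close()
```

Running this in a loop with the VPN client on and off, and comparing how often `tls_ok` comes back false, is one way to check whether the failures follow the VPN.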

leandrojmp · October 9, 2023, 2:41pm

It is not a connection issue: the connection works, telnet works, and the certificate works. Only one agent, running Windows, has this issue after the upgrade.

It is intermittent, and we are investigating a conflict with our VPN client. The agent seems to have some network-related issue.

Since I already opened a ticket, I will mark this as concluded.

Thanks anyway @wsouza !

system · November 6, 2023, 2:41pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
