I am using an on-prem 8.11.3 Elasticsearch + Logstash stack.
Logstash uses the elasticsearch output plugin to post events to Elasticsearch.
The current setup uses 2 Elasticsearch servers.
Recently, I had an incident with one of these 2 servers: the Elasticsearch service failed and stayed down.
During this downtime on one of the servers, Logstash constantly tried to connect to it and failed to post events, which led to lost events.
My question is: how can I make this more robust to avoid losing events?
What do you mean by downtime? If the output doesn’t have a connection to the target elasticsearch server it shouldn’t be trying to post anything and therefore should not complain about failing to post events.
Please provide much more detail on why you think you are losing events.
By Elasticsearch service downtime, I mean that elasticsearch.service was in a failed state and was not running on srv1.
During this downtime, Logstash wrote this kind of message every second to its log files:
[2026-03-11T00:00:01,406][INFO ][logstash.outputs.elasticsearch][main] Failed to perform request {:message=>"Connect to srv1:9200 [srv1/10.1.1.1] failed: Connection refused", :exception=>Manticore::SocketException, :cause=>#<Java::OrgApacheHttpConn::HttpHostConnectException: Connect to srv1:9200 [srv1/10.1.1.1] failed: Connection refused>}
[2026-03-11T00:00:01,407][WARN ][logstash.outputs.elasticsearch][main] Attempted to resurrect connection to dead ES instance, but got an error {:url=>"https://srv1:9200/", :exception=>LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError, :message=>"Elasticsearch Unreachable: [https://srv1:9200/][Manticore::SocketException] Connect to srv1:9200 [srv1/10.1.1.1] failed: Connection refused"}
And I found entries in the application log files that I could not find in Elasticsearch.
After restarting elasticsearch.service on srv1, the node rejoined the cluster and the situation was restored.
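To pin down how long srv1 was considered dead, the "resurrect" warnings can be counted per URL together with the first and last timestamp. This is only a minimal sketch: it assumes the Logstash log lines look like the WARN entry quoted above, and the function name `summarize_resurrect_warnings` is made up for illustration.

```python
import re
from datetime import datetime

# Matches a WARN entry from the elasticsearch output that reports a failed
# resurrect attempt, capturing the entry timestamp and the target URL.
RESURRECT_RE = re.compile(
    r'\[(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2},\d{3})\]\[WARN\s*\]'
    r'\[logstash\.outputs\.elasticsearch\]\[[^\]]*\]\s+'
    r'Attempted to resurrect connection to dead ES instance'
    r'.*?:url=>"(?P<url>[^"]+)"'
)


def summarize_resurrect_warnings(lines):
    """Return {url: (count, first_timestamp, last_timestamp)}."""
    summary = {}
    for line in lines:
        match = RESURRECT_RE.search(line)
        if match is None:
            continue  # not a resurrect warning
        ts = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S,%f")
        url = match.group("url")
        count, first, last = summary.get(url, (0, ts, ts))
        summary[url] = (count + 1, min(first, ts), max(last, ts))
    return summary
```

Passing an open Logstash log file to `summarize_resurrect_warnings` gives the window during which each host was unreachable, which can then be compared with the timestamps of the application log entries that appear to be missing.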
What was down? Just one server of the cluster or the entire cluster? Can you provide more context about your Elasticsearch cluster? How many nodes do you have etc.
Your Logstash output has 2 servers. If one of them is down, it will use the other one, and you should not lose any data unless the entire cluster was down.
Also, provide context about your logstash pipelines, what are your inputs?
After searching for a solution in Logstash/Elasticsearch, I came to the conclusion that the best way is to implement a local reverse proxy to manage traffic between Logstash and Elasticsearch.
As I said, one of the 2 elasticsearch.service instances was down and in a failed state. The reason is unknown; it just crashed. That in itself is not an issue, as the cluster was in perfect shape.
But still, Logstash continued trying to connect to the failed Elasticsearch instance.
I think nginx will manage this better than Logstash, which does not seem to manage it at all.
If your cluster was up and just one of the nodes was offline, Logstash should still use the other node to send the data, unless it was unable to connect to it for other reasons.
You will need to troubleshoot this further. I don't think a reverse proxy between Logstash and Elasticsearch will do anything other than add another piece to the puzzle; Logstash manages the load balancing itself.
You didn't provide any context about your pipelines, so it is very hard to understand what may have happened.
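To illustrate what "manages the load balancing itself" means in practice, here is a simplified model of a client-side host pool: hosts are tried in turn, a host that refuses the connection is marked dead and skipped until a retry delay has passed, and the batch is only given up on if no host accepts it. This is not Logstash's actual code; `HostPool`, `transport`, `resurrect_delay` and `NoLiveHostError` are hypothetical names used for the sketch.

```python
class NoLiveHostError(Exception):
    """Raised when no host accepted the batch; the caller keeps it and retries."""


class HostPool:
    def __init__(self, hosts, resurrect_delay=5.0):
        self.hosts = list(hosts)
        self.resurrect_delay = resurrect_delay
        self.retry_at = {}  # host -> earliest time it may be tried again
        self._cursor = 0

    def _candidates(self, now):
        # Rotate the starting host on every call (round-robin) and skip
        # hosts that are still marked dead.
        n = len(self.hosts)
        ordered = [self.hosts[(self._cursor + i) % n] for i in range(n)]
        self._cursor = (self._cursor + 1) % n
        return [h for h in ordered if self.retry_at.get(h, 0.0) <= now]

    def send(self, batch, transport, now):
        """Try each live host in turn; return the host that accepted the batch."""
        for host in self._candidates(now):
            try:
                transport(host, batch)
            except ConnectionError:
                # Mark the host dead; it is retried ("resurrected") later.
                self.retry_at[host] = now + self.resurrect_delay
                continue
            self.retry_at.pop(host, None)
            return host
        raise NoLiveHostError("no reachable host; keep the batch and retry")
```

In this model, a dead srv1 only produces periodic retry attempts (and the matching log noise) while every batch still goes to srv2, which is the behaviour described above for a two-host output with one node down.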
The elasticsearch output uses 2 servers and sends events to different data streams.
About troubleshooting: in the Elasticsearch log there is no error message explaining why it crashed/stopped. The Elasticsearch GC log stopped a couple of minutes before Logstash started complaining that Elasticsearch was no longer reachable.
So I have no information in the logs explaining why it crashed. The only thing I have is the Logstash logs, which record attempt after attempt to resurrect the connection, every second during the whole downtime.
and correctly configured logstash, you should not lose events if only one of those 2 endpoints is temporarily unavailable. You will see logs telling you that the endpoint is down; how many is determined by your specific logging configuration.
I will just add a couple of GB to the VMs to avoid the same issue in the future.
And following your recommendations, I will keep the current setup as is, with Logstash using at least 2 Elasticsearch servers in the "hosts" parameter.