I am using an on-prem 8.11.3 Elasticsearch + Logstash stack.
Logstash uses the elasticsearch output plugin to post events to Elasticsearch.
The current set-up uses 2 servers.
Recently I had an incident with one of these 2 Elasticsearch servers: the elasticsearch service failed and stayed down.
During this downtime on one of the servers, Logstash constantly tried to connect to it and failed to post events, leading to lost events.
My question is: how can I make this more robust and avoid losing events?
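For reference, this is roughly what my output section looks like (host names and the index pattern below are placeholders, not my exact values):

```conf
output {
  elasticsearch {
    # Both cluster nodes are listed; the plugin load-balances across them
    hosts => ["https://srv1:9200", "https://srv2:9200"]
    index => "app-logs-%{+YYYY.MM.dd}"
  }
}
```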
What do you mean by downtime? If the output doesn't have a connection to the target Elasticsearch server, it shouldn't be trying to post anything, and therefore shouldn't complain about failing to post events.
Please provide much more detail on why you think you are losing events.
By elastic service downtime, I meant that elasticsearch.service was in a failed state and not running on srv1.
During this downtime, Logstash wrote this kind of message every second to its log files:
```
[2026-03-11T00:00:01,406][INFO ][logstash.outputs.elasticsearch][main] Failed to perform request {:message=>"Connect to srv1:9200 [srv1/10.1.1.1] failed: Connection refused", :exception=>Manticore::SocketException, :cause=>#<Java::OrgApacheHttpConn::HttpHostConnectException: Connect to srv1:9200 [srv1/10.1.1.1] failed: Connection refused>}
[2026-03-11T00:00:01,407][WARN ][logstash.outputs.elasticsearch][main] Attempted to resurrect connection to dead ES instance, but got an error {:url=>"https://srv1:9200/", :exception=>LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError, :message=>"Elasticsearch Unreachable: [https://srv1:9200/][Manticore::SocketException] Connect to srv1:9200 [srv1/10.1.1.1] failed: Connection refused"}
```
And I found entries in the application log files that I could not find in Elasticsearch.
After restarting elasticsearch.service on srv1, the node re-joined the cluster and the situation was restored.
What was down? Just one server of the cluster, or the entire cluster? Can you provide more context about your Elasticsearch cluster? How many nodes do you have, etc.?
Your Logstash output has 2 servers; if one of them is down, it should fail over to the other, and you should not lose any data unless the entire cluster was down.
Also, provide context about your Logstash pipelines: what are your inputs?
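On top of the failover between the two hosts, events can still be dropped if Logstash itself is restarted or back-pressured while both nodes are unreachable, since the default in-memory queue is not durable. A persistent queue in logstash.yml buffers events on disk in that case; this is just a sketch, and the size value is an arbitrary example to tune for your own throughput:

```yaml
# logstash.yml
queue.type: persistent   # spool in-flight events to disk instead of memory
queue.max_bytes: 4gb     # example cap on on-disk queue size (default is 1024mb)
```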