Logstash.outputs.elasticsearch - Failed to perform request: Connection refused

I am using an on-prem 8.11.3 elastic+logstash stack.
logstash uses elasticsearch output plugin to post events to elastic.
current set-up uses 2 servers.

elasticsearch {
  hosts => [ "srv1:9200", "srv2:9200" ]
  data_stream_dataset => "ds-dataset"
  data_stream_namespace => "default"
  ssl_enabled => true
  ssl_certificate_authorities => "/etc/pki/tls/certs/ca-bundle.crt"
  api_key => "localapi:key"
}

Recently, I had an incident on one of these 2 elastic servers: the elasticsearch service failed and stayed down.

During this elasticsearch service downtime on one of the servers, logstash constantly tried to connect to it, failed to post events, and events were lost.

My question is: how can I make this more robust to avoid losing events?
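For context on what can cause loss here: the elasticsearch output retries unreachable hosts, but events sitting in Logstash's default in-memory queue are lost if Logstash itself stops or crashes while retrying. A common hardening step is enabling the persistent queue in logstash.yml. A minimal sketch, assuming default paths; the size cap is an example value, not a recommendation:

```yaml
# logstash.yml -- buffer in-flight events on disk so they survive a Logstash restart
queue.type: persisted      # default is "memory"
queue.max_bytes: 4gb       # upper bound on the on-disk queue (example value)
```

Note the persistent queue protects against Logstash restarts and back-pressure; it does not change how the output plugin handles unreachable Elasticsearch hosts.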

What do you mean by downtime? If the output doesn’t have a connection to the target elasticsearch server it shouldn’t be trying to post anything and therefore should not complain about failing to post events.

Please provide much more detail on why you think you are losing events.

By elastic service downtime, I meant when elasticsearch.service was in failed state and was not running on srv1.

During this downtime, logstash wrote this kind of message every second to its logfiles:

[2026-03-11T00:00:01,406][INFO ][logstash.outputs.elasticsearch][main] Failed to perform request {:message=>"Connect to srv1:9200 [srv1/10.1.1.1] failed: Connection refused", :exception=>Manticore::SocketException, :cause=>#<Java::OrgApacheHttpConn::HttpHostConnectException: Connect to srv1:9200 [srv1/10.1.1.1] failed: Connection refused>}
[2026-03-11T00:00:01,407][WARN ][logstash.outputs.elasticsearch][main] Attempted to resurrect connection to dead ES instance, but got an error {:url=>"https://srv1:9200/", :exception=>LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError, :message=>"Elasticsearch Unreachable: [https://srv1:9200/][Manticore::SocketException] Connect to srv1:9200 [srv1/10.1.1.1] failed: Connection refused"}

And I found entries in the application logfiles that never showed up in elasticsearch.

After restarting elasticsearch.service on srv1, the node re-joined the cluster and the situation returned to normal.

What was down? Just one server of the cluster or the entire cluster? Can you provide more context about your Elasticsearch cluster? How many nodes do you have etc.

Your Logstash output has 2 servers; if one of them is down it should fail over to the other, and you should not lose any data unless the entire cluster was down.

Also, provide context about your logstash pipelines, what are your inputs?

It is a cluster of 5 servers.

logstash output elasticsearch plugin’s “hosts” parameter is set with 2 servers out of these 5.

Only one of the 2 servers got an elasticsearch.service in failed state, the other one was running.

And at cluster level, the other 4 nodes were all up and running. The cluster itself was healthy.
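Since only 2 of the 5 nodes are listed in the output, one option is to list more (or all) of the cluster's HTTP-serving nodes, so Logstash has more candidates to fail over to. A sketch reusing the original output; srv3 through srv5 are assumed hostnames for the other three nodes:

```
elasticsearch {
  hosts => [ "srv1:9200", "srv2:9200", "srv3:9200", "srv4:9200", "srv5:9200" ]
  data_stream_dataset => "ds-dataset"
  data_stream_namespace => "default"
  ssl_enabled => true
  ssl_certificate_authorities => "/etc/pki/tls/certs/ca-bundle.crt"
  api_key => "localapi:key"
}
```

The plugin also has a sniffing option that discovers cluster nodes at runtime, but with TLS and custom hostnames it needs care (the discovered addresses must match the certificates), so listing hosts explicitly is the simpler route.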

logstash inputs are all coming from filebeat.