Sniffing recovery after all instances are down

We are using a Logstash pipeline with the Elasticsearch output plugin in a Kubernetes environment.
We've enabled sniffing in the plugin:

...
output {
  elasticsearch {
    ...
    hosts => ["elasticsearch-service"]
    sniffing => true
  }
}

This causes Logstash to identify the actual Elasticsearch nodes behind the service and send to each of them in parallel, which is great for performance.

The problem occurs when we shut down all Elasticsearch instances at once and then restart them, so they all come back with new, different IPs.
Logstash, which has already sniffed the cluster, no longer uses the service name specified in the plugin's "hosts"; instead it keeps using the sniffed node IPs (which are all down), so it can never reach the new instances and logs this error:

Elasticsearch output attempted to sniff for new connections but cannot. No living connections are detected. Pool contains the following current URLs {:url_info=>{https://elastic:xxxxxx@192.168.154.7:9200/=>{:in_use=>0, :state=>:dead, :version=>"7.1.1", :last_error=>#<LogStash::Outputs::Elasticsearch::HttpClient::Pool::HostUnreachableError: Could not reach host Manticore::ClientProtocolException: 192.168.154.7:9200 failed to respond>, :last_errored_at=>2019-07-14 07:49:37 UTC}}}

I would have expected it to fall back to the original plugin "hosts", or at least to provide a configurable way to instruct Logstash to retry the plugin "hosts" when all of the sniffed Elasticsearch instances are down.
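For reference, these are the recovery-related options I'm aware of in the plugin (the values shown are, as far as I can tell, the documented defaults); none of them seem to fall back to the configured "hosts" once sniffing has replaced them with node IPs:

...
output {
  elasticsearch {
    ...
    hosts => ["elasticsearch-service"]
    sniffing => true
    sniffing_delay => 5          # seconds between sniffing attempts
    resurrect_delay => 5         # seconds between checks on dead connections
    retry_initial_interval => 2  # initial wait before retrying a failed bulk request
    retry_max_interval => 64     # upper bound on the retry back-off
  }
}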

A manual restart of Logstash resolves the issue.

Is there a way to overcome this automatically, or to configure Logstash to retry the original hosts?
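If there is no built-in way, I assume the obvious (but less performant) fallback would be to disable sniffing entirely, so Logstash always goes through the Kubernetes Service name, which keeps resolving to whatever pods are currently alive, at the cost of the per-node parallelism described above:

...
output {
  elasticsearch {
    ...
    hosts => ["elasticsearch-service"]
    sniffing => false   # always connect via the Service name, which resolves to live pods
  }
}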
