Logstash-output-elasticsearch load balancing not working when one of the nodes is down

preetish_P · October 21, 2021, 10:09am

Hi Team,

We are currently doing negative testing of a multi-node Elasticsearch cluster. Our current setup is:
node 1: Elasticsearch, Logstash, Kibana
node 2: Elasticsearch
node 3: Elasticsearch

The Logstash instance on node 1 points to all 3 nodes, as below:

elasticsearch {
    hosts => ["${ES_NODE_1}","${ES_NODE_2}","${ES_NODE_3}"]
    index => "<index-name>"
    user => "${ES_USER_NAME}"
    password => "${ES_USER_PASSWORD}"
    ssl => true
    cacert => "${ES_CERT_AUTH}"
}

We have around 20 pipelines with all kinds of inputs (lumberjack, http_poller, jdbc).

We are observing that when one of the ES nodes is brought down, the pipelines that use http_poller stop working (we call a set of 3 APIs every minute). The ones based on lumberjack continue to work.

We continuously get these warnings in logstash-plain.log, which is expected:

[2021-10-19T17:43:02,646][WARN ][logstash.outputs.elasticsearch][<pipeline>] Attempted to resurrect connection to dead ES instance, but got an error. {:url=>"https://ES-NODE-1:9201/", :error_type=>LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError, :error=>"Elasticsearch Unreachable: [https://ES-NODE-1:9201/][Manticore::SocketException] Connection refused (Connection refused)"}
[2021-10-19T17:43:02,642][WARN ][logstash.outputs.elasticsearch][<module>] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [https://es-node-1:9201/][Manticore::SocketException] Connection refused (Connection refused) {:url=>https://es-node-1:9201/, :error_message=>"Elasticsearch Unreachable: [https://es-node-1:9201/][Manticore::SocketException] Connection refused (Connection refused)", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
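
For context, the behaviour we expected from the output's host pool can be modelled roughly as below. This is only a simplified sketch in Python, not the plugin's actual implementation; the names HostPool, mark_dead, resurrect and next_url are hypothetical and chosen for illustration. It shows a dead host being skipped while requests keep rotating over the remaining hosts, and the dead host being probed again later (in the plugin, the interval between probes is what resurrect_delay controls).

```python
from typing import Callable, Dict, List


class NoAliveHostError(Exception):
    """Raised when every configured host is currently marked dead."""


class HostPool:
    """Toy model of a client-side host pool (hypothetical, for illustration)."""

    def __init__(self, urls: List[str], health_check: Callable[[str], bool]):
        self.urls = list(urls)
        self.health_check = health_check
        # Every host starts out alive until a request to it fails.
        self.alive: Dict[str, bool] = {u: True for u in self.urls}
        self._cursor = 0

    def mark_dead(self, url: str) -> None:
        """Record that a request to this host failed."""
        self.alive[url] = False

    def resurrect(self) -> None:
        """Re-probe dead hosts; a real client would do this on a timer."""
        for url, ok in self.alive.items():
            if not ok and self.health_check(url):
                self.alive[url] = True

    def next_url(self) -> str:
        """Round-robin over the configured hosts, skipping dead ones."""
        for _ in range(len(self.urls)):
            url = self.urls[self._cursor % len(self.urls)]
            self._cursor += 1
            if self.alive[url]:
                return url
        raise NoAliveHostError("no alive hosts in the pool")
```

With three hosts and one marked dead, next_url() in this model keeps alternating between the other two, which is what we assumed the output would do for every pipeline.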

We guessed that the http_poller inputs might be getting starved of OS socket connections, since all of them could be used up by the internal health checks. We tried the settings below:

/etc/security/limits.conf:

logstash soft nofile 65536
logstash hard nofile 65536

elasticsearch output:

resurrect_delay => 300
retry_max_interval => 8

but still no luck.
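
To rule out a plain connectivity problem from node 1, a check along these lines can confirm which ES endpoints accept TCP connections while one node is down. It is a minimal sketch using only the Python standard library. Port 9201 is taken from the log lines above; the host names in the usage part are placeholders for our ${ES_NODE_*} values, and the function name is_reachable is hypothetical.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # Placeholder host names; substitute the real node addresses.
    for host in ("es-node-1", "es-node-2", "es-node-3"):
        state = "reachable" if is_reachable(host, 9201) else "unreachable"
        print(f"{host}:9201 is {state}")
```

This only tests the TCP layer. It does not verify TLS, authentication, or cluster health, so a reachable result does not by itself mean the node can accept bulk requests.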

Has anyone observed this behaviour? Is there any setting we can apply to the Logstash output plugin to ensure it continues to work with one node down?

Thanks

system · November 18, 2021, 10:09am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
