Elasticsearch output multiple hosts but no fault tolerance

Mahdi_Moazami · June 2, 2024, 7:57am

Hi Elastic team,
We've designed an Elasticsearch cluster with 3 nodes, and there are 2 independent Logstash instances that ingest data into this cluster. Sometimes one of the cluster's nodes goes down due to high load and can't be restarted automatically. When that happens, the Logstash instances start logging error messages about that node being unavailable, and data ingestion stops.

My question is: isn't it sufficient for fault tolerance to set an array of hosts for the Elasticsearch output, as below?

hosts => ["https://a.b.c.x:9200", "https://a.b.c.y:9200", "https://a.b.c.z:9200"]

leandrojmp · June 2, 2024, 1:37pm

Setting an array of hosts should be enough, as Logstash will load balance between them. It will show errors for the node that is down, but it will keep sending to the others.
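For reference, here is a minimal sketch of what such an output block might look like, reusing the three placeholder hosts from the question. The `resurrect_delay`, `retry_initial_interval`, and `retry_max_interval` settings are options of the Elasticsearch output plugin; the values shown are what I believe the defaults to be and are included only for illustration, so check them against the plugin documentation for your Logstash version before relying on them.

```
output {
  elasticsearch {
    # The three cluster nodes from the question; requests are balanced across them.
    hosts => ["https://a.b.c.x:9200", "https://a.b.c.y:9200", "https://a.b.c.z:9200"]

    # Seconds between checks of whether a host marked as down is reachable again.
    # Illustrative value (assumed to match the plugin default).
    resurrect_delay => 5

    # Seconds to wait before the first retry of a failed bulk request.
    # The wait doubles on each further retry, up to retry_max_interval.
    # Illustrative values (assumed to match the plugin defaults).
    retry_initial_interval => 2
    retry_max_interval => 64
  }
}
```

With a list like this, losing one node should produce connectivity warnings for that host while events keep flowing to the remaining two.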

Can you share any logs when this happens?

Mahdi_Moazami · June 3, 2024, 8:56am

As you suggested, I checked the logs again and they match what you described for a node being down. Yes, the logs are just warnings (and some info messages) about connectivity issues with that node, and the pipelines continue to work with the other hosts!

Just one more question: when a node goes down, in the first minutes after the connection loss there are some error logs about bulk request failures for the failed node. It's perfectly normal to see these errors, because some in-flight events were sent to that node, but how does Logstash behave in this scenario? Does it retry sending those actions to the same failed node, or does it use the fault tolerance mechanism and retry on the other hosts provided?

The Error logs are as follows:

[2024-06-03T08:40:39,627][ERROR][logstash.outputs.elasticsearch][ALL_AdHoc_N][0083d642cf9d6a024e81fa4f82353b9eee6e25041e001364a0fbc90c2c40e054] Attempted to send a bulk request but Elasticsearch appears to be unreachable or down {:message=>"Elasticsearch Unreachable: [https://X.Y.Z.67:9200/_bulk?filter_path=errors,items.*.error,items.*.status][Manticore::SocketTimeout] Read timed out", :exception=>LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError, :will_retry_in_seconds=>2}
