I have 2 Logstash nodes with about 30 different pipelines sending data to a 6-node Elasticsearch cluster. For about half of my pipelines I am seeing a ton of errors like these in the /var/log/logstash/pipeline_mypipeline.log files:
[WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [https://node1.mynetwork.com:9200/_bulk][Manticore::SocketTimeout] Read timed out
and:

[ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request but Elasticsearch appears to be unreachable or down {:message=>"Elasticsearch Unreachable: [https://node1.mynetwork.com:9200/_bulk][Manticore::SocketTimeout] Read timed out"
My data is still ingesting fine. I assume that once a host times out, Logstash retries the bulk request against another node, but I would like to handle these messages rather than just live with them. Some are labeled "WARN" and others "ERROR". I see the elasticsearch output has a timeout parameter, and bumping it from the 1-minute default to 2 minutes might make the messages go away, but that feels more like avoiding the problem than handling it.
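For reference, the output section of one of these pipelines looks roughly like this (the host list, index name, and the commented-out timeout are illustrative placeholders, not my exact config):

    output {
      elasticsearch {
        hosts => ["https://node1.mynetwork.com:9200"]   # plus the other data nodes
        index => "mypipeline-%{+YYYY.MM.dd}"            # placeholder index name
        # timeout => 120                                # default is 60s; raising it only hides the symptom
      }
    }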
I have a monitoring cluster set up. Cluster health is green, every node has plenty of free disk space, and CPU sits around 20%. On the Logstash nodes, CPU is low (around 5%) and JVM heap averages about 6.1 GB out of 7.9 GB.
What should I look at to address timeout warnings / errors like these?