Elasticsearch "No Available connections" error in Logstash

I'm running a Logstash instance which is connected to an ES cluster behind a load balancer.
The load balancer has an idle timeout of 5 minutes.
Logstash is configured with the ES URL pointing at the load balancer IP.

Normally everything works fine, but after a period with no requests, the next request processed by LS fails with the following:

[2018-10-30T08:15:00,757][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://10.100.24.254:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://10.100.24.254:9200/, :error_message=>"Elasticsearch Unreachable: [http://10.100.24.254:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
[2018-10-30T08:15:00,759][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://10.100.24.254:9200/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>2}
[2018-10-30T08:15:02,760][WARN ][logstash.outputs.elasticsearch] UNEXPECTED POOL ERROR {:e=>#<LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError: No Available connections>}
[2018-10-30T08:15:02,760][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch, but no there are no living connections in the connection pool. Perhaps Elasticsearch is unreachable or down? {:error_message=>"No Available connections", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError", :will_retry_in_seconds=>4}
[2018-10-30T08:15:05,651][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://10.100.24.254:9200/, :path=>"/"}

LS eventually recovers, but it takes more than 1 min and this is not acceptable for our SLA.

I suspect that's due to the loadbalancer closing the connections after 5 min of inactivity.

I've tried setting:

timeout => 3

which makes things better. The request is retried after 3 secs, but this is still not good enough.
What's the best set of configuration options to make sure the connections are always healthy before a request is attempted, so that I experience no delay at all?

Any suggestions here, please?
I've tried setting validate_after_inactivity to a value slightly higher than the load balancer idle timeout. In addition, I've set timeout and resurrect_delay to 3 secs. The situation has improved: it is now quicker to detect the dead connection and retry. But it still takes a few seconds. I'd like to prevent those dead connections completely instead.
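
For reference, a rough sketch of what an output block with those three settings might look like. The hosts value is taken from the log lines above; the validate_after_inactivity number is only an illustrative assumption (the option is expressed in milliseconds, while timeout and resurrect_delay are in seconds), not the exact value in use:

```
output {
  elasticsearch {
    # Load balancer address, as seen in the log output above
    hosts => ["http://10.100.24.254:9200"]

    # Give up on a request after 3 seconds instead of the default 60
    timeout => 3

    # Check every 3 seconds whether a dead endpoint is reachable again
    resurrect_delay => 3

    # Re-validate a pooled connection that has been idle this long (milliseconds).
    # 310000 is an assumed example of "slightly higher than" the 5 minute idle timeout.
    validate_after_inactivity => 310000
  }
}
```

With these values a stale connection is still only discovered when a request is attempted, which matches the few seconds of delay described above.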
