There are no living connections in the connection pool

#1

Hello,
because of network problems, I built a separate internal network just for the ELK nodes (nodeXY-elk-local), which is used for Elasticsearch communication. A second network card carries the intranet network, used for ingesting logs into Logstash or directly into Elasticsearch. But I still have an issue with ingesting logs.

I see a lot of these messages in the Logstash logs:

[2019-05-14T16:20:53,062][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch, but no there are no living connections in the connection pool. Perhaps Elasticsearch is unreachable or down? {:error_message=>"No Available connections", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError", :will_retry_in_seconds=>4}
[2019-05-14T16:20:53,117][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://localhost:9200/, :path=>"/"}
[2019-05-14T16:20:53,119][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://localhost:9200/"}

I have 3 master nodes and 13 data nodes. The master config looks like:

network.host: ["master1-elk-local","localhost"]
http.host: ["localhost","master1-elk-local","master1-intranet"]

Data nodes:

network.host: ["datanode1-elk-local"]

Logstash runs on the master nodes:

input {
  beats {
    port => 5044
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "webserver-syslog-filebeat-%{+YYYY.MM.dd.HH}"
    sniffing => false
  }
}

I am using ELK 5.6 and plan to upgrade to ELK 6, but only after solving this issue. I suspect an Elasticsearch configuration problem, but I cannot find a solution. Before the network change everything worked well, apart from the network problems between the ELK nodes.

#2

Hey,

the problem was solved overnight. There was a stuck Filebeat on one server: it was trying to send years-old logs (logrotated in 2015), even though I use ignore_older: 3h. The solution was to pkill all Filebeat processes (from time to time, systemctl stop filebeat doesn't work), exclude some unimportant logs, and start it again.
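For reference, ignore_older by itself only stops Filebeat from opening *new* harvesters on old files; it does not clear registry state, so already-tracked or re-appearing rotated files can still be shipped. Pairing it with clean_inactive and excluding rotated archives addresses that. A minimal sketch of a Filebeat 5.x prospector (the paths and exclude patterns here are hypothetical examples, not the poster's actual config):

```yaml
# filebeat.yml (Filebeat 5.x prospector syntax) -- illustrative sketch
filebeat.prospectors:
  - input_type: log
    paths:
      - /var/log/webserver/*.log    # hypothetical path
    # Do not start harvesting files not modified in the last 3 hours
    ignore_older: 3h
    # Drop registry state for files inactive longer than this, so old
    # rotated files are not picked up again (must be > ignore_older
    # plus scan_frequency)
    clean_inactive: 6h
    # Skip rotated archives entirely (example patterns)
    exclude_files: ['\.gz$']
```

The clean_inactive setting is what prevents the registry from growing stale entries for long-rotated files, which is one common way a Filebeat ends up replaying ancient logs.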

It is weird that one Filebeat can paralyze such a big cluster.