We just had an issue on our cluster:
1 coordinator node
3 master-eligible nodes
Version 6.1.2
Yesterday, around 16:00, our cluster stopped working properly.
In the log files of our coordinator node we had:
[2019-02-27T17:46:51,620][WARN ][logstash.outputs.elasticsearch] UNEXPECTED POOL ERROR {:e=>#<LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError: No Available connections>}
It looked like Logstash wasn't able to connect to the Elasticsearch data nodes.
The garbage collector was taking a lot of time and resources, which is why the Logstash output wasn't working.
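For anyone hitting the same symptoms: GC pressure per node can be confirmed through the node stats API. A minimal check, assuming the default HTTP endpoint on port 9200:

```
# Per-node JVM stats: heap usage is under nodes.<id>.jvm.mem and
# old-generation GC counts/timings under nodes.<id>.jvm.gc.collectors.old
curl -s 'http://localhost:9200/_nodes/stats/jvm?pretty'
```

A steadily climbing collection_time_in_millis on the old collector tends to line up with exactly this kind of connection-pool error downstream.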
We rebooted all the data nodes one by one, and everything returned to normal after that.
At 16:00 I was running queries in Kibana and Timelion. I might have chosen a large time range (many days). Do you think that could be the root cause?
It sounds like it was a Java memory leak, doesn't it? Was it?
It's certainly possible that you executed a query that overloaded the cluster, and also possible that giving your nodes more heap will let them cope with the situation better.
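If you try more heap, it is set in jvm.options on each data node. A sketch, assuming 16g suits your machines; the usual guidance is to stay at or below half the available RAM and under ~32GB so the JVM keeps using compressed object pointers:

```
# /etc/elasticsearch/jvm.options (path depends on how Elasticsearch was installed)
# Keep min and max equal so the heap never resizes at runtime
-Xms16g
-Xmx16g
```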
I did wonder if there was perhaps an excess of shards, but 560 seems reasonable for a cluster with 38GB of heap.
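Both numbers are easy to re-check with the cat APIs if you want to post exact figures (again assuming the default endpoint):

```
# One line per shard, so the line count is the total shard count
curl -s 'http://localhost:9200/_cat/shards' | wc -l

# Current vs. maximum heap per node
curl -s 'http://localhost:9200/_cat/nodes?v&h=name,heap.current,heap.max,heap.percent'
```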
Sorry, without something like a heap dump we can only really speculate on what was consuming all the memory. If it happens again, grab one before restarting the nodes.
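A sketch of grabbing one with jmap from the JDK, assuming you locate the process via pgrep and run it as the same user as Elasticsearch:

```
# Find the Elasticsearch PID, then dump the live heap to a file
ES_PID=$(pgrep -f org.elasticsearch.bootstrap.Elasticsearch)
jmap -dump:live,format=b,file=/tmp/es-heap.hprof "$ES_PID"
```

If I remember right, the default jvm.options also ships with -XX:+HeapDumpOnOutOfMemoryError, so a node that actually hits an OutOfMemoryError should leave a dump behind on its own.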