I'm a beginner with the ELK stack and really need help! I have a cluster with 6 nodes (3 master, 1 client, and 2 data nodes). The heap configuration of each node is as follows:
master: heap 2g (half of physical RAM, as recommended)
client: heap 8g
data: heap 4g
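(For reference, heap sizes like the ones above are normally set in each node's `jvm.options` file, keeping the minimum and maximum equal; the 4g value here matches my data nodes:)

```
# config/jvm.options — set min and max heap to the same value
-Xms4g
-Xmx4g
```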
The number of documents has been increasing recently, and I can see the heap usage on all nodes increasing as well. One of the three master nodes failed over when its heap reached 100%. So I have a few things I need cleared up:
Do I need to restart the service on the failed master node? Another node has taken over the role, of course, but what happens to the failed node? Can it recover automatically? I ask because this happens quite regularly, almost every 2-3 days.
How can I reduce heap usage? I disabled swapping and reindexed to fewer primary shards, but that did not help. I ingest about 90GB of data a day.
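To see which nodes are under heap pressure before one fails over, the `_cat/nodes` API reports `heap.percent` per node. A minimal sketch that parses its tabular output (the sample response and node names below are hypothetical; in practice you would fetch `GET _cat/nodes?h=name,node.role,heap.percent` from the cluster):

```python
# Parse the tabular output of GET _cat/nodes?h=name,node.role,heap.percent
# and flag nodes whose heap usage exceeds a threshold.

# Hypothetical sample response; in practice, fetch this from
# http://<client-node>:9200/_cat/nodes?h=name,node.role,heap.percent
sample = """\
master-1 m 92
master-2 m 45
master-3 m 48
client-1 - 61
data-1   d 78
data-2   d 81
"""

def nodes_over_threshold(cat_output: str, threshold: int = 85):
    """Return (name, heap%) pairs for nodes above the threshold."""
    hot = []
    for line in cat_output.strip().splitlines():
        name, _role, heap = line.split()
        if int(heap) > threshold:
            hot.append((name, int(heap)))
    return hot

print(nodes_over_threshold(sample))  # only master-1 exceeds 85% here
```

Polling this periodically would at least show whether heap growth is cluster-wide or concentrated on the masters.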
The details are not hugely important, but the key missing line in 6.5.1 is this one. As it says in the issue, this caused
> effectively a terrible memory leak
The version you're on is affected, and the leak can particularly affect the master nodes. I would encourage you to upgrade to the latest version and monitor whether the problem persists.