I'm a beginner with ELK and really need help! I have a cluster with 6 nodes (3 master, 1 client, and 2 data nodes). The heap configuration of each node is as follows:
master: heap 2g (half of physical RAM as recommended)
client: heap 8g
data: heap 4g
The number of documents has been increasing recently, and I see that heap usage on all nodes is increasing as well. One of the three master nodes failed over when its heap reached 100%. So I have a few things I need cleared up:
- Do I need to restart the service on the failed master node? Of course another node has taken over the role, but what happens to the failed node? Can it recover automatically? I ask because this happens quite regularly, roughly every 2-3 days.
- How can I reduce heap usage? I disabled swapping and reindexed to fewer primary shards, but that did not help. I have about 90GB of data a day.
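In case it helps anyone answering, here is a minimal sketch of how per-node heap usage could be checked from the JSON output of `GET _cat/nodes?format=json&h=name,node.role,heap.percent`. The function name `nodes_over_threshold`, the 85% threshold, and the sample rows are all hypothetical and only illustrate the idea; they are not real output from my cluster.

```python
import json


def nodes_over_threshold(cat_nodes_json, threshold=85):
    """Return (name, role, heap_percent) for nodes at or above the threshold.

    cat_nodes_json is the raw JSON text returned by
    GET _cat/nodes?format=json&h=name,node.role,heap.percent
    (the _cat API returns every value as a string).
    """
    rows = json.loads(cat_nodes_json)
    flagged = []
    for row in rows:
        heap = int(row["heap.percent"])
        if heap >= threshold:
            flagged.append((row["name"], row["node.role"], heap))
    # Highest heap usage first, so the most urgent node is at the top.
    return sorted(flagged, key=lambda r: r[2], reverse=True)


# Made-up sample rows for illustration only (not real cluster output).
sample = json.dumps([
    {"name": "master-1", "node.role": "m", "heap.percent": "97"},
    {"name": "data-1", "node.role": "d", "heap.percent": "72"},
    {"name": "client-1", "node.role": "-", "heap.percent": "88"},
])

for name, role, heap in nodes_over_threshold(sample):
    print(f"{name} (role {role}): heap at {heap}%")
```

Running something like this on a schedule would show whether heap climbs steadily on every node or only on the masters before each failure.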
Thanks in advance