We have a 7-node Elasticsearch cluster, of which 5 are master-eligible nodes and 2 are data nodes.
We currently have 13.8 TB of data across 682 shards in total. Each index has 5 shards and 1 replica. Each node has a 5 TB HDD, 64 GB RAM, and a 32 GB heap. However, sometimes when we execute queries, Elasticsearch fails on 4 master nodes and 1 data node. We get the following errors:
ERROR :Due to collector [index-stats] timed out when collecting data.
ERROR: [node-3] fatal error in thread [elasticsearch[node-3][refresh][T#2]], exiting
The Elasticsearch service does not restart itself, even though we have enabled the service to start after reboot (daemon-reload, etc.). Because of that, no master is discovered. Every time, we have to restart the service manually on each node, and only then does the cluster re-form. Why is this happening, even though we have enough heap memory (7 × 32 GB = 224 GB) and each shard currently holds around 20 GB, which is normal?
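
For reference, here is a rough sanity check of the numbers quoted above, as a minimal Python sketch. The values are copied from this post; the decimal conversion (1 TB = 1000 GB) and the variable names are assumptions made for illustration only.

```python
# Cluster figures as quoted in this post
TOTAL_DATA_TB = 13.8      # total data in the cluster
TOTAL_SHARDS = 682        # total shard count
NODES = 7                 # 5 master-eligible + 2 data nodes
HEAP_PER_NODE_GB = 32     # JVM heap configured on each node

# Assumption: decimal units, 1 TB = 1000 GB
avg_shard_gb = TOTAL_DATA_TB * 1000 / TOTAL_SHARDS

# Sum of the per-node heaps across the cluster
total_heap_gb = NODES * HEAP_PER_NODE_GB

print(f"average shard size: {avg_shard_gb:.1f} GB")
print(f"summed heap across nodes: {total_heap_gb} GB")
```

This matches the figures above: roughly 20 GB per shard on average, and 224 GB of heap when the per-node values are added together.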