Elected master node doesn't communicate with other master nodes within expected time

Describe the feature:

Elasticsearch version (bin/elasticsearch --version): 7.5.2

OS version (uname -a if on a Unix-like system): CentOS 7 - 3.10.0-1062.18.1.el7.x86_64

Description of the problem including expected versus actual behavior: The elected master node doesn't communicate with the other master nodes within the expected time. This triggers a re-election, and the cluster turns RED.

This keeps happening at random times throughout the day, and we are not sure what is actually causing "Master Not Discovered or Elected Yet". We first noticed it after adding 3 new data nodes to our setup.
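As a rough sanity check on what "within expected time" means, here is a small sketch of the fault-detection window. It assumes the cluster runs with the 7.x defaults for `cluster.fault_detection.leader_check.*` and `cluster.fault_detection.follower_check.*` (interval 1s, timeout 10s, retry_count 3), which has not been confirmed for this cluster. The class and function names are made up for illustration, and the formula is only an approximation of how long consecutive failed checks take.

```python
from dataclasses import dataclass


@dataclass
class FaultDetection:
    """Assumed 7.x defaults; override these if elasticsearch.yml sets them."""
    interval_s: float = 1.0   # cluster.fault_detection.*_check.interval
    timeout_s: float = 10.0   # cluster.fault_detection.*_check.timeout
    retry_count: int = 3      # cluster.fault_detection.*_check.retry_count

    def approx_tolerance_s(self) -> float:
        # Approximation: each failed check waits for its timeout, then the
        # next check starts one interval later, repeated retry_count times.
        return self.retry_count * (self.timeout_s + self.interval_s)


def pause_alone_exceeds(pause_s: float, fd: FaultDetection = FaultDetection()) -> bool:
    """True if a single stall of pause_s seconds could fail every retry."""
    return pause_s >= fd.approx_tolerance_s()


if __name__ == "__main__":
    fd = FaultDetection()
    print(f"approximate tolerance: {fd.approx_tolerance_s():.0f} s")
    for pause in (1.0, 4.0):
        print(f"{pause:.0f} s pause exceeds it: {pause_alone_exceeds(pause, fd)}")
```

If those defaults do apply, a single 1-4 second pause would not be enough on its own to fail the checks, so it may be worth looking for other causes (network, cluster state publication, load on the master) as well.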

Below is the current setup:

| No. of nodes | Type | Allocated heap | VM RAM | Cores |
|-----|-----|-----|----|----|
| 3 | master nodes | 8 GB | 16 GB | 4 |
| 12 | data nodes | 16 GB | 32 GB | 10 |
| 1 | ingest node | 8 GB | 18 GB | 4 |
| 2 | coordination nodes | 16 GB | 32 GB | 10 |

Current storage: 10 TB

Number of shards: 5000
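To put the shard count in context, here is a quick back-of-the-envelope calculation using the numbers above. It assumes the 5000 shards include replicas and are spread evenly over the 12 data nodes, neither of which is stated here. The 20-shards-per-GB-of-heap figure is the rule of thumb Elastic published for 7.x-era clusters, not a hard limit.

```python
# Figures taken from the setup described above.
data_nodes = 12
heap_gb_per_data_node = 16
total_shards = 5000  # assumption: primaries + replicas, evenly distributed

# Rule of thumb from Elastic's 7.x-era sizing guidance (a guideline only).
guideline_shards_per_gb_heap = 20

shards_per_node = total_shards / data_nodes
shards_per_gb_heap = shards_per_node / heap_gb_per_data_node
guideline_max_per_node = guideline_shards_per_gb_heap * heap_gb_per_data_node

print(f"shards per data node:   {shards_per_node:.0f}")
print(f"shards per GB of heap:  {shards_per_gb_heap:.1f}")
print(f"guideline max per node: {guideline_max_per_node}")
print(f"over guideline:         {shards_per_node > guideline_max_per_node}")
```

Under those assumptions each data node holds roughly 417 shards, about 26 per GB of heap, which is above the guideline of 320 per node. A large shard count also means a larger cluster state for the master to manage, so this may be related, but that is not established here.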

Are you seeing any indication of long GC on the master nodes?

We've been struggling with this issue for ~6 days, and we have definitely seen GC pauses of 1-4 seconds right before a crash. Here are the most recent GC stats from the past 24 hours (none seem unusually long). During this timeframe we've seen ~3-4 blips/unstable moments in our cluster.
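For anyone who wants to pull comparable numbers, here is a minimal sketch that summarizes the GC counters from the `GET _nodes/stats/jvm` response. The field names (`jvm.gc.collectors.<name>.collection_count` and `collection_time_in_millis`) are the ones that API returns. The `summarize_gc` helper and the sample payload are made up purely for illustration and are not data from this cluster.

```python
def summarize_gc(node_stats: dict) -> list:
    """Return one row per node and collector with the average pause in ms.

    node_stats is the parsed JSON body of GET _nodes/stats/jvm.
    """
    rows = []
    for node_id, node in node_stats.get("nodes", {}).items():
        collectors = node["jvm"]["gc"]["collectors"]
        for collector_name, c in collectors.items():
            count = c["collection_count"]
            total_ms = c["collection_time_in_millis"]
            # Counters are cumulative since node start, so this is a
            # lifetime average and will hide individual long pauses.
            avg_ms = total_ms / count if count else 0.0
            rows.append({
                "node": node.get("name", node_id),
                "collector": collector_name,
                "count": count,
                "total_ms": total_ms,
                "avg_ms": avg_ms,
            })
    return rows


if __name__ == "__main__":
    # Made-up payload in the shape of the API response (illustrative only).
    sample = {
        "nodes": {
            "abc123": {
                "name": "master-1",
                "jvm": {"gc": {"collectors": {
                    "young": {"collection_count": 200, "collection_time_in_millis": 5000},
                    "old": {"collection_count": 2, "collection_time_in_millis": 3000},
                }}},
            }
        }
    }
    for row in summarize_gc(sample):
        print(row)
```

Because these counters are cumulative, sampling them twice and diffing the values gives a better picture of a specific window. For individual 1-4 second pauses, the node's GC log is probably the more reliable place to look.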

GC stats for the Active Master Node
