In our ECK cluster, all data nodes and master nodes frequently leave the cluster, with node-left messages and the reason "followers check retry count exceeded". The metrics show no evidence of system overload, CPU saturation, or heap memory pressure. What could be the reason?
Below is the kind of log entry we often see on the elected master node, reporting that the followers check retry count was exceeded. I can confirm there are no network issues in the Kubernetes cluster, since other ECK clusters run in the same Kubernetes cluster in good health with no issues.
{"type": "server", "timestamp": "2021-07-04T13:30:03,227Z", "level": "INFO", "component": "o.e.c.s.MasterService", "cluster.name": "ocpplatform", "node.name": "ocpplatform-es-ocpplatform-master-1", "message": "node-left[{ocpplatform-es-ocpplatform-hot-3}{eTi3MmEcTGGMLFKipYwbfg}{Kaxt-I5ETGSV18xwABVUuA}{192.168.34.8}{192.168.34.8:9300}{cdfhilstw} reason: followers check retry count exceeded, {ocpplatform-es-ocpplatform-warm-1}{xI1bNjz2SrSnQL_jVry7Dg}{28AKjT_QQwu1J20cfWIFIw}{192.168.29.8}{192.168.29.8:9300}{cdfhstw} reason: followers check retry count exceeded, {ocpplatform-es-ocpplatform-warm-2}{v4fAqquTRjmkTqcNNg6FUw}{YEE-XjpARQ-jG0e9K7FlIQ}{192.168.25.6}{192.168.25.6:9300}{cdfhstw} reason: followers check retry count exceeded], term: 281, version: 1138243, delta: removed {{ocpplatform-es-ocpplatform-warm-2}{v4fAqquTRjmkTqcNNg6FUw}{YEE-XjpARQ-jG0e9K7FlIQ}{192.168.25.6}{192.168.25.6:9300}{cdfhstw}, {ocpplatform-es-ocpplatform-hot-3}{eTi3MmEcTGGMLFKipYwbfg}{Kaxt-I5ETGSV18xwABVUuA}{192.168.34.8}{192.168.34.8:9300}{cdfhilstw}, {ocpplatform-es-ocpplatform-warm-1}{xI1bNjz2SrSnQL_jVry7Dg}{28AKjT_QQwu1J20cfWIFIw}{192.168.29.8}{192.168.29.8:9300}{cdfhstw}}", "cluster.uuid": "mN3lfJE5QmWKZYbHIBdMTg", "node.id": "bgDnb0_qQ9-_XFCwhY1IiA" }
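To see which nodes are dropped most often and for what reason, a small script along these lines could tally the node-left events in the master's JSON logs. This is only a minimal sketch: the function names and the regular expression are my own assumptions, based on the message format in the log entry above, and are not part of Elasticsearch or ECK.

```python
import json
import re
from collections import Counter

# Assumed layout of each entry inside node-left[...]:
# {name}{node id}{ephemeral id}{host}{address}{roles} reason: <text>
NODE_LEFT = re.compile(
    r"\{(?P<name>[^{}]+)\}(?:\{[^{}]*\}){5}\s+reason:\s+(?P<reason>[^,\]]+)"
)


def node_left_events(log_line):
    """Return (timestamp, node name, reason) tuples from one JSON log line."""
    record = json.loads(log_line)
    if not isinstance(record, dict):
        return []
    message = record.get("message", "")
    if not message.startswith("node-left["):
        return []
    return [
        (record.get("timestamp"), m.group("name"), m.group("reason").strip())
        for m in NODE_LEFT.finditer(message)
    ]


def tally(lines):
    """Count node-left events per (node name, reason) over many log lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            events = node_left_events(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON (e.g. stack traces)
        for _, name, reason in events:
            counts[(name, reason)] += 1
    return counts
```

Feeding it the master pod's log lines (for example, the lines of a saved log file) gives a count per node and reason, which makes it easier to check whether the same nodes are removed repeatedly or the removals are spread across the whole cluster.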