ECK - all nodes leave the cluster often

In our ECK cluster, all data nodes and master nodes frequently leave the cluster, with node-left messages giving the reason "followers check retry count exceeded". The metrics show no evidence of system overload, CPU saturation, or heap memory problems. What could be the reason?

Below is the type of log entry we often see on the elected master node, reporting that the followers check retry count was exceeded. I can confirm there are no network issues in the Kubernetes cluster, as other ECK clusters are running in the same Kubernetes cluster in good health and without issues.

{"type": "server", "timestamp": "2021-07-04T13:30:03,227Z", "level": "INFO", "component": "o.e.c.s.MasterService", "cluster.name": "ocpplatform", "node.name": "ocpplatform-es-ocpplatform-master-1", "message": "node-left[{ocpplatform-es-ocpplatform-hot-3}{eTi3MmEcTGGMLFKipYwbfg}{Kaxt-I5ETGSV18xwABVUuA}{192.168.34.8}{192.168.34.8:9300}{cdfhilstw} reason: followers check retry count exceeded, {ocpplatform-es-ocpplatform-warm-1}{xI1bNjz2SrSnQL_jVry7Dg}{28AKjT_QQwu1J20cfWIFIw}{192.168.29.8}{192.168.29.8:9300}{cdfhstw} reason: followers check retry count exceeded, {ocpplatform-es-ocpplatform-warm-2}{v4fAqquTRjmkTqcNNg6FUw}{YEE-XjpARQ-jG0e9K7FlIQ}{192.168.25.6}{192.168.25.6:9300}{cdfhstw} reason: followers check retry count exceeded], term: 281, version: 1138243, delta: removed {{ocpplatform-es-ocpplatform-warm-2}{v4fAqquTRjmkTqcNNg6FUw}{YEE-XjpARQ-jG0e9K7FlIQ}{192.168.25.6}{192.168.25.6:9300}{cdfhstw}, {ocpplatform-es-ocpplatform-hot-3}{eTi3MmEcTGGMLFKipYwbfg}{Kaxt-I5ETGSV18xwABVUuA}{192.168.34.8}{192.168.34.8:9300}{cdfhilstw}, {ocpplatform-es-ocpplatform-warm-1}{xI1bNjz2SrSnQL_jVry7Dg}{28AKjT_QQwu1J20cfWIFIw}{192.168.29.8}{192.168.29.8:9300}{cdfhstw}}", "cluster.uuid": "mN3lfJE5QmWKZYbHIBdMTg", "node.id": "bgDnb0_qQ9-_XFCwhY1IiA" }
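For anyone tallying which nodes drop out and why, here is a minimal Python sketch that pulls the node names and removal reasons out of a JSON log line like the one above. The function name `parse_node_left` and the regular expression are my own illustration, not part of Elasticsearch, and they assume the message keeps the `{name}{id}...{roles} reason: ...` layout seen in this log.

```python
import json
import re
from typing import List, Tuple

# Assumed layout: "{node name}{id}{ephemeral id}{host}{address}{roles} reason: <text>"
# The first brace group is the node name; the rest are skipped.
NODE_LEFT = re.compile(r"\{([^{}]+)\}(?:\{[^{}]*\})+ reason: ([^,\]]+)")


def parse_node_left(log_line: str) -> List[Tuple[str, str]]:
    """Return (node name, reason) pairs from one JSON-formatted server log line."""
    message = json.loads(log_line)["message"]
    # The "delta: removed {...}" part carries no "reason:" text, so it is not matched.
    return NODE_LEFT.findall(message)
```

Running this over the master's log and counting the pairs per node gives a quick picture of whether the same nodes are removed repeatedly and whether the reason is always the followers check.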

Which version of Elasticsearch are you using? What is the specification of the cluster? What is the full output of the cluster stats API? What is in the Elasticsearch logs?

Issue was fixed.

The problem was traced to a Kibana index pattern that matched a large number of indices. Kibana periodically made field capabilities API calls for this pattern, and these calls flooded the data nodes and slowed them down, causing them to leave the cluster often.

After deleting the index pattern, the problem was resolved.
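To spot a pattern like this before it causes trouble, a rough check is to count how many indices each index pattern resolves to. The sketch below is only an approximation under stated assumptions: it treats a pattern as a comma-separated list of `*` wildcards (exclusions with a leading `-` are ignored), the helper name `indices_matching` is hypothetical, and the index names are expected to come from somewhere like `GET _cat/indices?format=json&h=index`.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def indices_matching(pattern: str, index_names: Iterable[str]) -> List[str]:
    """Return the index names matched by a comma-separated wildcard pattern."""
    # Split "logs-*,metrics-*" into its individual wildcard parts.
    parts = [p.strip() for p in pattern.split(",") if p.strip()]
    return [
        name
        for name in index_names
        if any(fnmatchcase(name, part) for part in parts)
    ]


# Hypothetical index names, for illustration only.
names = ["logs-2021.07.01", "logs-2021.07.02", "metrics-2021.07.01"]
print(len(indices_matching("logs-*", names)))
```

A pattern that resolves to a very large number of indices is a candidate for narrowing or deletion, since each field capabilities request for it has to touch every matching index.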

The ES and Kibana version is 7.13.1.
