ECK - all the nodes leaving the cluster often

Kannappan_Somu · July 4, 2021, 2:11pm

In our ECK cluster, all the data and master nodes leave the cluster often, with "node-left" messages giving the reason "followers check retry count exceeded". The metrics show no evidence of system overload, CPU saturation, or heap memory problems. What could be the reason?

Below is the type of log line we often see on the elected master node, reporting "followers check retry count exceeded". I can confirm there are no network issues in the Kubernetes cluster: other ECK clusters running in the same Kubernetes cluster are healthy and have no issues.

{"type": "server", "timestamp": "2021-07-04T13:30:03,227Z", "level": "INFO", "component": "o.e.c.s.MasterService", "cluster.name": "ocpplatform", "node.name": "ocpplatform-es-ocpplatform-master-1", "message": "node-left[{ocpplatform-es-ocpplatform-hot-3}{eTi3MmEcTGGMLFKipYwbfg}{Kaxt-I5ETGSV18xwABVUuA}{192.168.34.8}{192.168.34.8:9300}{cdfhilstw} reason: followers check retry count exceeded, {ocpplatform-es-ocpplatform-warm-1}{xI1bNjz2SrSnQL_jVry7Dg}{28AKjT_QQwu1J20cfWIFIw}{192.168.29.8}{192.168.29.8:9300}{cdfhstw} reason: followers check retry count exceeded, {ocpplatform-es-ocpplatform-warm-2}{v4fAqquTRjmkTqcNNg6FUw}{YEE-XjpARQ-jG0e9K7FlIQ}{192.168.25.6}{192.168.25.6:9300}{cdfhstw} reason: followers check retry count exceeded], term: 281, version: 1138243, delta: removed {{ocpplatform-es-ocpplatform-warm-2}{v4fAqquTRjmkTqcNNg6FUw}{YEE-XjpARQ-jG0e9K7FlIQ}{192.168.25.6}{192.168.25.6:9300}{cdfhstw}, {ocpplatform-es-ocpplatform-hot-3}{eTi3MmEcTGGMLFKipYwbfg}{Kaxt-I5ETGSV18xwABVUuA}{192.168.34.8}{192.168.34.8:9300}{cdfhilstw}, {ocpplatform-es-ocpplatform-warm-1}{xI1bNjz2SrSnQL_jVry7Dg}{28AKjT_QQwu1J20cfWIFIw}{192.168.29.8}{192.168.29.8:9300}{cdfhstw}}", "cluster.uuid": "mN3lfJE5QmWKZYbHIBdMTg", "node.id": "bgDnb0_qQ9-_XFCwhY1IiA" }
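To see at a glance which nodes were dropped and why across many log lines like the one above, a short script can pull the node name, transport address, and removal reason out of each `node-left` message. This is a minimal sketch, assuming the JSON log layout shown above and that a removal reason never contains a comma or `]`; the names `parse_node_left` and `count_departures` are hypothetical, not part of any Elastic tooling. It deliberately skips the trailing `delta: removed {...}` part, which repeats the same nodes without reasons.

```python
import json
import re
from collections import Counter
from typing import Dict, Iterable, List

# One removed-node entry inside a MasterService "node-left[...]" message, e.g.
# {name}{nodeId}{ephemeralId}{host}{ip:port}{roles} reason: <reason>
# Assumption: any extra {...} groups after the address (roles, attributes) are skipped.
NODE_LEFT_ENTRY = re.compile(
    r"\{(?P<name>[^{}]+)\}"            # node name
    r"\{[^{}]*\}\{[^{}]*\}\{[^{}]*\}"  # node id, ephemeral id, host
    r"\{(?P<address>[^{}]*)\}"         # transport address (ip:port)
    r"(?:\{[^{}]*\})*"                 # roles and any attributes
    r" reason: (?P<reason>[^,\]]+)"    # reason, up to the next ',' or ']'
)


def parse_node_left(log_line: str) -> List[Dict[str, str]]:
    """Return name, address and reason for each node in one node-left log line."""
    event = json.loads(log_line)
    if not isinstance(event, dict):
        return []
    message = event.get("message", "")
    if not message.startswith("node-left["):
        return []
    # Entries in the "delta: removed {...}" part carry no reason, so they never match.
    return [m.groupdict() for m in NODE_LEFT_ENTRY.finditer(message)]


def count_departures(lines: Iterable[str]) -> Counter:
    """Count (node name, reason) pairs over many JSON log lines, skipping non-JSON lines."""
    counts: Counter = Counter()
    for line in lines:
        try:
            entries = parse_node_left(line)
        except json.JSONDecodeError:
            continue
        for entry in entries:
            counts[(entry["name"], entry["reason"])] += 1
    return counts


# Shortened copy of the log line above (two of the three removed nodes).
SAMPLE = json.dumps({
    "type": "server",
    "component": "o.e.c.s.MasterService",
    "message": (
        "node-left[{ocpplatform-es-ocpplatform-hot-3}{eTi3MmEcTGGMLFKipYwbfg}"
        "{Kaxt-I5ETGSV18xwABVUuA}{192.168.34.8}{192.168.34.8:9300}{cdfhilstw} "
        "reason: followers check retry count exceeded, "
        "{ocpplatform-es-ocpplatform-warm-1}{xI1bNjz2SrSnQL_jVry7Dg}"
        "{28AKjT_QQwu1J20cfWIFIw}{192.168.29.8}{192.168.29.8:9300}{cdfhstw} "
        "reason: followers check retry count exceeded], term: 281, version: 1138243"
    ),
})

if __name__ == "__main__":
    for entry in parse_node_left(SAMPLE):
        print(entry["name"], entry["address"], entry["reason"], sep=" | ")
```

Feeding `count_departures` every line of the elected master's log shows whether the same nodes keep leaving or whether departures are spread across the whole cluster, as described in this thread.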

Christian_Dahlqvist · July 13, 2021, 4:54am

Which version of Elasticsearch are you using? What is the specification of the cluster? What is the full output of the cluster stats API? What is in the Elasticsearch logs?

Kannappan_Somu · July 13, 2021, 8:35pm

The issue was fixed.

The problem was a Kibana index pattern that matched a large number of indices. Kibana periodically made field capabilities API calls for that pattern, which flooded the data nodes and slowed them down, causing them to leave the cluster often.

After deleting the index pattern, the problem was resolved.
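Before deleting or narrowing an index pattern, it can help to check how many indices it actually expands to, since a field capabilities request for the pattern is resolved against every index it matches. The sketch below assumes you already have the index names (for example, the `index` column of `GET _cat/indices?h=index`) and only approximates Elasticsearch's matching: it handles comma-separated patterns and the `*` wildcard, and ignores exclusions, hidden and closed indices, and aliases. The helper names and the sample index names are hypothetical.

```python
import re
from typing import Iterable, List


def pattern_to_regex(pattern: str):
    """Compile a simplified index pattern: comma-separated parts, '*' matches any characters."""
    parts = [p.strip() for p in pattern.split(",") if p.strip()]
    # Escape everything literally, then rejoin the pieces around each '*' with '.*'.
    alternatives = [".*".join(re.escape(chunk) for chunk in part.split("*")) for part in parts]
    return re.compile("(?:" + "|".join(alternatives) + ")")


def matching_indices(pattern: str, indices: Iterable[str]) -> List[str]:
    """Return the index names the simplified pattern matches in full."""
    regex = pattern_to_regex(pattern)
    return [name for name in indices if regex.fullmatch(name)]


# Hypothetical index names, standing in for the output of `GET _cat/indices?h=index`.
INDICES = [
    "logs-app-2021.07.01",
    "logs-app-2021.07.02",
    "metrics-node-2021.07.01",
]

if __name__ == "__main__":
    for pattern in ["logs-app-2021.07.01", "logs-*", "*"]:
        matched = matching_indices(pattern, INDICES)
        print(f"{pattern!r}: {len(matched)} of {len(INDICES)} indices")
```

On a real cluster the list is far longer, and a very broad pattern such as `*` expands to every non-hidden index in it.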

Elasticsearch and Kibana version: 7.13.1

system · August 10, 2021, 8:35pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
