## Bug Report
**What did you do?**
Killed more than half of the master pods (two out of three).
**What did you expect to see?**
Once the master pods restarted, the cluster would recover and return to normal.
**What did you see instead? Under which circumstances?**
The master pods are running, but the cluster is unhealthy: no master can be discovered or elected.
**Environment**
* ECK version:
1.6
* Kubernetes information:
  * On premise? Yes, VMs running on KVM
  * Cloud: GKE / EKS / AKS? No
  * Kubernetes distribution: kubespray
```
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1", GitCommit:"5e58841cce77d4bc13713ad2b91fa0d961e69192", GitTreeState:"clean", BuildDate:"2021-05-13T02:40:46Z", GoVersion:"go1.16.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.7", GitCommit:"132a687512d7fb058d0f5890f07d4121b3f0a2e2", GitTreeState:"clean", BuildDate:"2021-05-12T12:32:49Z", GoVersion:"go1.15.12", Compiler:"gc", Platform:"linux/amd64"}
```
* Resource definition:
```
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: es-v7132
spec:
  version: 7.13.2
  http:
    service:
      spec:
        type: LoadBalancer
    tls:
      selfSignedCertificate:
        disabled: true
  nodeSets:
  - name: masters
    count: 3
    config:
      node.roles: ["master", "data", "ingest"]
      # before 7.9.0, use below
      # node.master: true
      # node.data: true
      # node.ingest: true
      # node.store.allow_mmap: false
    podTemplate:
      spec:
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
          command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
        containers:
        - name: elasticsearch
          env:
          - name: ES_JAVA_OPTS
            value: -Xms4g -Xmx4g
          resources:
            requests:
              memory: 6Gi
              cpu: 0.5
            limits:
              memory: 6Gi
              cpu: 2
    volumeClaimTemplates:
    - metadata:
        name: es-v7132-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 50Gi
```
* Steps to reproduce the problem:
After all masters started, the cluster worked fine.
Kill two of the three master pods:
`kubectl delete pod es-v7132-es-masters-1 es-v7132-es-masters-2`
The two pods restart, but master discovery fails and the cluster never returns to normal (full sequence sketched below):
`curl -u elastic:$PASSWD es-http-svc:9200/_cat/nodes?v`
`{"error":{"root_cause":[{"type":"master_not_discovered_exception","reason":null}],"type":"master_not_discovered_exception","reason":null},"status":503}`
* Logs:
Pod es-v7132-es-masters-0 (the surviving master):
{"type": "server", "timestamp": "2021-07-12T08:28:09,235Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "es-v7132", "node.name": "es-v7132-es-masters-0", "message": "master not discovered or elected yet, an election requires at least 2 nodes with ids from [qxvFr7unQji6E3JVBDrOKQ, OzuTW949R1C9N1LHsHktUg, V8FWuqLFSNyy7GqvIuVo5A], have discovered [{es-v7132-es-masters-0}{OzuTW949R1C9N1LHsHktUg}{euLJjwHfTxGw3DfQmWBPQg}{10.233.96.222}{10.233.96.222:9300}{dim}, {es-v7132-es-masters-2}{MQJtnhEIRk63YY9uMgBSJA}{zbBssJK9QDmCusEucfNOfg}{10.233.92.215}{10.233.92.215:9300}{dim}, {es-v7132-es-masters-1}{WLiiCtvVT_qeMFXMjPZS1w}{HfK-5srjSimg9rTakR_yHw}{10.233.90.164}{10.233.90.164:9300}{dim}] which is not a quorum; discovery will continue using [127.0.0.1:9300, 127.0.0.1:9301, 127.0.0.1:9302, 127.0.0.1:9303, 127.0.0.1:9304, 127.0.0.1:9305, 10.233.90.163:9300, 10.233.92.214:9300] from hosts providers and [{es-v7132-es-masters-1}{qxvFr7unQji6E3JVBDrOKQ}{m0O6ZGLhRtS9dyAYk89HPg}{10.233.90.163}{10.233.90.163:9300}{dim}, {es-v7132-es-masters-2}{V8FWuqLFSNyy7GqvIuVo5A}{rfyW9KfGTP-pum6H7oX9Qg}{10.233.92.214}{10.233.92.214:9300}{dim}, {es-v7132-es-masters-0}{OzuTW949R1C9N1LHsHktUg}{euLJjwHfTxGw3DfQmWBPQg}{10.233.96.222}{10.233.96.222:9300}{dim}] from last-known cluster state; node term 9, last-accepted version 241 in term 9", "cluster.uuid": "JawL-M1PSNW5UjKCxyLa9A", "node.id": "OzuTW949R1C9N1LHsHktUg" }
The restarted pods (es-v7132-es-masters-1 and es-v7132-es-masters-2) log messages like:
{"type": "server", "timestamp": "2021-07-12T08:42:03,856Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "es-v7132", "node.name": "es-v7132-es-masters-1", "message": "master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and [cluster.initial_master_nodes] is empty on this node: have discovered [{es-v7132-es-masters-1}{WLiiCtvVT_qeMFXMjPZS1w}{HfK-5srjSimg9rTakR_yHw}{10.233.90.164}{10.233.90.164:9300}{dim}, {es-v7132-es-masters-0}{OzuTW949R1C9N1LHsHktUg}{euLJjwHfTxGw3DfQmWBPQg}{10.233.96.222}{10.233.96.222:9300}{dim}, {es-v7132-es-masters-2}{MQJtnhEIRk63YY9uMgBSJA}{zbBssJK9QDmCusEucfNOfg}{10.233.92.215}{10.233.92.215:9300}{dim}]; discovery will continue using [127.0.0.1:9300, 127.0.0.1:9301, 127.0.0.1:9302, 127.0.0.1:9303, 127.0.0.1:9304, 127.0.0.1:9305, 10.233.92.215:9300, 10.233.96.222:9300] from hosts providers and [{es-v7132-es-masters-1}{WLiiCtvVT_qeMFXMjPZS1w}{HfK-5srjSimg9rTakR_yHw}{10.233.90.164}{10.233.90.164:9300}{dim}] from last-known cluster state; node term 0, last-accepted version 0 in term 0" }