Hi, I am building a 5-node Elasticsearch cluster on k8s: 3 master nodes and 2 data nodes. When I restart the 3 masters one after another, I get the error "master not discovered or elected yet" and the cluster stops working. This lasted several hours, until I deleted all the master pods and rebuilt them.
Some config:
- name: cluster.initial_master_nodes
  value: "elasticsearch-master-0,elasticsearch-master-1,elasticsearch-master-2,"
- name: discovery.seed_hosts
  value: "elasticsearch-master-headless"
- name: cluster.name
  value: "elasticsearch"
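Entries in cluster.initial_master_nodes are expected to match the node.name (or address) of the master-eligible nodes, so one quick sanity check is to compare the list above with the node names that actually show up in the logs. Below is a minimal Python sketch with both inputs copied from this post; the helper names are hypothetical. Note that the config shows elasticsearch-master-N while the logs show ems-search-000-master-N; if the config was only renamed for this post, substitute the real values before drawing conclusions.

```python
# Value of cluster.initial_master_nodes exactly as in the env settings above.
INITIAL_MASTER_NODES = "elasticsearch-master-0,elasticsearch-master-1,elasticsearch-master-2,"

# node.name values that appear in the log lines of this post.
LOGGED_NODE_NAMES = {
    "ems-search-000-master-0",
    "ems-search-000-master-1",
    "ems-search-000-master-2",
}


def parse_list(value: str) -> list[str]:
    """Split a comma-separated setting, dropping blank entries (e.g. from a trailing comma)."""
    return [item.strip() for item in value.split(",") if item.strip()]


def check_initial_masters(value: str, node_names: set[str]) -> tuple[list[str], list[str]]:
    """Return (entries never seen as a node name, node names missing from the setting)."""
    entries = set(parse_list(value))
    unknown = sorted(entries - node_names)
    unlisted = sorted(node_names - entries)
    return unknown, unlisted


if __name__ == "__main__":
    unknown, unlisted = check_initial_masters(INITIAL_MASTER_NODES, LOGGED_NODE_NAMES)
    print("listed but not seen in logs:", unknown)
    print("seen in logs but not listed:", unlisted)
```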
Master0, node ID (before -> after restart): UIl8Fr4iSsGKjTy5DQaczg -> CGCD0ENwT1O_KV8snCwlCg
{"type": "server", "timestamp": "2020-09-23T23:39:13,686Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "ems-search-000", "node.name": "ems-search-000-master-0", "message": "master not discovered or elected yet, an election requires at least 2 nodes with ids from [_q-piBceRT2bpdM4yFUxfw, CGCD0ENwT1O_KV8snCwlCg, 3D2zfFbRSqeRJivqWm7TSA], have discovered [{ems-search-000-master-0}{CGCD0ENwT1O_KV8snCwlCg}{RMoCPsq0QGy6rC2dWD44Yw}{192.168.76.3}{192.168.76.3:9300}{m}{xpack.installed=true, transform.node=false}, {ems-search-000-master-2}{D4i0wEiYQl2LGKzvfB47lQ}{BZAhYx1DRcSSLBEcnd2gqQ}{192.168.69.91}{192.168.69.91:9300}{m}{xpack.installed=true, transform.node=false}, {ems-search-000-master-1}{i02Las-eRnCAwHrR0Ej1_A}{j5XoXcCNR9iZm7EQ6IegBQ}{192.168.91.203}{192.168.91.203:9300}{m}{xpack.installed=true, transform.node=false}] which is not a quorum; discovery will continue using [192.168.69.91:9300] from hosts providers and [{ems-search-000-master-2}{3D2zfFbRSqeRJivqWm7TSA}{nYRzL6S_SqSmn634Q-dv9A}{192.168.82.66}{192.168.82.66:9300}{m}{xpack.installed=true, transform.node=false}, {ems-search-000-master-0}{CGCD0ENwT1O_KV8snCwlCg}{RMoCPsq0QGy6rC2dWD44Yw}{192.168.76.3}{192.168.76.3:9300}{m}{xpack.installed=true, transform.node=false}] from last-known cluster state; node term 17, last-accepted version 2822 in term 17", "cluster.uuid": "_5ixLn1CQz2EdGxe85TMTQ", "node.id": "CGCD0ENwT1O_KV8snCwlCg" }
Master1, node ID (before -> after restart): _q-piBceRT2bpdM4yFUxfw -> i02Las-eRnCAwHrR0Ej1_A
{"type": "server", "timestamp": "2020-09-23T23:39:22,274Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "ems-search-000", "node.name": "ems-search-000-master-1", "message": "master not discovered or elected yet, an election requires 2 nodes with ids [i02Las-eRnCAwHrR0Ej1_A, CGCD0ENwT1O_KV8snCwlCg], have discovered [{ems-search-000-master-1}{i02Las-eRnCAwHrR0Ej1_A}{j5XoXcCNR9iZm7EQ6IegBQ}{192.168.91.203}{192.168.91.203:9300}{m}{xpack.installed=true, transform.node=false}, {ems-search-000-master-2}{D4i0wEiYQl2LGKzvfB47lQ}{BZAhYx1DRcSSLBEcnd2gqQ}{192.168.69.91}{192.168.69.91:9300}{m}{xpack.installed=true, transform.node=false}, {ems-search-000-master-0}{CGCD0ENwT1O_KV8snCwlCg}{RMoCPsq0QGy6rC2dWD44Yw}{192.168.76.3}{192.168.76.3:9300}{m}{xpack.installed=true, transform.node=false}] which is a quorum; discovery will continue using [192.168.69.91:9300, 192.168.76.3:9300] from hosts providers and [{ems-search-000-master-1}{i02Las-eRnCAwHrR0Ej1_A}{j5XoXcCNR9iZm7EQ6IegBQ}{192.168.91.203}{174.100.91.203:9300}{m}{xpack.installed=true, transform.node=false}] from last-known cluster state; node term 0, last-accepted version 0 in term 0"}
Master2, node ID (before -> after restart): 3D2zfFbRSqeRJivqWm7TSA -> D4i0wEiYQl2LGKzvfB47lQ
{"type": "server", "timestamp": "2020-09-23T23:36:40,475Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "ems-search-000", "node.name": "ems-search-000-master-2", "message": "master not discovered or elected yet, an election requires 2 nodes with ids [D4i0wEiYQl2LGKzvfB47lQ, CGCD0ENwT1O_KV8snCwlCg], have discovered [{ems-search-000-master-2}{D4i0wEiYQl2LGKzvfB47lQ}{BZAhYx1DRcSSLBEcnd2gqQ}{192.168.69.91}{192.168.69.91:9300}{m}{xpack.installed=true, transform.node=false}, {ems-search-000-master-0}{CGCD0ENwT1O_KV8snCwlCg}{RMoCPsq0QGy6rC2dWD44Yw}{192.168.76.3}{192.168.76.3:9300}{m}{xpack.installed=true, transform.node=false}] which is a quorum; discovery will continue using [192.168.76.3:9300] from hosts providers and [{ems-search-000-master-2}{D4i0wEiYQl2LGKzvfB47lQ}{BZAhYx1DRcSSLBEcnd2gqQ}{192.168.69.91}{192.168.69.91:9300}{m}{xpack.installed=true, transform.node=false}] from last-known cluster state; node term 0, last-accepted version 0 in term 0" }
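To compare the three log lines without reading them by eye, here is a minimal Python sketch that pulls the required voting IDs, the discovered nodes and the node term out of one ClusterFormationFailureHelper message. It assumes the message wording seen in the logs above, which is not a documented or stable format. The sample is the master-2 line trimmed to the fields used (the "discovery will continue ..." part is cut), and summarize is a hypothetical helper name.

```python
import json
import re

# Trimmed copy of the master-2 log line above.
LOG_LINE = (
    '{"node.name": "ems-search-000-master-2", "message": "master not discovered or elected yet, '
    'an election requires 2 nodes with ids [D4i0wEiYQl2LGKzvfB47lQ, CGCD0ENwT1O_KV8snCwlCg], '
    'have discovered [{ems-search-000-master-2}{D4i0wEiYQl2LGKzvfB47lQ}{BZAhYx1DRcSSLBEcnd2gqQ}'
    '{192.168.69.91}{192.168.69.91:9300}{m}{xpack.installed=true, transform.node=false}, '
    '{ems-search-000-master-0}{CGCD0ENwT1O_KV8snCwlCg}{RMoCPsq0QGy6rC2dWD44Yw}'
    '{192.168.76.3}{192.168.76.3:9300}{m}{xpack.installed=true, transform.node=false}] '
    'which is a quorum; node term 0, last-accepted version 0 in term 0"}'
)

# Handles both "requires 2 nodes with ids [...]" and "requires at least 2 nodes with ids from [...]".
REQUIRED_RE = re.compile(r"an election requires (?:at least )?(\d+) nodes with ids (?:from )?\[([^\]]*)\]")
DISCOVERED_RE = re.compile(r"have discovered \[(.*?)\] which is")
# Each discovered node starts with {name}{nodeId}{ephemeralId}; both IDs are 22 URL-safe chars.
NODE_RE = re.compile(r"\{([^{}]+)\}\{([\w-]{22})\}\{[\w-]{22}\}")
TERM_RE = re.compile(r"node term (\d+), last-accepted version (\d+) in term (\d+)")


def summarize(line: str) -> dict:
    """Extract the election requirements and discovered nodes from one log line."""
    message = json.loads(line)["message"]
    required = REQUIRED_RE.search(message)
    discovered = DISCOVERED_RE.search(message)
    term = TERM_RE.search(message)
    return {
        "needed": int(required.group(1)),
        "voting_ids": [s.strip() for s in required.group(2).split(",")],
        "discovered": dict(NODE_RE.findall(discovered.group(1))),  # node name -> node ID
        "term": int(term.group(1)),
    }


if __name__ == "__main__":
    print(summarize(LOG_LINE))
```

Running it over the untrimmed master-0 and master-1 lines as well gives the per-node lists summarized just below.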
Node IDs required for an election, as reported by each master node:
master0: [_q-piBceRT2bpdM4yFUxfw, CGCD0ENwT1O_KV8snCwlCg, 3D2zfFbRSqeRJivqWm7TSA]
master1: [i02Las-eRnCAwHrR0Ej1_A, CGCD0ENwT1O_KV8snCwlCg]
master2: [D4i0wEiYQl2LGKzvfB47lQ, CGCD0ENwT1O_KV8snCwlCg]
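The "is a quorum" / "is not a quorum" verdicts in the logs can be reproduced with a simplified Python model: a set of nodes is a quorum when it contains more than half of the IDs in the voting configuration. The ID lists are copied from the lines above and from the "have discovered [...]" part of each log. The model is only a sketch: Elasticsearch checks both the last-committed and the last-accepted voting configurations, and finding a quorum of nodes is not enough on its own, since those nodes must also grant their votes (master1 and master2 report a quorum, yet no master is elected).

```python
# Voting configuration reported by each master (copied from the post).
VOTING_CONFIG = {
    "master0": ["_q-piBceRT2bpdM4yFUxfw", "CGCD0ENwT1O_KV8snCwlCg", "3D2zfFbRSqeRJivqWm7TSA"],
    "master1": ["i02Las-eRnCAwHrR0Ej1_A", "CGCD0ENwT1O_KV8snCwlCg"],
    "master2": ["D4i0wEiYQl2LGKzvfB47lQ", "CGCD0ENwT1O_KV8snCwlCg"],
}

# Node IDs each master says it has discovered (from its log line).
DISCOVERED = {
    "master0": ["CGCD0ENwT1O_KV8snCwlCg", "D4i0wEiYQl2LGKzvfB47lQ", "i02Las-eRnCAwHrR0Ej1_A"],
    "master1": ["i02Las-eRnCAwHrR0Ej1_A", "D4i0wEiYQl2LGKzvfB47lQ", "CGCD0ENwT1O_KV8snCwlCg"],
    "master2": ["D4i0wEiYQl2LGKzvfB47lQ", "CGCD0ENwT1O_KV8snCwlCg"],
}


def is_quorum(voting_ids, node_ids):
    """True if node_ids cover more than half of voting_ids (simplified model)."""
    matched = set(voting_ids) & set(node_ids)
    return 2 * len(matched) > len(voting_ids)


def missing(voting_ids, node_ids):
    """Voting IDs that none of the discovered nodes currently has."""
    return sorted(set(voting_ids) - set(node_ids))


if __name__ == "__main__":
    for name in VOTING_CONFIG:
        ok = is_quorum(VOTING_CONFIG[name], DISCOVERED[name])
        print(name, "quorum" if ok else "not a quorum", missing(VOTING_CONFIG[name], DISCOVERED[name]))
```

For master0 only CGCD0ENwT1O_KV8snCwlCg matches; the other two IDs in its voting configuration are the pre-restart IDs of master1 and master2 listed above, so it can never collect 2 of 3.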
From the logs, I found that the node IDs listed on each master node are not the same: the IDs for Master1 are the ones from before the restart, and the "master not discovered or elected yet" message keeps being logged until I delete all the pods and start new ones. Why does this happen, and how can I prevent it from happening again?
I also found that master0 and master2 do not receive an 'added master1' message, and master0 and master1 do not receive an 'added master2' message. How can I solve this?