Similar in nature to this:
In 7.X I have a scenario where quorum is lost because all the nodes were restarted at the same time. All the data is still present, so the cluster state should be intact.
How can I work around this? The master nodes are constantly trying to connect to each other but failing:
[2020-03-26T05:08:41,583][INFO ][o.e.c.c.JoinHelper ] [elasticsearch-es-master-1] failed to join {elasticsearch-es-master-1}{KG71sIO_TAOPHDCnwVdxRw}{8r3SI6OlRPKF3jfr5A81zw}{10.244.21.61}{10.244.21.61:9300}{box_type=hot} with JoinRequest{sourceNode={elasticsearch-es-master-1}{KG71sIO_TAOPHDCnwVdxRw}{8r3SI6OlRPKF3jfr5A81zw}{10.244.21.61}{10.244.21.61:9300}{box_type=hot}, optionalJoin=Optional[Join{term=94, lastAcceptedTerm=93, lastAcceptedVersion=5099765, sourceNode={elasticsearch-es-master-1}{KG71sIO_TAOPHDCnwVdxRw}{8r3SI6OlRPKF3jfr5A81zw}{10.244.21.61}{10.244.21.61:9300}{box_type=hot}, targetNode={elasticsearch-es-master-1}{KG71sIO_TAOPHDCnwVdxRw}{8r3SI6OlRPKF3jfr5A81zw}{10.244.21.61}{10.244.21.61:9300}{box_type=hot}}]}
org.elasticsearch.transport.RemoteTransportException: [elasticsearch-es-master-1][10.244.21.61:9300][internal:cluster/coordination/join]
Caused by: org.elasticsearch.cluster.coordination.FailedToCommitClusterStateException: node is no longer master for term 95 while handling publication
at org.elasticsearch.cluster.coordination.Coordinator.publish(Coordinator.java:1012) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.cluster.service.MasterService.publish(MasterService.java:252) [elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:238) [elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:142) [elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) [elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) [elasticsearch-7.2.0.jar:7.2.0]
It seems they can't establish a quorum, and the elections appear to be all over the place right now: the join above is for term 94, but the failure is reported for term 95.
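In case it helps with diagnosis, here is a rough sketch of how the election terms could be pulled out of the master logs to see how quickly they are changing. It is only an illustration: `TERM_RE` and `extract_terms` are my own hypothetical helper names, not anything from Elasticsearch, and the sample lines are shortened fragments of the log above.

```python
import re

# Matches "term=94" (join requests) and "term 95" (exception messages).
# The match is case-sensitive, so "lastAcceptedTerm=93" is skipped.
TERM_RE = re.compile(r"\bterm[= ](\d+)")


def extract_terms(lines):
    """Return the election terms mentioned in the given log lines, in order."""
    terms = []
    for line in lines:
        terms.extend(int(t) for t in TERM_RE.findall(line))
    return terms


# Shortened fragments of the log lines quoted above.
sample = [
    "Join{term=94, lastAcceptedTerm=93, lastAcceptedVersion=5099765",
    "node is no longer master for term 95 while handling publication",
]
print(extract_terms(sample))
```

Running the same function over the full log of each master should show whether the term keeps climbing, which would suggest elections are repeatedly started and abandoned.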