Recovering from a clusterFormationFailure

I'm on version 7.1.1

Background: I was trimming the cluster down from 11 to 5 nodes. I made sure all data was moved to the 5 remaining nodes, then took the empty nodes out.
But I might have made a mistake when I changed the setting
"discovery.zen.minimum_master_nodes": "6"
to
"discovery.zen.minimum_master_nodes": "3"

I was trying to do that when there were only 5 nodes left, and I think the master had been elected on one of the nodes I was removing.

Now I'm having trouble restarting the cluster:

[o.e.c.c.ClusterFormationFailureHelper] [ip-192-168-22-26] master not discovered or elected yet, an election requires at least 6 nodes with ids from
[fOP1wlD-SgqzJcw3LozPmg, kKWKmKfGRKCBoi-H3L4hLw, dj1O8xKfRhi9tcu4SRNE5A, KlhOzJs1TWaefm09yFNx_A, Qj8gfl-_SkyYvxy74WKSgg, bMEQFfqBQvabUmKwAav8_w, XLgq5xajRo2pO3SueUMR6A, _zlsYdxOQzmfR4pVY4Z7sA, rc_i49WTT9K8YGjkVzuC6g, 5oZHQoTrSgmj8m_CvOuhQg, GO4ORcIJSqaJMRmcl2FFEw],
have discovered
[{ip-192-168-12-243}{ifaOHYGgRbSuSc9lKtTWEg}{W0byVuJLRLGLkLEpuFGnmw}{192.168.12.243}{192.168.12.243:9300}{aws_availability_zone=us-gov-west-1a},
 {ip-192-168-22-28}{_zlsYdxOQzmfR4pVY4Z7sA}{fr4EKHGESH-V171OnO1VDw}{192.168.22.28}{192.168.22.28:9300}{aws_availability_zone=us-gov-west-1b},
 {ip-192-168-12-208}{kKWKmKfGRKCBoi-H3L4hLw}{-g8LwZ2EQg-dRTTah50OsQ}{192.168.12.208}{192.168.12.208:9300}{aws_availability_zone=us-gov-west-1a},
 {ip-192-168-12-157}{5oZHQoTrSgmj8m_CvOuhQg}{syB0AxzUSP6lEqFN5arv6w}{192.168.12.157}{192.168.12.157:9300}{aws_availability_zone=us-gov-west-1a},
 {ip-192-168-22-54}{KlhOzJs1TWaefm09yFNx_A}{0QKxei6VT4K4gCGjFUXLug}{192.168.22.54}{192.168.22.54:9300}{aws_availability_zone=us-gov-west-1b}]
which is not a quorum; discovery will continue using [127.0.0.1:9301, 127.0.0.1:9302, 127.0.0.1:9303, 127.0.0.1:9304, 192.168.12.157:9300, 192.168.22.54:9300, 192.168.22.26:9300, 192.168.22.28:9300, 192.168.12.243:9300, 192.168.12.208:9300] from hosts providers and [{ip-192-168-22-26}{dj1O8xKfRhi9tcu4SRNE5A}{oI0Jc8zdTpit6OlAcOOcEw}{192.168.22.26}{192.168.22.26:9300}{aws_availability_zone=us-gov-west-1b}] from last-known cluster state; node term 733, last-accepted version 63839 in term 730

Nodes are added via an auto scaling group, and discovery uses the AWS EC2 discovery plugin.

How can I tell the cluster to just accept 5 nodes (and not require 6 for the master election)?

This setting is ignored in 7.x. The issue is that you have removed more than half of the master-eligible nodes all at once, which is not supported since it means you may have lost data: the latest cluster state might only be on the 6 nodes you removed.

The only safe way to proceed is to restore this cluster from a recent snapshot.
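In case it helps, here is a minimal sketch of what the restore could look like once you have a working cluster again, assuming the repository-s3 plugin is installed on every node; the repository name, snapshot name and bucket below (my_s3_repo, snapshot_yesterday, my-backup-bucket) are placeholders to replace with your own:

PUT /_snapshot/my_s3_repo HTTP/1.1
Content-Type: application/json

{
  "type": "s3",
  "settings": {
    "bucket": "my-backup-bucket"
  }
}

POST /_snapshot/my_s3_repo/snapshot_yesterday/_restore HTTP/1.1
Content-Type: application/json

{
  "indices": "*",
  "include_global_state": false
}

You can list the snapshots in the repository first with GET /_snapshot/my_s3_repo/_all to find the exact snapshot name.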

Thanks, David.

Indeed, some old 6.x habits here. So what are the steps to restart the 5 nodes?
I'm OK with losing all the data; I took a backup to S3 yesterday.

BTW: I'm maintaining both a 6.5.* cluster and a 7.* cluster.
It would be nice to get an error when we set 'obsolete' parameters; not sure how easy that is to do. (I'm sure not everybody works with the latest release on a daily basis.) For reference, this is the request I ran:

PUT /_cluster/settings HTTP/1.1
Content-Type: application/json

{
  "transient": {
    "discovery.zen.minimum_master_nodes": "3"
  }
}

Just thinking about this a bit more.
I had all the shards moved to my 5 nodes, and the 6 nodes I was removing held no data, so no 'data' was lost. It would be nice if there were a way to tell the ClusterFormationFailureHelper to ignore certain state it keeps internally.

The nodes you removed were master-eligible, so although they had no shards they held the metadata needed to correctly interpret the data held in your shards. That metadata is stored redundantly so the cluster can tolerate the loss of a minority of its master-eligible nodes, but it cannot tolerate the loss of more than half. Without that metadata you can get into some very strange data loss situations indeed. Best to start again: wipe all the nodes and start up a brand-new cluster.
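To make the "start again" part concrete, a minimal elasticsearch.yml sketch for bootstrapping the fresh 7.x cluster, assuming the discovery-ec2 plugin stays in place; the cluster name and node names below are placeholders for your five remaining nodes, and the old contents of path.data must be wiped on every node before starting:

cluster.name: my-new-cluster        # placeholder
node.name: node-name-1              # this host's own name; set per node
discovery.seed_providers: ec2       # keep using EC2 discovery
cluster.initial_master_nodes:       # node.name of all five remaining nodes
  - node-name-1
  - node-name-2
  - node-name-3
  - node-name-4
  - node-name-5

cluster.initial_master_nodes is only needed for the very first startup of the brand-new cluster and should be removed from the configuration once the cluster has formed. For future scale-downs, removing master-eligible nodes one at a time (or using the voting configuration exclusions API to take them out of the voting configuration before shutting them down) avoids ending up in this situation.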

Elasticsearch already emits warnings when you set deprecated parameters, both in the Warning response header and in the deprecation log.
