Hi,
I added a new data node to our cluster (all nodes run ES 2.4.1). The node joined the cluster successfully, but the cluster state is now yellow because 5 replica shards of the .scripts index are unassigned (they should be allocated to the new data node).
GET .scripts/_settings
shows this:
{
  ".scripts": {
    "settings": {
      "index": {
        "number_of_shards": "5",
        "auto_expand_replicas": "0-all",
        "creation_date": "1461180578329",
        "unassigned": {
          "node_left": {
            "delayed_timeout": "10m"
          }
        },
        "number_of_replicas": "10",
        "uuid": "lclh6JI_QsGUJxoYfL2N6g",
        "version": {
          "created": "2030199"
        }
      }
    }
  }
}
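For context, the replica count in these settings is derived from the node count rather than set by hand. Below is a minimal sketch of how I understand "auto_expand_replicas" to resolve: the number of data nodes minus one, clamped to the configured range. The function name is hypothetical, and the figure of 11 data nodes is only inferred from "number_of_replicas": "10", not something I have confirmed separately.

```python
def expanded_replicas(auto_expand, data_nodes):
    """Estimate number_of_replicas for an auto_expand_replicas value.

    Assumption: the target is (data nodes - 1), clamped to the
    "min-max" range, where "all" means no upper bound beyond that.
    """
    low, high = auto_expand.split("-", 1)
    minimum = int(low)
    maximum = data_nodes - 1 if high == "all" else int(high)
    return max(minimum, min(data_nodes - 1, maximum))


# 11 data nodes is an inferred figure, used here only for illustration.
print(expanded_replicas("0-all", 11))  # 10
```

With 5 primary shards and 10 replicas each, that would be 50 replica shards in total, of which 5 are unassigned.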
Looking at _cat/shards, I see one of these lines for each of the 5 unassigned replica shards:
.scripts 2 r UNASSIGNED REPLICA_ADDED
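To pull only the unassigned rows out of a larger _cat/shards listing, something like the following could be used. It is a rough sketch that assumes the output has the columns index, shard, prirep, state and unassigned.reason in that order, as in the line above. The function name is made up for illustration, and the text is assumed to have been fetched already.

```python
def unassigned_shards(cat_shards_text):
    """Return (index, shard, prirep, reason) for every UNASSIGNED row.

    Assumed column order: index shard prirep state unassigned.reason
    """
    rows = []
    for line in cat_shards_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[3] == "UNASSIGNED":
            # The reason column is empty for shards that are assigned,
            # so guard against it being missing.
            reason = fields[4] if len(fields) > 4 else ""
            rows.append((fields[0], int(fields[1]), fields[2], reason))
    return rows


sample = ".scripts 2 r UNASSIGNED REPLICA_ADDED"
print(unassigned_shards(sample))  # [('.scripts', 2, 'r', 'REPLICA_ADDED')]
```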
kopf likewise shows the new node not getting any shards replicated to it.
The logs on the master node from when the new data node joined are:
[2017-02-03 12:02:30,917][INFO ][cluster.service ] [ec2-XXX-XXX-XXX-XXX.compute-1.amazonaws.com] added {{ec2-XXX-XXX-XXX-XXX.compute-1.amazonaws.com}{XXXXXXXXX}{XXX.XXX.XXX.XXX}{XXX.XXX.XXX.XXX:9300}{max_local_storage_nodes=1, aws_availability_zone=XXXXXX, tag=current, master=false},}, reason: zen-disco-join(join from node[{ec2-XXX-XXX-XXX-XXX.compute-1.amazonaws.com}{XXXXXXXXXXX}{XXX.XXX.XXX.XXX}{XXX.XXX.XXX.XXX:9300}{max_local_storage_nodes=1, aws_availability_zone=XXXXXXXX, tag=current, master=false}])
[2017-02-03 12:03:00,927][WARN ][discovery.zen.publish ] [ec2-XXX-XXX-XXX-XXX.compute-1.amazonaws.com] timed out waiting for all nodes to process published state [737391] (timeout [30s], pending nodes: [{ec2-XXX-XXX-XXX-XXX.compute-1.amazonaws.com}{XXXXXXXXX}{XXX.XXX.XXX.XXX}{XXX-XXX.XXX.XXX:9300}{max_local_storage_nodes=1, aws_availability_zone=XXXXXX, tag=current, master=false}])
[2017-02-03 12:03:00,934][WARN ][cluster.service ] [ec2-XXX-XXX-XXX-XXX.compute-1.amazonaws.com] cluster state update task [zen-disco-join(join from node[{ec2-XXX-XXX-XXX-XXX.compute-1.amazonaws.com}{XXXXXXXX}{XXX.XXX.XXX.XXX}{XXX.XXX.XXX.XXX:9300}{max_local_storage_nodes=1, aws_availability_zone=XXXXXXX, tag=current, master=false}])] took 30s above the warn threshold of 30s
[2017-02-03 12:03:00,935][INFO ][cluster.metadata ] [ec2-XXX-XXX-XXX-XXX.compute-1.amazonaws.com] updating number_of_replicas to [10] for indices [.scripts]
[2017-02-03 12:03:00,952][INFO ][cluster.metadata ] [ec2-XXX-XXX-XXX-XXX.compute-1.amazonaws.com] [.scripts] auto expanded replicas to [10]
[2017-02-03 12:03:30,953][WARN ][discovery.zen.publish ] [ec2-XXX-XXX-XXX-XXX.compute-1.amazonaws.com] timed out waiting for all nodes to process published state [737392] (timeout [30s], pending nodes: [{ec2-XXX-XXX-XXX-XXX.compute-1.amazonaws.com}{XXXXXXXX}{XXX.XXX.XXX.XXX}{XXX.XXX.XXX.XXX:9300}{max_local_storage_nodes=1, aws_availability_zone=XXXXX, tag=current, master=false}])
[2017-02-03 12:03:31,060][WARN ][cluster.service ] [ec2-XXX-XXX-XXX-XXX.compute-1.amazonaws.com] cluster state update task [update-settings] took 30.1s above the warn threshold of 30s
I tried restarting the new data node and closing/re-opening the .scripts index, but neither worked.
How do I get out of this state?
Thanks in advance
BP