This weekend I attempted to update my cluster from 0.19.8 to 0.19.11 but
ran into issues at the very end when updating the master servers.
The issue I am seeing is that any new index created will not be assigned
to any servers. Any degraded index will not attempt to recover and will
stay in that state even after the restarted server comes back up.
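For reference, this is what I am looking at when I say the shards stay
unassigned (the host below is just a placeholder for any node in the
cluster):

# unassigned_shards stays non-zero and never drops back to zero
curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'

# the routing table in the cluster state shows the same shards as UNASSIGNED
curl -XGET 'http://localhost:9200/_cluster/state?pretty=true'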
I did attempt a full cluster restart and a flush, but it did not seem to
help. I do have some custom allocation rules in place, but I would not
expect them to cause problems.
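In case it is relevant, the custom rules can be taken out of the picture on
one of the stuck indexes by clearing the per-index exclusion it inherited
from the template (the index name below is a placeholder):

curl -XPUT 'http://localhost:9200/stuck-index/_settings' -d '{
  "index.routing.allocation.exclude.tag" : ""
}'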
I was able to temporarily resolve the issue by downgrading the master
servers to 0.19.8 (later upgraded to 0.19.9 successfully) while keeping the
storage nodes on 0.19.11.
Just a bit of additional information: my nodes are broken up with three
different tags: frontend, storage and backup. By default, new indexes are
assigned to the frontend nodes.
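The tags are just node attributes set in each node's elasticsearch.yml,
along these lines (frontend node shown):

# frontend node; storage and backup nodes use
# node.tag: storage and node.tag: backup respectively
node.tag: frontend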
_template/default:
{
  "default" : {
    "template" : "*",
    "order" : 0,
    "settings" : {
      "index.compress" : "true",
      "index.routing.allocation.exclude.tag" : "backup,storage",
      "index.number_of_shards" : "4",
      "index.routing.allocation.total_shards_per_node" : "4"
    },
    "mappings" : { }
  }
}
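For what it is worth, creating any plain index is enough to reproduce the
problem; something like this, plus a settings check to confirm the template
was picked up (the index name is a placeholder):

curl -XPUT 'http://localhost:9200/test-index/'
curl -XGET 'http://localhost:9200/test-index/_settings?pretty=true'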
_cluster/settings:
{
  "persistent" : {
    "indices.recovery.max_size_per_sec" : "50mb",
    "cluster.routing.allocation.exclude.tag" : ""
  },
  "transient" : { }
}
When I upgrade the master servers I do not see anything strange in the
logs; everything comes up properly.
As soon as I restart one of the data nodes the shards remain unassigned,
and there are no messages in the cluster log file that show anything going
on other than nodes joining and leaving.
When a master running 0.19.9 takes over, the shards immediately start
allocating properly.
I looked at the change log for 0.19.10 but I do not see anything that jumps
out as a probable cause. Does anyone have any ideas about what may be
happening?
Also, I attempted to force the index onto the frontend servers using
index.routing.allocation.include.tag, but it did not make a difference.
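For reference, this is roughly what I tried, set through the index settings
API (the index name is a placeholder):

curl -XPUT 'http://localhost:9200/test-index/_settings' -d '{
  "index.routing.allocation.include.tag" : "frontend"
}'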