Hi guys!
I have a question regarding how delayed_unassigned_shards works.
Our ES cluster has:
- 5 data nodes, 5 master nodes, and 5 client nodes
- "index.unassigned.node_left.delayed_timeout": "5m" set on all our indices (applied as shown below); we use the cluster for log searching, so new indices are created every day
- all nodes on version 2.3.3
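For reference, we apply the setting to all indices with something along these lines (a sketch; localhost:9200 stands in for our actual endpoint):

# dynamic index setting, so it takes effect on existing indices immediately
curl -XPUT 'localhost:9200/_all/_settings' -d '
{
  "settings": {
    "index.unassigned.node_left.delayed_timeout": "5m"
  }
}'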
We noticed that every time a data node is rebooted (e.g. during a deployment), even if it rejoins the cluster within 5 minutes, it still takes a long time (2-4 hours) for the unassigned shards (~3k) to be assigned. They don't seem to be re-allocated back to the data node that was rebooted; instead the cluster does a full rebalance across all the nodes.
These are the logs from the master node when I restarted the data node. They show that allocation of the shards was delayed, but even after the node (team-02001.node-data) rejoined the cluster, the timer kept counting down and the shards were not assigned back to the node immediately:
[2016-11-08 18:46:52,445][INFO ][cluster.routing ] [team-02001.node-master] delaying allocation for [3532] unassigned shards, next check in [1m]
[2016-11-08 18:47:34,438][INFO ][cluster.service ] [team-02001.node-master] added {{team-02001.node-data}{fgcYfc4sR5aAUI1D9xjktw}{172.17.0.3}{172.17.0.3:9382}{ad=ad2, host=team-02001.node, max_local_storage_nodes=1, master=false},}, reason: zen-disco-join(join from node[{team-02001.node-data}{fgcYfc4sR5aAUI1D9xjktw}{172.17.0.3}{172.17.0.3:9382}{ad=ad2, host=team-02001.node, max_local_storage_nodes=1, master=false}])
[2016-11-08 18:47:54,132][INFO ][cluster.routing ] [team-02001.node-master] delaying allocation for [3447] unassigned shards, next check in [3.9m]
======== cluster health during the delay period (5 minutes) ========
[chihccha@lumberjack-02001 logs]$ curl localhost:9200/_cluster/health?pretty
{
  "cluster_name" : "team.ad2.r2",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 30,
  "number_of_data_nodes" : 5,
  "active_primary_shards" : 8830,
  "active_shards" : 14222,
  "relocating_shards" : 0,
  "initializing_shards" : 7,
  "unassigned_shards" : 3431,
  "delayed_unassigned_shards" : 3364,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 80.53227633069082
}
======== cluster health after the delay (5 minutes) elapsed ========
[chihccha@lumberjack-02001 logs]$ curl localhost:9200/_cluster/health?pretty
{
  "cluster_name" : "team.ad2.r2",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 30,
  "number_of_data_nodes" : 5,
  "active_primary_shards" : 8830,
  "active_shards" : 14227,
  "relocating_shards" : 0,
  "initializing_shards" : 6,
  "unassigned_shards" : 3427,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 1,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 80.56058890147226
}
Using the "_cat/recovery" command, we are seeing shards being moved between different nodes across the entire cluster (from looking at the source and target host for shards).
index shard time type stage source_host target_host repository snapshot files files_percent bytes bytes_percent total_files total_bytes translog translog_percent total_translog
teamA-2016-11-08 4 4263014 replica translog n/a n/a 286 100.0% 80163560685 100.0% 286 80163560685 2107461 20.8% 10124584
Is this normal behavior? From the documentation (https://www.elastic.co/guide/en/elasticsearch/reference/2.3/delayed-allocation.html), it sounds like "index.unassigned.node_left.delayed_timeout" is supposed to prevent exactly this kind of shard shuffling when a node drops out briefly and then comes back.
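If it would help with diagnosis, we can also pull the list of shards that are stuck unassigned, e.g.:

# list only the shards whose state is UNASSIGNED
curl -s 'localhost:9200/_cat/shards?v' | grep UNASSIGNED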
Please let us know if there is any additional information we can provide.
Thanks,
Chelsey