We had to park this issue for a while because of other priorities, so I am restarting this thread.
We tested stopping and starting the ES cluster again, beginning with an equal number of shards (204) on each of the 7 nodes.
[root@elasticsearch2 share]# curl 'elasticsearch7:9200/_cat/allocation?v'
shards disk.indices disk.used disk.avail disk.total disk.percent host ip node
204 56.9gb 64.5gb 83gb 147.5gb 43 10.0.0.17 10.0.0.17 wht1qYx
204 57.3gb 65.1gb 82.3gb 147.5gb 44 10.0.0.9 10.0.0.9 70eRJrE
204 57.5gb 65.1gb 82.3gb 147.5gb 44 10.0.0.12 10.0.0.12 zlHb2gq
204 57.8gb 65.4gb 82gb 147.5gb 44 10.0.0.15 10.0.0.15 idNEX8h
204 57.4gb 65.3gb 82.2gb 147.5gb 44 10.0.0.20 10.0.0.20 7sNEZsb
204 57.3gb 64.9gb 82.5gb 147.5gb 44 10.0.0.14 10.0.0.14 lURUCVH
204 57.4gb 65gb 82.4gb 147.5gb 44 10.0.0.8 10.0.0.8 Y0EJjkY
-
Below is the output after disabling allocation. The shards are no longer evenly distributed across the 7 ES nodes, and 714 shards are unassigned.
[root@elasticsearch7 share]# curl 'elasticsearch7:9200/_cat/allocation?v'
shards disk.indices disk.used disk.avail disk.total disk.percent host ip node
0 0b 65.5gb 81.9gb 147.5gb 44 10.0.0.17 10.0.0.17 7sNEZsb
155 49.5gb 64.5gb 83gb 147.5gb 43 10.0.0.9 10.0.0.9 wht1qYx
150 49.3gb 65.4gb 82gb 147.5gb 44 10.0.0.12 10.0.0.12 idNEX8h
62 5.4gb 65.1gb 82.3gb 147.5gb 44 10.0.0.15 10.0.0.15 zlHb2gq
155 54gb 64.9gb 82.5gb 147.5gb 44 10.0.0.13 10.0.0.13 lURUCVH
86 9.6gb 65.1gb 82.4gb 147.5gb 44 10.0.0.14 10.0.0.14 Y0EJjkY
106 32.8gb 65gb 82.4gb 147.5gb 44 10.0.0.8 10.0.0.8 70eRJrE
714 UNASSIGNED
[root@elasticsearch7 share]# curl -XGET 'elasticsearch2:9200/_cluster/health?pretty'
{
"cluster_name" : "elastic-search-cluster",
"status" : "yellow",
"timed_out" : false,
"number_of_nodes" : 7,
"number_of_data_nodes" : 7,
"active_primary_shards" : 714,
"active_shards" : 714,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 714,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 50.0
}
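As a sanity check on the two outputs above, the per-node shard counts can be added up and compared with the health numbers. This is only a rough sketch: `summarize_allocation` is a helper name I made up for illustration, and it assumes the plain-text column layout shown above, with the shard count in the first column and an `UNASSIGNED` row at the end.

```shell
#!/bin/sh
# Summarise `_cat/allocation?v` output: shards assigned to nodes vs. unassigned.
# summarize_allocation is a hypothetical helper name, not an Elasticsearch tool.
summarize_allocation() {
  awk '
    NR == 1            { next }                    # skip the column header line
    $2 == "UNASSIGNED" { unassigned = $1; next }   # the trailing "N UNASSIGNED" row
    NF > 0             { assigned += $1 }          # first column is the per-node shard count
    END {
      printf "assigned=%d unassigned=%d total=%d\n",
             assigned, unassigned, assigned + unassigned
    }
  '
}

# Usage against a live cluster (host name taken from this post):
#   curl -s 'elasticsearch7:9200/_cat/allocation?v' | summarize_allocation
```

For the allocation output above this gives 714 assigned and 714 unassigned, 1428 in total, which matches `active_shards`, `unassigned_shards` and the 50.0 percent figure in the health output.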
-
Below is the output after restarting the nodes and re-enabling allocation. We can see shards being relocated to rebalance the shard count across the cluster nodes.
[root@elasticsearch7 share]# curl -XGET 'elasticsearch2:9200/_cluster/health?pretty'
{
"cluster_name" : "elastic-search-cluster",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 7,
"number_of_data_nodes" : 7,
"active_primary_shards" : 714,
"active_shards" : 1428,
"relocating_shards" : 2,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 100.0
}
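For anyone reproducing this, disabling and re-enabling allocation is normally done through the cluster settings API with `cluster.routing.allocation.enable`. The exact commands used in this test are not shown in the post, so the following is only a minimal sketch: `ES_URL`, `allocation_body` and `set_allocation` are names I made up, and the choice of a persistent setting is my assumption (transient settings do not survive a full cluster restart).

```shell
#!/bin/sh
# Assumption: ES_URL points at any node in the cluster; the default host is taken from this post.
ES_URL="${ES_URL:-http://elasticsearch2:9200}"

# Build the request body: "none" disables shard allocation, "all" re-enables it.
# A persistent setting is used here because transient settings are lost
# when every node in the cluster is restarted.
allocation_body() {
  printf '{"persistent":{"cluster.routing.allocation.enable":"%s"}}' "$1"
}

# set_allocation is a hypothetical wrapper around PUT /_cluster/settings.
set_allocation() {
  curl -s -XPUT "$ES_URL/_cluster/settings" \
    -H 'Content-Type: application/json' \
    -d "$(allocation_body "$1")"
}

# Possible sequence for a full stop and start:
#   set_allocation none     # before stopping the nodes
#   ... stop and start all nodes ...
#   set_allocation all      # once all 7 nodes have rejoined
#   curl -s "$ES_URL/_cluster/health?wait_for_status=green&timeout=60s&pretty"
```

Note that this setting only controls whether shards may be allocated. It does not by itself stop the balancer from relocating shards once allocation is enabled again, which is the behaviour reported here.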
This is related to the original issue, "Ability to stop and start a cluster without shard movement".
The logs are attached on that thread.