My Elasticsearch 5.6.3 cluster is yellow, with one unassigned replica shard:
administrator@srv4-sv:~$ curl -XGET 'localhost:9200/_cluster/health?pretty'
{
  "cluster_name" : "redacted",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 15,
  "number_of_data_nodes" : 15,
  "active_primary_shards" : 172,
  "active_shards" : 350,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 1,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 99.71509971509973
}
administrator@srv4-sv:~$ curl -sS -XGET 'localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' | grep UNASSIGNED
stats_new 1 r UNASSIGNED ALLOCATION_FAILED
administrator@srv4-sv:~$ curl -sS -X POST 'localhost:9200/_cluster/allocation/explain?pretty'
{
  "index" : "stats_new",
  "shard" : 1,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "ALLOCATION_FAILED",
    "at" : "2018-08-03T22:38:21.312Z",
    "failed_allocation_attempts" : 5,
    "details" : "failed recovery, failure RecoveryFailedException[[stats_new][1]: Recovery failed from {srv4-sv}{w_6a_qIfTN2L8BN0wGiCyQ}{0KxMMAQETZywBTVNtriTkA}{10.64.2.17}{10.64.2.17:9300} into {srv4-ch}{7iSlk_IzRleE3AQag6cAfA}{i5S74TwfQeKlih5okbSjvQ}{10.64.3.17}{10.64.3.17:9300}]; nested: RemoteTransportException[[srv4-sv][10.64.2.17:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] phase1 failed]; nested: RecoverFilesRecoveryException[Failed to transfer [0] files with total size of [0b]]; nested: IllegalStateException[try to recover [stats_new][1] from primary shard with sync id but number of docs differ: 29983940 (srv4-sv, primary) vs 29983938(srv4-ch)]; ",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [ ... ]
}
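The "details" string chains several exceptions with "nested:", and the last one in the chain is the underlying cause (here, the doc-count mismatch between the primary on srv4-sv and the copy on srv4-ch). As a rough sketch, the innermost cause can be pulled out of such a string like this; innermost_cause is a hypothetical helper name, not part of Elasticsearch:

```shell
# Hypothetical helper: print the innermost "nested:" cause from an
# allocation-explain "details" string passed as the first argument.
innermost_cause() {
  printf '%s\n' "$1" \
    | awk -F'nested: ' '{ print $NF }' \
    | sed 's/; *$//'    # drop the trailing "; " that ends the chain
}

# Example with a shortened, made-up chain:
innermost_cause 'A[x]; nested: B[y]; nested: C[z]; '
# prints: C[z]
```

Applied to the details string above, this would leave only the IllegalStateException about the differing document counts.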
The fix most often suggested online is to force-allocate the shard with allocate_stale_primary:
administrator@srv4-sv:~$ curl -sS -XPOST 'localhost:9200/_cluster/reroute' -d '{"commands":[{"allocate_stale_primary":{"index":"stats_new","shard":1,"node":"srv4-sv","accept_data_loss":true}}]}'
but the cluster rejects that command:
{"error":{"root_cause":[{"type":"remote_transport_exception","reason":"[srv6-sv][10.64.2.21:9300][cluster:admin/reroute]"}],"type":"illegal_argument_exception","reason":"[allocate_stale_primary] primary [stats_new][1] is already assigned"},"status":400}
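The rejection is consistent with the explain output: it reports "primary" : false, so the unassigned copy is a replica, while allocate_stale_primary only applies to an unassigned primary. Two commonly suggested alternatives are sketched below, neither verified on this cluster. Both assume the primary on srv4-sv is the copy to keep. The ES variable and the function names are placeholders for illustration; retry_failed and number_of_replicas are existing Elasticsearch options.

```shell
# Sketch only: assumes the primary on srv4-sv holds the data to keep.
ES="${ES:-localhost:9200}"   # assumed endpoint, same as in the commands above
INDEX="stats_new"

# Build the JSON body for the index settings API with a given replica count.
replica_settings() {
  printf '{"index":{"number_of_replicas":%d}}' "$1"
}

# Option 1: ask the cluster to retry shards that exhausted their
# allocation attempts (failed_allocation_attempts is 5 above).
retry_failed() {
  curl -sS -XPOST "$ES/_cluster/reroute?retry_failed=true"
}

# Option 2: change the replica count of $INDEX. Dropping it and then
# restoring it makes the cluster rebuild replicas from the primary.
set_replicas() {
  curl -sS -XPUT "$ES/$INDEX/_settings" \
       -H 'Content-Type: application/json' \
       -d "$(replica_settings "$1")"
}
```

A cautious order would be to call retry_failed first and, only if the same sync-id error comes back, call set_replicas 0 followed by set_replicas with the index's original replica count (not shown in the output above). While replicas are at 0, the index has no redundancy.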
How can I get this replica shard assigned again?