Resolve an index in the cluster with Yellow status

We are running HA Elasticsearch in k8s. Recently we found one of the indices in Yellow status due to unassigned shards, which turns the whole cluster state yellow.

The shard of that index is STARTED on NODE-A and UNASSIGNED on NODE-B.
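For readers who want to see the same per-shard view, here is a minimal sketch that builds a _cat/shards request and filters the result for unassigned copies. The base URL, the index name my-index, and the sample rows are assumptions for illustration, not values from this cluster.

```python
import urllib.parse

# Columns requested from _cat/shards (a selection chosen for this sketch).
COLUMNS = "index,shard,prirep,state,node,unassigned.reason"

def cat_shards_url(base_url, index):
    """Build the URL for GET _cat/shards/<index> with JSON output."""
    query = urllib.parse.urlencode({"format": "json", "h": COLUMNS})
    return f"{base_url}/_cat/shards/{urllib.parse.quote(index)}?{query}"

def unassigned_shards(rows):
    """Keep only the rows whose state is UNASSIGNED."""
    return [row for row in rows if row.get("state") == "UNASSIGNED"]

# Hypothetical response rows, shaped like the JSON output of _cat/shards.
sample_rows = [
    {"index": "my-index", "shard": "0", "prirep": "p",
     "state": "STARTED", "node": "NODE-A"},
    {"index": "my-index", "shard": "0", "prirep": "r",
     "state": "UNASSIGNED", "node": None,
     "unassigned.reason": "ALLOCATION_FAILED"},
]

url = cat_shards_url("http://localhost:9200", "my-index")
stuck = unassigned_shards(sample_rows)
```

The prirep column shows whether the stuck copy is a primary (p) or a replica (r), which matters for how risky a manual intervention is.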

We investigated further using the _cluster/allocation/explain API.

It returned the following details:
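A request to that API for one specific shard copy might be built as in the sketch below. The host, the index name my-index, and the shard number 0 are placeholders, and the request is only constructed here, not sent.

```python
import json
import urllib.request

def explain_request(base_url, index, shard, primary):
    """Build a POST to _cluster/allocation/explain for one shard copy."""
    body = json.dumps(
        {"index": index, "shard": shard, "primary": primary}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/_cluster/allocation/explain",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Placeholder values: ask about the replica copy of shard 0 of "my-index".
req = explain_request("http://localhost:9200", "my-index", 0, False)
# To actually send it: urllib.request.urlopen(req)
```

Setting primary to False asks about the replica copy, which is the one that would be unassigned if the primary is the STARTED copy.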

shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], 

CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [XXX/YYYmb], which is larger than the limit of [XXX/YYYmb], real usage: [XXX/YYYmb], new bytes reserved: [XXX/YYYmb], usages [request=0/0b, fielddata=XXX/YYYmb, in_flight_requests=XXX/YYYmb, accounting=XXX/YYYmb]]; ], allocation_status[no_attempt]], expected_shard_size[XXXXXXX],
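Since the failure is a parent circuit breaker trip, it may help to see how close each node is to its breaker limit. The sketch below summarizes a parsed response from GET _nodes/stats/breaker. The sample numbers are made up, and the helper name parent_breaker_usage is hypothetical.

```python
def parent_breaker_usage(stats):
    """Map node name -> fraction of the parent breaker limit in use."""
    usage = {}
    for node in stats.get("nodes", {}).values():
        parent = node["breakers"]["parent"]
        limit = parent["limit_size_in_bytes"]
        used = parent["estimated_size_in_bytes"]
        usage[node["name"]] = used / limit if limit else 0.0
    return usage

# Made-up numbers, shaped like the _nodes/stats/breaker response.
sample_stats = {
    "nodes": {
        "id-a": {"name": "NODE-A", "breakers": {"parent": {
            "limit_size_in_bytes": 1000,
            "estimated_size_in_bytes": 400, "tripped": 0}}},
        "id-b": {"name": "NODE-B", "breakers": {"parent": {
            "limit_size_in_bytes": 1000,
            "estimated_size_in_bytes": 950, "tripped": 7}}},
    }
}

usage = parent_breaker_usage(sample_stats)
```

A node that sits near 1.0 is likely to reject the recovery traffic again, so a retry has a better chance once memory pressure on that node has dropped.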

I have also set up another HA ES cluster with the same configuration and loaded snapshot data from this cluster to reproduce the issue. However, the new cluster is green and no shard allocation has failed.

Can anyone suggest how to assign the shard of that index and get the cluster state back to green? Thanks.

Welcome to our community! :smiley:

You cannot have a node that is yellow; either the cluster is yellow or it's green/red.

The explain output says to manually call [/_cluster/reroute?retry_failed=true] to retry. Did you do that?
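The retry named in the allocation explain message could be issued as in this minimal sketch. The host is an assumption, and the request is only built here, not sent.

```python
import urllib.request

def retry_failed_request(base_url):
    """Build POST _cluster/reroute?retry_failed=true (no request body)."""
    return urllib.request.Request(
        f"{base_url}/_cluster/reroute?retry_failed=true",
        method="POST",
    )

# Assumed host; adjust to the cluster's HTTP endpoint.
retry_req = retry_failed_request("http://localhost:9200")
# To actually send it: urllib.request.urlopen(retry_req)
```

This only asks the cluster to attempt allocation again for shards that hit the retry limit; it does not name a shard or force it onto a node.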

Thanks @warkolm, I have not tried that API yet. I need to make sure there is no downtime or excessive resource usage after calling it. (It is the shard of that index that is UNASSIGNED; I have updated the question, thank you!)
Is there any chance of downtime or data loss when using reroute?

You can lose data if you force a reroute of an unallocated primary, which you are not doing here. You should be pretty safe.
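Both points in this reply can be checked from API responses. As a rough sketch, the first helper reads the primary field of an allocation explain response, and the other two build and interpret a _cluster/health call that waits for green. The host, the 60s timeout, and the sample responses are assumptions.

```python
import urllib.parse

def is_replica(explain):
    """True if the explained shard copy is a replica, not a primary."""
    return explain.get("primary") is False

def health_url(base_url, timeout="60s"):
    """Build GET _cluster/health that waits up to `timeout` for green."""
    query = urllib.parse.urlencode(
        {"wait_for_status": "green", "timeout": timeout}
    )
    return f"{base_url}/_cluster/health?{query}"

def summarize(health):
    """Reduce a health response to the two values of interest here."""
    reached_green = (health.get("status") == "green"
                     and not health.get("timed_out", False))
    return {"green": reached_green,
            "unassigned": health.get("unassigned_shards", 0)}

# Hypothetical responses for illustration only.
sample_explain = {"index": "my-index", "shard": 0, "primary": False}
sample_health = {"status": "yellow", "timed_out": True,
                 "unassigned_shards": 1}

replica = is_replica(sample_explain)
wait_url = health_url("http://localhost:9200")
summary = summarize(sample_health)
```

If the wait times out and the shard is still unassigned, running allocation explain again should show the reason for the latest failure.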
