Master has not removed previously failed shard. resending shard failure

ENV

Elasticsearch version: 7.8.0

Description:

We run a three-node cluster. Not long ago, the file system of one node was damaged and the node was forced offline. The cluster then ran with two nodes for a period of time, and the cluster status stayed green the whole time. Recently, however, we encountered the following error when writing data:

{"_index":"index_xxx","_type":"_doc","_id":"xxx","status":404,"error":{"type":"shard_not_found_exception","reason":"no such shard","index_uuid":"taBxiWo6RhWWaRG3Ainy4A","shard":"0","index":"index_xxx"}}

After checking, we can see that the cluster status is still green, and there are no errors in the Elasticsearch server log.
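For anyone who wants to cross-check the health status against the per-shard view, here is a minimal sketch in Python. It assumes the cluster is reachable without authentication at `http://localhost:9200` (a placeholder, adjust as needed); the helper names `fetch_json`, `summarize` and `non_started_shards` are made up for this example. It only reads the `_cluster/health` and `_cat/shards` APIs, and it may well show nothing unusual in a situation like the one described here, since the master itself reported green.

```python
import json
from collections import Counter
from urllib.request import urlopen

# Assumption: placeholder address, no authentication or TLS.
ES_URL = "http://localhost:9200"


def fetch_json(path, base_url=ES_URL):
    """GET a path from the cluster and decode the JSON body."""
    with urlopen(base_url + path, timeout=10) as resp:
        return json.load(resp)


def summarize(rows):
    """Count shard copies per state (STARTED, INITIALIZING, RELOCATING, UNASSIGNED)."""
    return Counter(row.get("state") for row in rows)


def non_started_shards(rows):
    """Return the shard copies whose state is anything other than STARTED."""
    return [row for row in rows if row.get("state") != "STARTED"]


if __name__ == "__main__":
    health = fetch_json("/_cluster/health")
    print("status:", health["status"],
          "unassigned:", health["unassigned_shards"])

    # format=json makes the cat API return a list of objects;
    # h=... limits the columns to the ones used below.
    rows = fetch_json("/_cat/shards?format=json&h=index,shard,prirep,state,node")
    print("shard copies per state:", dict(summarize(rows)))
    for row in non_started_shards(rows):
        print(row["index"], row["shard"], row["prirep"], row["state"], row["node"])
```

If the health API says green while this lists shard copies that are not STARTED, or while writes still fail with shard_not_found_exception, that mismatch is worth including in the report.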

Then we tried adding a new node to the cluster. Once the new node had joined the cluster successfully, we saw the following messages in the server log of the master node:

[2022-07-15T19:18:16,727][WARN ][o.e.c.r.a.AllocationService] [node_ip] failing shard [failed shard, shard [index_xxx][1], node[6U4Yf_FYRJqnw06IWG82DQ], [P], s[STARTED], a[id=ScXTxC5PSDOVxgi78PJQwg], message [master {master_node}{RizeqIPITtu_fmtHRo-xhQ}{pjyrxWldTdWh8SwhOLNQ4Q}{master_node}{master_node:tcp_port}{dilmrt}{ml.machine_memory=134608896000, ml.max_open_jobs=20, xpack.installed=true, transform.node=true} has not removed previously failed shard. resending shard failure], failure [Unknown], markAsStale [true]]
[2022-07-15T19:18:16,736][INFO ][o.e.c.r.a.AllocationService] [master_node] Cluster health status changed from [GREEN] to [YELLOW] (reason: [shards failed [[index_xxx][1], [index_xxx][0], [index_xxx][4], ... [12 items in total]]]).

Next, we saw the cluster status change from green to yellow, and some shards went into recovery.

Question

Was there already a problem with those shards before we added the new node? If so, why did the cluster status always show green?

Please help, thanks!

Welcome to our community! :smiley:

7.8 has been EOL for some time and is unsupported. Are you able to upgrade to a supported version?
