ENV
Elasticsearch version: 7.8.0
Description:
We run a three-node cluster. Not long ago, the file system on one node was damaged and the node was forced offline. The cluster then ran on the remaining two nodes for some time, and its status stayed green throughout. Recently, however, we started getting the following error when writing data:
{"_index":"index_xxx","_type":"_doc","_id":"xxx","status":404,"error":{"type":"shard_not_found_exception","reason":"no such shard","index_uuid":"taBxiWo6RhWWaRG3Ainy4A","shard":"0","index":"index_xxx"}}
After checking, we found that the cluster status was still green and there were no errors in the Elasticsearch server log.
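As far as we understand, Elasticsearch derives the health colour from the shard states recorded in the master's routing table, not from a check of the shard files on each node's disk. The Python sketch below is a simplified approximation of that rule, written for illustration only (it is not Elasticsearch's own code, and it ignores special cases such as primaries of newly created indices). It assumes rows shaped like the output of `GET _cat/shards?format=json`, using only the `prirep` and `state` fields.

```python
from typing import Iterable, Mapping


def health_from_shards(shards: Iterable[Mapping[str, str]]) -> str:
    """Approximate the cluster health colour from _cat/shards rows.

    Simplified sketch: each row is assumed to have "prirep" ("p" = primary,
    "r" = replica) and "state" (STARTED, RELOCATING, INITIALIZING, UNASSIGNED).
    """
    primary_inactive = False
    replica_inactive = False
    for row in shards:
        # A copy counts as active once it is STARTED or RELOCATING.
        active = row["state"] in ("STARTED", "RELOCATING")
        if not active and row["prirep"] == "p":
            primary_inactive = True
        elif not active and row["prirep"] == "r":
            replica_inactive = True
    # Any inactive primary -> red; only replicas inactive -> yellow.
    if primary_inactive:
        return "red"
    if replica_inactive:
        return "yellow"
    return "green"
```

Rows can be fetched with `GET _cat/shards?format=json&h=index,shard,prirep,state,node`. Because only the recorded state is inspected, a shard copy whose data was lost on disk but is still listed as STARTED would count as active here and keep the result green.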
We then tried adding a new node to the cluster. After the new node joined successfully, the following messages appeared in the master node's server log:
[2022-07-15T19:18:16,727][WARN ][o.e.c.r.a.AllocationService] [node_ip] failing shard [failed shard, shard [index_xxx][1], node[6U4Yf_FYRJqnw06IWG82DQ], [P], s[STARTED], a[id=ScXTxC5PSDOVxgi78PJQwg], message [master {master_node}{RizeqIPITtu_fmtHRo-xhQ}{pjyrxWldTdWh8SwhOLNQ4Q}{master_node}{master_node:tcp_port}{dilmrt}{ml.machine_memory=134608896000, ml.max_open_jobs=20, xpack.installed=true, transform.node=true} has not removed previously failed shard. resending shard failure], failure [Unknown], markAsStale [true]]
[2022-07-15T19:18:16,736][INFO ][o.e.c.r.a.AllocationService] [master_node] Cluster health status changed from [GREEN] to [YELLOW] (reason: [shards failed [[index_xxx][1], [index_xxx][0], [index_xxx][4], ... [12 items in total]]]).
After that, the cluster status changed from green to yellow, and some shards went into recovery.
Question
Were these shards already broken before we added the new node? If so, why did the cluster status always show green?
Please help, thanks!