Search rejected due to missing shards [[tasklist-flownode-instance-1.0.0_][0]]. Consider using `allow_partial_search_results` setting to bypass this error

Hi Team,

We have a 7-node Elasticsearch cluster, version 7.17.10, running on EKS pods.

Recently we noticed an issue with one of the nodes: its PVC had a problem caused by the EFS_CSI driver. Once that was fixed, the PVC was recreated.

After that, the ES nodes started reporting the error below.

at java.lang.Thread.run(Thread.java:1623) [?:?]",
"Caused by: org.elasticsearch.action.search.SearchPhaseExecutionException: Search rejected due to missing shards [[tasklist-flownode-instance-1.0.0_][0]]. Consider using allow_partial_search_results setting to bypass this error.",
"at org.elasticsearch.action.search.AbstractSearchAsyncAction.run(AbstractSearchAsyncAction.java:227) ~[elasticsearch-7.17.10.jar:7.17.10]",
"at org.elasticsearch.action.search.AbstractSearchAsyncAction.executePhase(AbstractSearchAsyncAction.java:454) [elasticsearch-7.17.10.jar:7.17.10]",
"... 68 more"] }
{"type": "server", "timestamp": "2024-08-21T05:14:40,041Z", "level": "WARN", "component": "r.suppressed", "cluster.name": "elasticsearch", "node.name": "elasticsearch-master-6", "message": "path: /tasklist-flownode-instance-1.0.0_/delete_by_query, params: {slices=auto, requests_per_second=-1, scroll=30000ms, conflicts=proceed, index=tasklist-flownode-instance-1.0.0, wait_for_completion=true, timeout=1m}", "cluster.uuid":

We urgently need your help, because our production environment is stuck because of this error.

Thank you in advance for your support.

Regards,
Lakshmi Narayana

Note that this is a community forum, so there are no SLAs or even guarantees of getting a response.

Are you using EFS for node storage? If so, please note that this is neither recommended nor supported and might be the reason you are seeing shard issues. I would recommend switching to EBS-backed storage.

{"status":"red","active_primary_shards":5772,"active_shards":5826,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":908,"delayed_unassigned_shards":0}
All ES nodes report the same status.

Hi Christian,

Thank you for your quick response. We are using EBS storage.

Regards,
Lakshmi Narayana

Where does the EFS driver come in then?

Are all indices configured with a replica shard or do you have indices with only primary shards?

Hi Christian,

When we noticed the pod status and checked the PVC, we found an issue caused by the drivers. After fixing it, we were able to get the PVC back.

"index.allocation.existing_shards_allocator": "gateway_allocator", "index.number_of_replicas": "0",
"index.auto_expand_replicas": "false",

Thank you.

Regards,
Lakshmi Narayana

I still do not understand why there would be an issue with the EFS_CSI driver if you were using EBS storage. Could you please elaborate and describe what happened in greater detail?

If you do not have any replicas configured you may not be able to recover the missing shards if there are issues with your storage. You may therefore need to restore indices from a snapshot (assuming you have one). If you do not have a snapshot you may need to allocate empty primary shards to replace the missing ones and accept the data in these shards being lost.
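Before choosing between a snapshot restore and empty primaries, it may help to list exactly which primary shards are unassigned. The following is a minimal sketch, assuming you have saved the output of `GET /_cat/shards?format=json&h=index,shard,prirep,state,unassigned.reason` and loaded it as a list of dictionaries. The function name and the sample rows are hypothetical and only illustrate the shape of the data.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def unassigned_primaries(cat_shards: List[Row]) -> List[Row]:
    """Return the rows that describe unassigned primary shards.

    `prirep` is "p" for a primary and "r" for a replica; `state` is
    "UNASSIGNED" for a shard that is not allocated to any node.
    """
    return [
        row
        for row in cat_shards
        if row.get("prirep") == "p" and row.get("state") == "UNASSIGNED"
    ]


# Hypothetical sample rows (assumption: not actual output from this cluster).
sample_rows: List[Row] = [
    {"index": "tasklist-flownode-instance-1.0.0_", "shard": "0",
     "prirep": "p", "state": "UNASSIGNED", "unassigned.reason": "NODE_LEFT"},
    {"index": "operate-operation-8.1.0_2024-02-08", "shard": "0",
     "prirep": "p", "state": "UNASSIGNED", "unassigned.reason": "NODE_LEFT"},
    {"index": "zeebe-record_message_8.1.14_2024-08-09", "shard": "0",
     "prirep": "p", "state": "STARTED", "unassigned.reason": None},
]

for row in unassigned_primaries(sample_rows):
    print(row["index"], row["shard"], row["unassigned.reason"])
```

Each row this returns is a shard that has to come back either from a snapshot or as an empty primary.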

Hi Christian,

Thank you for your suggestions. I don't have much knowledge of ES, so I am requesting your help to fix this, as I have no way to fix it myself.

The issue we fixed may have been with the EBS CSI driver.

Thank you.

Regards,
Lakshmi Narayana

You initially mentioned the EFS_CSI driver, which is related to EFS and not EBS. The distinction between EFS and EBS (which are very different) is important. You need to provide details about how your cluster is set up and confirm what type of storage is actually used. If you do not have any replicas configured, you most likely need to use the API I linked to in my last post to force allocation of empty primary shards and accept the data loss. That should bring your cluster into a better (green) state.

Once you have done that, you should consider whether you require more resiliency, and enable replicas if that is the case. If you are indeed using EFS for storage, you should also change that.
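As a rough illustration of the replica step, the sketch below finds indices with zero replicas in a `GET /_settings` response (nested form, not `flat_settings`) and builds the body for `PUT /<index>/_settings`. The function names and the second sample index are hypothetical. The replica count of 1 is an assumption, so pick whatever suits your resiliency needs.

```python
import json
from typing import Any, Dict, List


def indices_without_replicas(settings_response: Dict[str, Any]) -> List[str]:
    """Return the names of indices whose number_of_replicas is 0."""
    result = []
    for name, body in settings_response.items():
        index_settings = body.get("settings", {}).get("index", {})
        # Setting values come back as strings, e.g. "0". If the setting is
        # absent, assume the Elasticsearch default of 1 replica.
        if int(index_settings.get("number_of_replicas", 1)) == 0:
            result.append(name)
    return sorted(result)


def replica_update_body(replicas: int = 1) -> Dict[str, Any]:
    """Body for `PUT /<index>/_settings` that sets the replica count."""
    return {"index": {"number_of_replicas": replicas}}


# Hypothetical sample response (assumption: trimmed to the relevant keys).
sample_settings = {
    "tasklist-flownode-instance-1.0.0_": {
        "settings": {"index": {"number_of_replicas": "0",
                               "auto_expand_replicas": "false"}}
    },
    "hypothetical-index-with-replica": {
        "settings": {"index": {"number_of_replicas": "1"}}
    },
}

for name in indices_without_replicas(sample_settings):
    print(f"PUT /{name}/_settings")
    print(json.dumps(replica_update_body(1)))
```

Replicas can only be built from an assigned primary, so this step belongs after the cluster has left the red state.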

Hi Christian,

Thank you for your reply.

I tried the following API request:

POST /_cluster/reroute?metric=none
{
  "commands": [
    {
      "move": {
        "index": "zeebe-record_message_8.1.14_2024-08-09",
        "shard": 0,
        "from_node": "elasticsearch-master-0",
        "to_node": "elasticsearch-master-6"
      }
    },
    {
      "allocate_replica": {
        "index": "zeebe-record_message_8.1.14_2024-08-09",
        "shard": 1,
        "node": "elasticsearch-master-2"
      }
    }
  ]
}

Got the response:
{"Message":"Your request: '/_cluster/reroute' is not allowed."}

Thank you,

Regards,
Lakshmi Narayana

Hi Christian,

Could you please check and help us recover the data on es6? Thank you.

Regards,
Lakshmi Narayana

I believe you will need to use the allocate_empty_primary command of the cluster reroute API, which will create a new empty shard.
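A minimal sketch of what such a request body could look like, built in Python so it can be generated for many shards at once. The function name is hypothetical. The index and shard come from the error message at the top of the thread, and the target node is only an assumption, so substitute a node that should hold the new empty shard. The command requires `accept_data_loss` to be `true`, which acknowledges that whatever was in the shard is gone for good.

```python
import json
from typing import Any, Dict, List, Tuple


def empty_primary_reroute_body(
    shards: List[Tuple[str, int]], node: str
) -> Dict[str, Any]:
    """Build a `POST /_cluster/reroute` body with one
    allocate_empty_primary command per (index, shard) pair."""
    commands = [
        {
            "allocate_empty_primary": {
                "index": index,
                "shard": shard,
                "node": node,
                # Must be set explicitly: the shard will start out empty.
                "accept_data_loss": True,
            }
        }
        for index, shard in shards
    ]
    return {"commands": commands}


# Assumption: the target node is chosen for illustration only.
body = empty_primary_reroute_body(
    [("tasklist-flownode-instance-1.0.0_", 0)],
    node="elasticsearch-master-6",
)

print("POST /_cluster/reroute")
print(json.dumps(body, indent=2))
```

This only builds the body. If the endpoint still answers that `/_cluster/reroute` is not allowed, the request is being blocked before Elasticsearch applies the command, and that needs to be resolved first.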

Hi Christian,

Thank you for the update. We will try the solution.

Regards,
Lakshmi Narayana

{
"note": "No shard was specified in the explain API request, so this response explains a randomly chosen unassigned shard. There may be other unassigned shards in this cluster which cannot be assigned for different reasons. It may not be possible to assign this shard until one of the other shards is assigned correctly. To explain the allocation of other shards (whether assigned or unassigned) you must specify the target shard in the request to this API.",
"index": "operate-operation-8.1.0_2024-02-08",
"shard": 0,
"primary": true,
"current_state": "unassigned",
"unassigned_info": {
"reason": "NODE_LEFT",
"at": "2024-08-16T05:20:10.956Z",
"details": "node_left [rxnWlou0Sn2BXfkxbawbJw]",
"last_allocation_status": "no_valid_shard_copy"
},
"can_allocate": "no_valid_shard_copy",
"allocate_explanation": "cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster",
"node_allocation_decisions": [{
"node_id": "GDHtTsp7QeOp0ydu0DOvHA",
"node_name": "elasticsearch-master-5",
"transport_address": "10.1.235.33:9300",
"node_attributes": {
"ml.machine_memory": "9126805504",
"ml.max_open_jobs": "512",
"xpack.installed": "true",
"ml.max_jvm_size": "4294967296",
"transform.node": "true"
},
"node_decision": "no",
"store": {
"found": false
}
}, {
"node_id": "KTUIBayGTNqX1k5je88VqA",
"node_name": "elasticsearch-master-2",
"transport_address": "10.1.224.236:9300",
"node_attributes": {
"ml.machine_memory": "9126805504",
"ml.max_open_jobs": "512",
"xpack.installed": "true",
"ml.max_jvm_size": "4294967296",
"transform.node": "true"
},
"node_decision": "no",
"store": {
"found": false
}
}, {
"node_id": "Rc1CXRD3QI-2PLDbIHTwSw",
"node_name": "elasticsearch-master-4",
"transport_address": "10.1.224.55:9300",
"node_attributes": {
"ml.machine_memory": "9126805504",
"ml.max_open_jobs": "512",
"xpack.installed": "true",
"ml.max_jvm_size": "4294967296",
"transform.node": "true"
},
"node_decision": "no",
"store": {
"found": false
}
}, {
"node_id": "koOlCHC5StacvAsSXWjvsQ",
"node_name": "elasticsearch-master-1",
"transport_address": "10.1.227.85:9300",
"node_attributes": {
"ml.machine_memory": "9126805504",
"ml.max_open_jobs": "512",
"xpack.installed": "true",
"ml.max_jvm_size": "4294967296",
"transform.node": "true"
},
"node_decision": "no",
"store": {
"found": false
}
}, {
"node_id": "pHDQP1NaTcGDU95uv3HEAA",
"node_name": "elasticsearch-master-0",
"transport_address": "10.1.192.108:9300",
"node_attributes": {
"ml.machine_memory": "9126805504",
"ml.max_open_jobs": "512",
"xpack.installed": "true",
"ml.max_jvm_size": "4294967296",
"transform.node": "true"
},
"node_decision": "no",
"store": {
"found": false
}
}, {
"node_id": "ph1U6RxASAyKE-gCiceXYA",
"node_name": "elasticsearch-master-6",
"transport_address": "10.1.194.42:9300",
"node_attributes": {
"ml.machine_memory": "9126805504",
"ml.max_open_jobs": "512",
"xpack.installed": "true",
"ml.max_jvm_size": "4294967296",
"transform.node": "true"
},
"node_decision": "no",
"store": {
"found": false
}
}, {
"node_id": "rzUFNbqrR0mfHbLj8SJgMQ",
"node_name": "elasticsearch-master-3",
"transport_address": "10.1.234.33:9300",
"node_attributes": {
"ml.machine_memory": "9126805504",
"ml.max_open_jobs": "512",
"xpack.installed": "true",
"ml.max_jvm_size": "4294967296",
"transform.node": "true"
},
"node_decision": "no",
"store": {
"found": false
}
}]
}
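The output above can be read mechanically: every node reports `"store": {"found": false}`, so no node holds a copy of this primary. The sketch below applies that check to an allocation explain response loaded as a dictionary. The function names are hypothetical, and the sample is trimmed from the output above to two nodes and only the fields the check uses.

```python
from typing import Any, Dict, List


def nodes_with_shard_copy(explain: Dict[str, Any]) -> List[str]:
    """Return names of nodes that report an on-disk copy of the shard."""
    return [
        decision["node_name"]
        for decision in explain.get("node_allocation_decisions", [])
        if decision.get("store", {}).get("found", False)
    ]


def primary_copy_is_gone(explain: Dict[str, Any]) -> bool:
    """True if this is an unassigned primary with no copy on any node."""
    return (
        explain.get("primary") is True
        and explain.get("can_allocate") == "no_valid_shard_copy"
        and not nodes_with_shard_copy(explain)
    )


# Trimmed from the explain output above (two of the seven nodes).
sample_explain = {
    "index": "operate-operation-8.1.0_2024-02-08",
    "shard": 0,
    "primary": True,
    "current_state": "unassigned",
    "can_allocate": "no_valid_shard_copy",
    "node_allocation_decisions": [
        {"node_name": "elasticsearch-master-5", "node_decision": "no",
         "store": {"found": False}},
        {"node_name": "elasticsearch-master-2", "node_decision": "no",
         "store": {"found": False}},
    ],
}

print(nodes_with_shard_copy(sample_explain))
print(primary_copy_is_gone(sample_explain))
```

When `primary_copy_is_gone` is true for a shard, the remaining options are the ones already discussed in this thread: restore the index from a snapshot, or allocate an empty primary and accept the data loss.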