Search rejected due to missing shards [[tasklist-flownode-instance-1.0.0_][0]]. Consider using `allow_partial_search_results` setting to bypass this error

Hi Team,

We have a 7-node Elasticsearch 7.17.1 cluster running as pods on EKS.

Recently we noticed an issue with one of the nodes: its PVC ran into a problem caused by the EFS_CSI driver. Once the issue was fixed, the PVC was recreated.

After that, the ES nodes started failing with the error below.

at java.lang.Thread.run(Thread.java:1623) [?:?]",
"Caused by: org.elasticsearch.action.search.SearchPhaseExecutionException: Search rejected due to missing shards [[tasklist-flownode-instance-1.0.0_][0]]. Consider using allow_partial_search_results setting to bypass this error.",
"at org.elasticsearch.action.search.AbstractSearchAsyncAction.run(AbstractSearchAsyncAction.java:227) ~[elasticsearch-7.17.10.jar:7.17.10]",
"at org.elasticsearch.action.search.AbstractSearchAsyncAction.executePhase(AbstractSearchAsyncAction.java:454) [elasticsearch-7.17.10.jar:7.17.10]",
"... 68 more"] }
{"type": "server", "timestamp": "2024-08-21T05:14:40,041Z", "level": "WARN", "component": "r.suppressed", "cluster.name": "elasticsearch", "node.name": "elasticsearch-master-6", "message": "path: /tasklist-flownode-instance-1.0.0_/delete_by_query, params: {slices=auto, requests_per_second=-1, scroll=30000ms, conflicts=proceed, index=tasklist-flownode-instance-1.0.0, wait_for_completion=true, timeout=1m}", "cluster.uuid":

We need your urgent help, because our production environment is stuck because of this error.

Thank you in advance for your support.

Regards,
Lakshmi Narayana

Note that this is a community forum, so there are no SLAs or even guarantees of getting a response.

Are you using EFS for node storage? If so, please note that this is neither recommended nor supported and might be the reason you are seeing shard issues. I would recommend switching to EBS-backed storage.

{"status":"red","active_primary_shards":5772,"active_shards":5826,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":908,"delayed_unassigned_shards":0}
All ES nodes report the same status.
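For reference, the full list of unassigned shards and the reason each one is unassigned can be retrieved with a _cat request along these lines (the column selection is just an example):

GET _cat/shards?v&h=index,shard,prirep,state,unassigned.reason&s=state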

Hi Christian,

Thank you for your quick response. We are using EBS storage.

Regards,
Lakshmi Narayana

Where does the EFS driver come in then?

Are all indices configured with a replica shard or do you have indices with only primary shards?

Hi Christian,

When we noticed the pod status and checked the PVC, we found an issue caused by the drivers. After fixing the issue, we were able to get the PVC back.

"index.allocation.existing_shards_allocator": "gateway_allocator", "index.number_of_replicas": "0",
"index.auto_expand_replicas": "false",

Thank you.

Regards,
Lakshmi Narayana

I still do not understand why there would be an issue with the EFS_CSI driver if you were using EBS storage. Could you please elaborate and describe what happened in greater detail?

If you do not have any replicas configured you may not be able to recover the missing shards if there are issues with your storage. You may therefore need to restore indices from a snapshot (assuming you have one). If you do not have a snapshot you may need to allocate empty primary shards to replace the missing ones and accept the data in these shards being lost.
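As a rough sketch of the snapshot route, you can first check whether any snapshot repositories and snapshots exist and then restore the affected index; my_repository and my_snapshot below are placeholders, and the index name is taken from the error message earlier in the thread:

GET _snapshot
GET _snapshot/my_repository/_all

POST _snapshot/my_repository/my_snapshot/_restore
{
  "indices": "tasklist-flownode-instance-1.0.0_",
  "include_global_state": false
}

Note that an index that still exists in the cluster (even in red state) has to be closed or deleted before it can be restored from a snapshot.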

Hi Christian,

Thank you for your suggestions, but I don't have much knowledge of ES, so I would appreciate your help in fixing this, as I have no way to fix it myself.

Maybe the issue was with the EBS CSI driver, which we have fixed.

Thank you.

Regards,
Lakshmi Narayana

You initially mentioned the EFS_CSI driver, which is related to EFS and not EBS. The distinction between EFS and EBS (which are very different) is important. You need to provide details about how your cluster is set up and confirm what type of storage is actually used. If you do not have any replicas configured, you most likely need to use the API I linked to in my last post to force allocation of empty primary shards and accept the data loss. That should bring your cluster into a better (green) state.

Once you have done that, I think you need to consider whether you require more resiliency and enable replicas if that is the case. If you are indeed using EFS for storage, you should also change that.
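As an illustration of enabling replicas, the replica count can be raised with an index settings update once the cluster is healthy again; the index name is taken from the error in this thread and a count of 1 is just an example:

PUT tasklist-flownode-instance-1.0.0_/_settings
{
  "index": {
    "number_of_replicas": 1
  }
}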

Hi Christian,

Thank you for your reply.

I have tried the following API call:

POST /_cluster/reroute?metric=none
{
  "commands": [
    {
      "move": {
        "index": "zeebe-record_message_8.1.14_2024-08-09",
        "shard": 0,
        "from_node": "elasticsearch-master-0",
        "to_node": "elasticsearch-master-6"
      }
    },
    {
      "allocate_replica": {
        "index": "zeebe-record_message_8.1.14_2024-08-09",
        "shard": 1,
        "node": "elasticsearch-master-2"
      }
    }
  ]
}

Got the response:
{"Message":"Your request: '/_cluster/reroute' is not allowed."}

Thank you,

Regards,
Lakshmi Narayana

Hi Christian,

Can you please check and help us get the data back to es6? Thank you.

Regards,
Lakshmi Narayana

I believe you will need to use the allocate_empty_primary parameter, which will create a new empty shard.
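A minimal sketch of such a reroute command, using the index and shard from the original error and elasticsearch-master-6 purely as an example target node (this has to be repeated for every unassigned primary, and accept_data_loss must be set explicitly because the previous contents of the shard are discarded):

POST /_cluster/reroute
{
  "commands": [
    {
      "allocate_empty_primary": {
        "index": "tasklist-flownode-instance-1.0.0_",
        "shard": 0,
        "node": "elasticsearch-master-6",
        "accept_data_loss": true
      }
    }
  ]
}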

Hi Christian,

Thank you for the update. We will try the solution.

Regards,
Lakshmi Narayana

{
"note": "No shard was specified in the explain API request, so this response explains a randomly chosen unassigned shard. There may be other unassigned shards in this cluster which cannot be assigned for different reasons. It may not be possible to assign this shard until one of the other shards is assigned correctly. To explain the allocation of other shards (whether assigned or unassigned) you must specify the target shard in the request to this API.",
"index": "operate-operation-8.1.0_2024-02-08",
"shard": 0,
"primary": true,
"current_state": "unassigned",
"unassigned_info": {
"reason": "NODE_LEFT",
"at": "2024-08-16T05:20:10.956Z",
"details": "node_left [rxnWlou0Sn2BXfkxbawbJw]",
"last_allocation_status": "no_valid_shard_copy"
},
"can_allocate": "no_valid_shard_copy",
"allocate_explanation": "cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster",
"node_allocation_decisions": [{
"node_id": "GDHtTsp7QeOp0ydu0DOvHA",
"node_name": "elasticsearch-master-5",
"transport_address": "10.1.235.33:9300",
"node_attributes": {
"ml.machine_memory": "9126805504",
"ml.max_open_jobs": "512",
"xpack.installed": "true",
"ml.max_jvm_size": "4294967296",
"transform.node": "true"
},
"node_decision": "no",
"store": {
"found": false
}
}, {
"node_id": "KTUIBayGTNqX1k5je88VqA",
"node_name": "elasticsearch-master-2",
"transport_address": "10.1.224.236:9300",
"node_attributes": {
"ml.machine_memory": "9126805504",
"ml.max_open_jobs": "512",
"xpack.installed": "true",
"ml.max_jvm_size": "4294967296",
"transform.node": "true"
},
"node_decision": "no",
"store": {
"found": false
}
}, {
"node_id": "Rc1CXRD3QI-2PLDbIHTwSw",
"node_name": "elasticsearch-master-4",
"transport_address": "10.1.224.55:9300",
"node_attributes": {
"ml.machine_memory": "9126805504",
"ml.max_open_jobs": "512",
"xpack.installed": "true",
"ml.max_jvm_size": "4294967296",
"transform.node": "true"
},
"node_decision": "no",
"store": {
"found": false
}
}, {
"node_id": "koOlCHC5StacvAsSXWjvsQ",
"node_name": "elasticsearch-master-1",
"transport_address": "10.1.227.85:9300",
"node_attributes": {
"ml.machine_memory": "9126805504",
"ml.max_open_jobs": "512",
"xpack.installed": "true",
"ml.max_jvm_size": "4294967296",
"transform.node": "true"
},
"node_decision": "no",
"store": {
"found": false
}
}, {
"node_id": "pHDQP1NaTcGDU95uv3HEAA",
"node_name": "elasticsearch-master-0",
"transport_address": "10.1.192.108:9300",
"node_attributes": {
"ml.machine_memory": "9126805504",
"ml.max_open_jobs": "512",
"xpack.installed": "true",
"ml.max_jvm_size": "4294967296",
"transform.node": "true"
},
"node_decision": "no",
"store": {
"found": false
}
}, {
"node_id": "ph1U6RxASAyKE-gCiceXYA",
"node_name": "elasticsearch-master-6",
"transport_address": "10.1.194.42:9300",
"node_attributes": {
"ml.machine_memory": "9126805504",
"ml.max_open_jobs": "512",
"xpack.installed": "true",
"ml.max_jvm_size": "4294967296",
"transform.node": "true"
},
"node_decision": "no",
"store": {
"found": false
}
}, {
"node_id": "rzUFNbqrR0mfHbLj8SJgMQ",
"node_name": "elasticsearch-master-3",
"transport_address": "10.1.234.33:9300",
"node_attributes": {
"ml.machine_memory": "9126805504",
"ml.max_open_jobs": "512",
"xpack.installed": "true",
"ml.max_jvm_size": "4294967296",
"transform.node": "true"
},
"node_decision": "no",
"store": {
"found": false
}
}]
}
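As the note at the top of that response says, the explain API picks a random unassigned shard unless one is specified; a request like the following (index and shard taken from the output above) targets a specific shard:

GET _cluster/allocation/explain
{
  "index": "operate-operation-8.1.0_2024-02-08",
  "shard": 0,
  "primary": true
}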

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.