Elasticsearch error in application, error message: "An error occurred while requesting elastic search for data on index 'denmod_y_*'"

Joseph_Raj · July 12, 2019, 2:51pm

When the application runs and queries Elasticsearch for data, it is unable to retrieve anything and reports an error while searching the data. There are two types of error: 1) all shards failed, and 2) index not found exception.

We have rebooted all the master and Elasticsearch servers, but we still have not found a solution. Also, I'm new to Elasticsearch.

Index not found exception

{
  "error": {
    "root_cause": [
      {
        "type": "index_not_found_exception",
        "reason": "no such index",
        "resource.type": "index_or_alias",
        "resource.id": "denmod_y_104242",
        "index_uuid": "_na_",
        "index": "denmod_y_104242"
      }
    ],
    "type": "index_not_found_exception",
    "reason": "no such index",
    "resource.type": "index_or_alias",
    "resource.id": "denmod_y_104242",
    "index_uuid": "_na_",
    "index": "denmod_y_104242"
  },
  "status": 404
}

{
  "error": {
    "root_cause": [

    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [

    ],
    "caused_by": {
      "type": "no_shard_available_action_exception",
      "reason": null,
      "index_uuid": "Qtybg8VFStaOKuDkwJ00RA",
      "shard": "5",
      "index": "en_y_77"
    }
  },
  "status": 503
}

1) For the first error, we need the shards to be allocated or assigned. 2) For the second error, we need the index error to be rectified.

dadoonet · July 12, 2019, 3:04pm

What is the output of:

GET /
GET /_cat/nodes?v
GET /_cat/health?v
GET /_cat/indices?v
# Optionally 
GET /_cat/shards?v

If some outputs are too big, please share them on gist.github.com and link them here.

Joseph_Raj · July 15, 2019, 7:51am

Please find the below output:

GET /_cat/nodes?v

ip              heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
168.124.25.122            19          67   0                          di        -      elk-denmod-6
168.124.54.142            26          68   0                          di        -      elk-denmod-5
168.124.29.126            25          68   0                          di        -      elk-denmod-3
168.124.170.244           25          68   0                          di        -      elk-denmod-4
168.124.25.140            13          66   0                          di        -      elk-denmod-1
168.124.147.161           36          70  24                          m         *      elk-denmod-web
168.124.29.129            24          67   1                          di        -      elk-denmod-2

GET /_cat/health?v

epoch      timestamp cluster    status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1563176654 09:44:14  elk-denmod red             7         6     20  10    0    0      596             0                  -                  3.2%

dadoonet · July 15, 2019, 9:53am

Please read this about how to format.

Could you also provide the other outputs I asked for?

Joseph_Raj · July 15, 2019, 10:00am

We don't have Kibana to produce output in that format; we only use the Cerebro tool with Elasticsearch.

dadoonet · July 15, 2019, 11:57am

And? Does it prevent you from running the queries I asked for and formatting the result?

Joseph_Raj · July 17, 2019, 7:27am

Please find below the GitHub link for the outputs requested above.

GitHub_Link

dadoonet · July 17, 2019, 8:06am

We can see that a lot of shards are unassigned, which explains the red state and why you can't really search.

You apparently have 596 unassigned shards.
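As a rough illustration of how the health line above can be read, here is a minimal Python sketch. It parses the `_cat/health?v` text posted in this thread and recomputes the active shard percentage. The function name `parse_cat_health` is made up for this example, and the percentage formula (active shards divided by active plus initializing plus unassigned) is an assumption that happens to reproduce the reported 3.2%.

```python
def parse_cat_health(text):
    """Turn the header/value lines of `GET /_cat/health?v` into a dict."""
    lines = [line for line in text.splitlines() if line.strip()]
    header, values = lines[0].split(), lines[1].split()
    if len(header) != len(values):
        raise ValueError("header and value column counts differ")
    return dict(zip(header, values))


# Output copied from this thread.
SAMPLE = """
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1563176654 09:44:14 elk-denmod red 7 6 20 10 0 0 596 0 - 3.2%
"""

health = parse_cat_health(SAMPLE)
active = int(health["shards"])
initializing = int(health["init"])
unassigned = int(health["unassign"])

# Assumed formula: share of active shards among all shards the cluster knows.
total = active + initializing + unassigned
active_percent = round(100 * active / total, 1)

print(health["status"], unassigned, f"{active_percent}%")  # red 596 3.2%
```

With only 20 of 616 shards active, most searches will hit at least one shard that has no copy available.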

Could you run:

GET /_cluster/pending_tasks
GET /_cluster/allocation/explain?include_disk_info=true

Joseph_Raj · July 17, 2019, 9:50am

GET /_cluster/pending_tasks --> No tasks

GET /_cluster/allocation/explain?include_disk_info=true

https://gist.github.com/josephraj08/7f3da8bd987841c60ea634417b75ba71

{"index":"en_y_77","shard":3,"primary":true,"current_state":"unassigned","unassigned_info":{"reason":"CLUSTER_RECOVERED","at":"2019-07-11T15:02:24.766Z","last_allocation_status":"no_valid_shard_copy"},"cluster_info":{"nodes":{"fU5vCMTTS46Pboz5zMXD0Q":{"node_name":"elk-denmod-6","least_available":{"path":"D:\\ElkData\\nodes\\0","total_bytes":858990309376,"used_bytes":543820062720,"free_bytes":315170246656,"free_disk_percent":36.7,"used_disk_percent":63.3},"most_available":{"path":"D:\\ElkData\\nodes\\0","total_bytes":858990309376,"used_bytes":543820062720,"free_bytes":315170246656,"free_disk_percent":36.7,"used_disk_percent":63.3}},"yez29U5iQxWl2hh6Yh-_xg":{"node_name":"elk-denmod-5","least_available":{"path":"D:\\ElkData\\nodes\\0","total_bytes":751616126976,"used_bytes":590416351232,"free_bytes":161199775744,"free_disk_percent":21.4,"used_disk_percent":78.6},"most_available":{"path":"D:\\ElkData\\nodes\\0","total_bytes":751616126976,"used_bytes":590416351232,"free_bytes":161199775744,"free_disk_percent":21.4,"used_disk_percent":78.6}},"rS_6QTYGSiKTyXJnjvSdyw":{"node_name":"elk-denmod-4","least_available":{"path":"D:\\ElkData\\nodes\\0","total_bytes":751616126976,"used_bytes":512465907712,"free_bytes":239150219264,"free_disk_percent":31.8,"used_disk_percent":68.2},"most_available":{"path":"D:\\ElkData\\nodes\\0","total_bytes":751616126976,"used_bytes":512465907712,"free_bytes":239150219264,"free_disk_percent":31.8,"used_disk_percent":68.2}},"VnvJTaKIQIqrdzXSlPUxxQ":{"node_name":"elk-denmod-2","least_available":{"path":"D:\\ElkData\\nodes\\0","total_bytes":751616126976,"used_bytes":467434852352,"free_bytes":284181274624,"free_disk_percent":37.8,"used_disk_percent":62.2},"most_available":{"path":"D:\\ElkData\\nodes\\0","total_bytes":751616126976,"used_bytes":467434852352,"free_bytes":284181274624,"free_disk_percent":37.8,"used_disk_percent":62.2}},"5w1hFgnvRlOxtQ0QrX3tKQ":{"node_name":"elk-denmod-3","least_available":{"path":"D:\\ElkData\\nodes\\0","total_bytes":751616126976,"used_bytes":554406334464,"free_bytes":197209792512,"free_disk_percent":26.2,"used_disk_percent":73.8},"most_available":{"path":"D:\\ElkData\\nodes\\0","total_bytes":751616126976,"used_bytes":554406334464,"free_bytes":197209792512,"free_disk_percent":26.2,"used_disk_percent":73.8}},"t7JmFe-iSn63CALYD-0lxw":{"node_name":"elk-denmod-1","least_available":{"path":"D:\\ElkData\\nodes\\0","total_bytes":751616126976,"used_bytes":540357853184,"free_bytes":211258273792,"free_disk_percent":28.1,"used_disk_percent":71.9},"most_available":{"path":"D:\\ElkData\\nodes\\0","total_bytes":751616126976,"used_bytes":540357853184,"free_bytes":211258273792,"free_disk_percent":28.1,"used_disk_percent":71.9}}},"shard_sizes":{"[en_y_85][1][p]_bytes":14128725622,"[en_y_85][2][p]_bytes":14098013452,"[en_m_85][1][r]_bytes":20856338526,"[en_y_85][0][p]_bytes":14081779348,"[en_y_85][3][p]_bytes":14109302290,"[en_m_85][0][r]_bytes":20905128320,"[en_m_85][1][p]_bytes":20856476174,"[en_m_85][4][p]_bytes":20933522789,"[en_y_85][2][r]_bytes":14097941870,"[en_y_85][4][p]_bytes":14089412577,"[en_m_85][2][p]_bytes":20862099617,"[en_m_85][4][r]_bytes":20933494989,"[en_m_85][3][p]_bytes":20864730377,"[en_y_85][1][r]_bytes":14128645759,"[en_m_85][0][p]_bytes":20905128320,"[en_y_85][4][r]_bytes":14089965517,"[en_y_85][0][r]_bytes":14082105058,"[en_m_85][3][r]_bytes":20864702676,"[en_m_85][2][r]_bytes":20862099648,"[en_y_85][3][r]_bytes":14100146741},"shard_paths":{"[en_y_85][2], node[5w1hFgnvRlOxtQ0QrX3tKQ], [R], s[STARTED], a[id=nVOodMoRR_qeMxYScWcLBw]":"D:\\ElkData\\nodes\\0","[en_m_85][4], node[rS_6QTYGSiKTyXJnjvSdyw], [P], s[STARTED], a[id=ROhQOhKnRTSgYYhYtpe7VA]":"D:\\ElkData\\nodes\\0","[en_m_85][1], node[yez29U5iQxWl2hh6Yh-_xg], [P], s[STARTED], a[id=1BqGClVGStKtr3EtpbjAXQ]":"D:\\ElkData\\nodes\\0","[en_y_85][0], node[5w1hFgnvRlOxtQ0QrX3tKQ], [P], s[STARTED], a[id=Hew4YhlIS_yhMi462Ebnxw]":"D:\\ElkData\\nodes\\0","[en_m_85][1], node[5w1hFgnvRlOxtQ0QrX3tKQ], [R], s[STARTED], a[id=aoIYevgGSfCluu6aZnblWw]":"D:\\ElkData\\nodes\\0","[en_m_85][0], node[VnvJTaKIQIqrdzXSlPUxxQ], [P], s[STARTED], a[id=UsDJoySbT0i_G3yQzLILyQ]":"D:\\ElkData\\nodes\\0","[en_y_85][1], node[yez29U5iQxWl2hh6Yh-_xg], [P], s[STARTED], a[id=Jf_QxJIxQzGicB6BpEDf5g]":"D:\\ElkData\\nodes\\0","[en_y_85][0], node[rS_6QTYGSiKTyXJnjvSdyw], [R], s[STARTED], a[id=9UIaYjl5Tpee04VkRlG86g]":"D:\\ElkData\\nodes\\0","[en_m_85][2], node[fU5vCMTTS46Pboz5zMXD0Q], [R], s[STARTED], a[id=6aBielucQ5awujj4-ZCMQg]":"D:\\ElkData\\nodes\\0","[en_y_85][1], node[VnvJTaKIQIqrdzXSlPUxxQ], [R], s[STARTED], a[id=WNws0TxLTAOsVWyI0ym6BQ]":"D:\\ElkData\\nodes\\0","[en_m_85][3], node[rS_6QTYGSiKTyXJnjvSdyw], [P], s[STARTED], a[id=RyhRREOWQZmHu52oFkZXOQ]":"D:\\ElkData\\nodes\\0","[en_y_85][3], node[t7JmFe-iSn63CALYD-0lxw], [R], s[STARTED], a[id=Kl6nqKMHSCe_wLHmp0KWKQ]":"D:\\ElkData\\nodes\\0","[en_m_85][3], node[fU5vCMTTS46Pboz5zMXD0Q], [R], s[STARTED], a[id=iO64bMWzQSefeZT9o-svGQ]":"D:\\ElkData\\nodes\\0","[en_y_85][3], node[yez29U5iQxWl2hh6Yh-_xg], [P], s[STARTED], a[id=-gw5nml2S4iXEPicUrnLnw]":"D:\\ElkData\\nodes\\0","[en_y_85][4], node[fU5vCMTTS46Pboz5zMXD0Q], [P], s[STARTED], a[id=yo3YD_fSRoqJMPjOqG_JbQ]":"D:\\ElkData\\nodes\\0","[en_m_85][4], node[5w1hFgnvRlOxtQ0QrX3tKQ], [R], s[STARTED], a[id=cjLUHIBrSEqmR7jNB2jzlQ]":"D:\\ElkData\\nodes\\0","[en_m_85][2], node[yez29U5iQxWl2hh6Yh-_xg], [P], s[STARTED], a[id=YO-ZVPoyR7G8hgmK5Wg0Eg]":"D:\\ElkData\\nodes\\0","[en_y_85][2], node[rS_6QTYGSiKTyXJnjvSdyw], [P], s[STARTED], a[id=tI-mNycrRj-b03Hm5tL88A]":"D:\\ElkData\\nodes\\0","[en_m_85][0], node[t7JmFe-iSn63CALYD-0lxw], [R], s[STARTED], a[id=-HQT2ExfQbKcpSI4lz8ufA]":"D:\\ElkData\\nodes\\0","[en_y_85][4], node[VnvJTaKIQIqrdzXSlPUxxQ], [R], s[STARTED], a[id=ylU033OZTai5Yrqy6PMoZQ]":"D:\\ElkData\\nodes\\0"}},"can_allocate":"no_valid_shard_copy","allocate_explanation":"cannot allocate because all found copies of the shard are either stale or corrupt","node_allocation_decisions":[{"node_id":"5w1hFgnvRlOxtQ0QrX3tKQ","node_name":"elk-denmod-3","transport_address":"168.124.29.126:9300","node_decision":"no","store":{"found":false}},{"node_id":"VnvJTaKIQIqrdzXSlPUxxQ","node_name":"elk-denmod-2","transport_address":"168.124.29.129:9300","node_decision":"no","store":{"in_sync":false,"allocation_id":"NFORvHBlRXqPB5fxGK9lRw"}},{"node_id":"fU5vCMTTS46Pboz5zMXD0Q","node_name":"elk-denmod-6","transport_address":"168.124.25.122:9300","node_decision":"no","store":{"found":false}},{"node_id":"rS_6QTYGSiKTyXJnjvSdyw","node_name":"elk-denmod-4","transport_address":"168.124.170.244:9300","node_decision":"no","store":{"found":false}},{"node_id":"t7JmFe-iSn63CALYD-0lxw","node_name":"elk-denmod-1","transport_address":"168.124.25.140:9300","node_decision":"no","store":{"in_sync":false,"allocation_id":"X4TurCatRjaBFRnE6Zj8Dw"}},{"node_id":"yez29U5iQxWl2hh6Yh-_xg","node_name":"elk-denmod-5","transport_address":"168.124.54.142:9300","node_decision":"no","store":{"found":false}}]}

Joseph_Raj · July 26, 2019, 12:02pm

Could you please help us? We are waiting for your reply.
How can we clear the 596 unassigned shards?

xeraa · July 29, 2019, 12:54am

Are you just trying to clear the unassigned shards or recover the data (or whatever is left of it)? The error message is quite explicit about what the problem is:

"can_allocate":"no_valid_shard_copy","allocate_explanation":"cannot allocate because all found copies of the shard are either stale or corrupt"

How did you get into that situation in the first place — did you roll back an upgrade after having some nodes up and running in the new version already?

You could look into cluster reroute and specifically the allocate_stale_primary section. This might cause data loss as described in the docs and you will need to set the accept_data_loss flag explicitly to acknowledge that. The docs don't have an example so that nobody blindly copies a destructive command; if you need help with the right query, ask again but only after being sure that this is where you want to go.

Joseph_Raj · July 29, 2019, 11:57am

We need to recover the data. Could you please provide us with the query for recovering it?

xeraa · July 29, 2019, 12:45pm

This can cause data loss — use with caution!

The command should be (tested on 7.2):

POST /_cluster/reroute
{
  "commands": [
    {
      "allocate_stale_primary": {
        "accept_data_loss": true,
        "index": "<index_name>",
        "shard": "<shard_number>",
        "node": "<node_name>"
      }
    }
  ]
}
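To show how the placeholders might be filled in, here is a minimal Python sketch that reads an allocation explain response and builds the request body for the reroute command above. The helper names `stale_copy_nodes` and `stale_primary_body` are made up for this example. The rule for spotting a stale copy (a `store` entry with `"in_sync": false` and an `allocation_id`) is an assumption based on the output posted earlier in this thread, and the sketch only builds the body; it does not send anything to the cluster.

```python
import json


def stale_copy_nodes(explain):
    """List nodes whose store entry reports a copy that is not in sync."""
    nodes = []
    for decision in explain.get("node_allocation_decisions", []):
        store = decision.get("store", {})
        # {"found": false} means no copy on that node; a stale copy appears
        # as {"in_sync": false, "allocation_id": "..."} in the posted output.
        if store.get("in_sync") is False and "allocation_id" in store:
            nodes.append(decision["node_name"])
    return nodes


def stale_primary_body(index, shard, node):
    """Build the request body for POST /_cluster/reroute."""
    command = {
        "index": index,
        "shard": shard,
        "node": node,
        "accept_data_loss": True,  # explicit acknowledgement; data may be lost
    }
    return {"commands": [{"allocate_stale_primary": command}]}


# Trimmed copy of the allocation explain output posted earlier.
explain = {
    "index": "en_y_77",
    "shard": 3,
    "node_allocation_decisions": [
        {"node_name": "elk-denmod-3", "store": {"found": False}},
        {"node_name": "elk-denmod-2",
         "store": {"in_sync": False, "allocation_id": "NFORvHBlRXqPB5fxGK9lRw"}},
        {"node_name": "elk-denmod-1",
         "store": {"in_sync": False, "allocation_id": "X4TurCatRjaBFRnE6Zj8Dw"}},
    ],
}

candidates = stale_copy_nodes(explain)
print(candidates)

# Picking candidates[0] is arbitrary; nothing in the output says which
# stale copy is the more recent one.
body = stale_primary_body(explain["index"], explain["shard"], candidates[0])
print(json.dumps(body, indent=2))
```

Nodes that report `"found": false` hold no copy of the shard at all, so they are not candidates for this command.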

BTW you've avoided the question of how you got into this problem, so I'm taking a bit of a wild guess here. There might be other solutions, but without knowing the background it's impossible to say.

Joseph_Raj · July 29, 2019, 1:00pm

Actually, we don't want to lose any data in this case. Is there any way to recover it? In your reply above you said this can cause data loss.

How did we get this problem?
We are not sure about that either.

xeraa · July 29, 2019, 2:43pm

If you have taken a snapshot before, I'd restore that. Otherwise I don't see many alternatives (especially without knowing what happened), and the query above might cause data loss.

Joseph_Raj · July 29, 2019, 3:05pm

Thanks for the query. We started by allocating 3 indices, and it was successful without any data loss. We can also see the number of unassigned shards going down.

Joseph_Raj · July 29, 2019, 3:08pm

I don't understand this statement: "If you have taken a snapshot before I'd restore that".
Do you mean that I need to take snapshots before any data loss happens?
If so, what query do I need to execute to take a snapshot that I can restore later?

Joseph_Raj · July 29, 2019, 3:10pm

We have also hit another error on some indices. Please see the error below:
cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster

Please help us fix this error.

xeraa · July 29, 2019, 3:26pm

If it's important data, I would have hoped that you took regular snapshots and could restore those. Now it's too late.

cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster: Something is very wrong.

Did you lose 2 or more nodes of your cluster (with replica: 1 it would need to be at least 2)?
Did you delete the data directory (partially) on some nodes or were multiple drives corrupted?
Did something happen to your master node if there was a single one and it lost (part of) the cluster state?

Anyway, if you haven't taken a backup before the problems started: Try to recover what you can and make sure you have 3 master eligible nodes (with the correct settings for the quorum — depends on your version). Take snapshots in the future if the data should not be lost.
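For taking snapshots in the future, the following Python sketch builds the three requests involved: registering a shared filesystem repository, creating a snapshot, and restoring from it. Everything specific here is an assumption for illustration: the base URL, the repository name `my_backup`, the location `/mount/backups/my_backup`, the snapshot name `snapshot_1`, and the index pattern. The location has to be listed under `path.repo` in `elasticsearch.yml` on every node and be reachable from all of them. The sketch only constructs the requests and prints them; nothing is sent.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumption: adjust to your cluster


def es_request(method, path, body=None):
    """Build (but do not send) a JSON request against the cluster."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    return Request(BASE_URL + path, data=data, method=method,
                   headers={"Content-Type": "application/json"})


def register_fs_repository(repo, location):
    """PUT /_snapshot/<repo>: register a shared filesystem repository."""
    return es_request("PUT", f"/_snapshot/{repo}",
                      {"type": "fs", "settings": {"location": location}})


def create_snapshot(repo, snapshot, indices):
    """PUT /_snapshot/<repo>/<snapshot>: snapshot the given indices."""
    path = f"/_snapshot/{repo}/{snapshot}?wait_for_completion=true"
    return es_request("PUT", path, {"indices": indices})


def restore_snapshot(repo, snapshot, indices):
    """POST /_snapshot/<repo>/<snapshot>/_restore: restore the indices."""
    return es_request("POST", f"/_snapshot/{repo}/{snapshot}/_restore",
                      {"indices": indices})


# Hypothetical names throughout; replace them with your own.
requests_to_send = [
    register_fs_repository("my_backup", "/mount/backups/my_backup"),
    create_snapshot("my_backup", "snapshot_1", "en_y_*,en_m_*"),
    restore_snapshot("my_backup", "snapshot_1", "en_y_*,en_m_*"),
]

for req in requests_to_send:
    # urllib.request.urlopen(req) would actually send the request.
    print(req.get_method(), req.full_url)
```

A snapshot only protects data that was healthy when it was taken, so this helps for the future rather than for the shards that are already lost.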

Joseph_Raj · August 1, 2019, 6:40am

Is there any way to recover the data for the error shown below?
cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster

If we can recover it, please provide the query.
If not, please explain why we can't, as I'm new to Elasticsearch.
