Elasticsearch error in application. Error message: "An error occurred while requesting Elasticsearch for data on index 'denmod_y_*'"

When the application runs and queries Elasticsearch for data, it is unable to retrieve anything and reports an error while searching. We see two types of errors: 1) an "all shards failed" error and 2) an "index not found" exception.

We have rebooted all the master and Elasticsearch servers, but we are still unable to find a solution. Also, I'm new to Elasticsearch.

  1. Index not found exception
{
  "error": {
    "root_cause": [
      {
        "type": "index_not_found_exception",
        "reason": "no such index",
        "resource.type": "index_or_alias",
        "resource.id": "denmod_y_104242",
        "index_uuid": "_na_",
        "index": "denmod_y_104242"
      }
    ],
    "type": "index_not_found_exception",
    "reason": "no such index",
    "resource.type": "index_or_alias",
    "resource.id": "denmod_y_104242",
    "index_uuid": "_na_",
    "index": "denmod_y_104242"
  },
  "status": 404
}

  2. All shards failed
{
  "error": {
    "root_cause": [

    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [

    ],
    "caused_by": {
      "type": "no_shard_available_action_exception",
      "reason": null,
      "index_uuid": "Qtybg8VFStaOKuDkwJ00RA",
      "shard": "5",
      "index": "en_y_77"
    }
  },
  "status": 503
}
  1. For the "all shards failed" error, the shards need to be allocated or assigned.
  2. For the "index not found" error, the missing index needs to be rectified.

What is the output of:

GET /
GET /_cat/nodes?v
GET /_cat/health?v
GET /_cat/indices?v
# Optionally 
GET /_cat/shards?v

If some outputs are too big, please share them on gist.github.com and link them here.

Please find the output below:

GET /_cat/nodes?v

ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
168.124.25.122 19 67 0 di - elk-denmod-6
168.124.54.142 26 68 0 di - elk-denmod-5
168.124.29.126 25 68 0 di - elk-denmod-3
168.124.170.244 25 68 0 di - elk-denmod-4
168.124.25.140 13 66 0 di - elk-denmod-1
168.124.147.161 36 70 24 m * elk-denmod-web
168.124.29.129 24 67 1 di - elk-denmod-2

GET /_cat/health?v

epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1563176654 09:44:14 elk-denmod red 7 6 20 10 0 0 596 0 - 3.2%

Please read this about how to format.

Could you also provide the other outputs I asked for?

We don't have Kibana to give the format like that; we only use the Cerebro tool for Elasticsearch.

And? Does that prevent you from running the queries I asked for and formatting the result?

Please find below the GitHub link with the above outputs.

GitHub_Link

We can see that a lot of shards are unassigned, which explains the red state and why you can't really search.

You have apparently 596 missing shards.
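
Something like the following should list the unassigned shards together with the reason they are unassigned (just a sketch; the column selection is only one of several possibilities):

GET /_cat/shards?v&h=index,shard,prirep,state,unassigned.reason&s=state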

Could you run:

GET /_cluster/pending_tasks
GET /_cluster/allocation/explain?include_disk_info=true

GET /_cluster/pending_tasks --> No tasks

GET /_cluster/allocation/explain?include_disk_info=true

Can you please help us? We are waiting for your reply.
How do we clear the 596 unassigned shards?

Are you just trying to clear the unassigned shards or recover the data (or whatever is left of it)? The error message is quite explicit about what the problem is:

"can_allocate":"no_valid_shard_copy","allocate_explanation":"cannot allocate because all found copies of the shard are either stale or corrupt"

How did you get into that situation in the first place — did you roll back an upgrade after having some nodes up and running in the new version already?

You could look into cluster reroute and specifically the allocate_stale_primary section. This might cause data loss as described in the docs and you will need to set the accept_data_loss flag explicitly to acknowledge that. The docs don't have an example so that nobody blindly copies a destructive command; if you need help with the right query, ask again but only after being sure that this is where you want to go.


We need to recover the data. Could you please provide us with the query for recovering it?

This can cause data loss — use with caution!

The command should be (tested on 7.2):

POST /_cluster/reroute
{
  "commands": [
    {
      "allocate_stale_primary": {
        "accept_data_loss": true,
        "index": "<index_name>",
        "shard": "<shard_number>",
        "node": "<node_name>"
      }
    }
  ]
}
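
To fill in the placeholders, the allocation explain API can be pointed at a specific shard to see which nodes still hold a (possibly stale) copy of it. A sketch, using the index and shard from the earlier error as an example:

GET /_cluster/allocation/explain
{
  "index": "en_y_77",
  "shard": 5,
  "primary": true
}

The node_allocation_decisions section of the response shows, per node, whether a copy of the shard was found there.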

BTW you've avoided the question of how you got into this problem, so I'm taking a bit of a wild guess here. There might be other solutions, but without knowing the background it's impossible to say.

Actually, we don't want to lose data in this case. Is there any possible way to recover the data? In the reply above you said this can cause data loss.

How did you get this problem?
We are also not sure about that ourselves.

If you have taken a snapshot before, I'd restore that. Otherwise I don't see many alternatives (especially without knowing what happened) and the query above might cause data loss.
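
For completeness, a restore from an existing snapshot would look roughly like this (the repository name my_backup and the snapshot name snapshot_1 are placeholders for whatever you registered; the broken indices would typically have to be closed or deleted first, or restored under a different name):

POST /_snapshot/my_backup/snapshot_1/_restore
{
  "indices": "denmod_y_*,en_y_*",
  "include_global_state": false
}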

Thanks for the query. We initially started with allocating 3 indices and it was successful without any data loss. We can also see that the number of unassigned shards is getting reduced.

I don't quite understand this quote: "If you have taken a snapshot before, I'd restore that."
Do you mean that before the data loss happened I should have taken snapshots?
If so, what query do I need to execute to take a snapshot that I can later restore?

We have also faced another error on some indices. Please check the error below:
cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster

Please help us fix the above error.

If it's important data, I would have hoped that you took regular snapshots and could restore those. Now it's too late.

cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster: Something is very wrong.

  • Did you lose 2 or more nodes of your cluster (with replica: 1 it would need to be at least 2)?
  • Did you delete the data directory (partially) on some nodes or were multiple drives corrupted?
  • Did something happen to your master node if there was a single one and it lost (part of) the cluster state?

Anyway, if you haven't taken a backup before the problems started: Try to recover what you can and make sure you have 3 master eligible nodes (with the correct settings for the quorum — depends on your version). Take snapshots in the future if the data should not be lost.
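
Regarding the earlier question about how to take snapshots: a minimal sketch would be to register a repository and then create a snapshot in it. The repository name, snapshot name, and location below are placeholders; for an "fs" repository the location has to be a shared filesystem path that is listed under path.repo on every node:

PUT /_snapshot/my_backup
{
  "type": "fs",
  "settings": {
    "location": "/mount/backups/elk-denmod"
  }
}

PUT /_snapshot/my_backup/snapshot_1?wait_for_completion=true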

Is there any way to recover the data for the error shown below?
cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster

  1. If we can recover, please provide the query.
  2. If not, please explain why we can't recover, as I'm new to Elasticsearch.