Shards unassigned for .kibana_task_manager index in cluster

Hi,

We are running a 3-node cluster with a production workload and are having a problem with the .kibana_task_manager index being in a red state. I ran the cluster allocation explain API and found that 2 of the shards are unassigned; here is the response.

As a workaround the index can be deleted, and it gets recreated when Kibana is restarted, but I am looking for a permanent solution so that this does not reoccur in the future.
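
A sketch of that workaround in Console syntax (the index name is the one shown in the allocation output below; if .kibana_task_manager is only an alias for a versioned index such as .kibana_task_manager_2, as in the settings later in this thread, the concrete index name would be needed instead):

# delete the red task manager index, then restart Kibana so it recreates it
> DELETE /.kibana_task_manager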

I am trying to reproduce this error in a lower environment, but we have been unable to do so. Can someone help me in this regard?
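
For reference, a request along these lines (Console syntax) is what produced the output below; the index, shard and primary values match the response:

> GET _cluster/allocation/explain
{
  "index": ".kibana_task_manager",
  "shard": 0,
  "primary": true
}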

{
  "index":".kibana_task_manager",
  "shard":0,
  "primary":true,
  "current_state":"unassigned",
  "unassigned_info":{
    "reason":"NODE_LEFT",
    "at":"2020-11-01T21:00:43.758Z",
    "details":"node_left [pyMkwKrJR0-XXhVhR1D8Sw]",
    "last_allocation_status":"no_valid_shard_copy"
  },
  "can_allocate":"no_valid_shard_copy",
  "allocate_explanation":"cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster",
  "node_allocation_decisions":[
    {
      "node_id":"prod3",
      "node_name":"prod3",
      "transport_address":"...:9300",
      "node_attributes":{
        "ml.machine_memory":"16170143744",
        "xpack.installed":"true",
        "ml.max_open_jobs":"20",
        "ml.enabled":"true"
      },
      "node_decision":"no",
      "store":{
        "found":false
      }
    },
    {
      "node_id":"prod1",
      "node_name":"prod1",
      "transport_address":"...:9300",
      "node_attributes":{
        "ml.machine_memory":"16346312704",
        "ml.max_open_jobs":"20",
        "xpack.installed":"true",
        "ml.enabled":"true"
      },
      "node_decision":"no",
      "store":{
        "found":false
      }
    },
    {
      "node_id":"prod2",
      "node_name":"prod2",
      "transport_address":"...:9300",
      "node_attributes":{
        "ml.machine_memory":"16170151936",
        "ml.max_open_jobs":"20",
        "xpack.installed":"true",
        "ml.enabled":"true"
      },
      "node_decision":"no",
      "store":{
        "found":false
      }
    }
  ]
}

Hi, when an Elasticsearch cluster has unassigned primary shards, it will go into the "red" state to tell you that something is wrong.
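
For example, the cluster health API reports that overall status along with a count of unassigned shards:

> GET _cluster/health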

Use the _cat/shards API to find which shards are unassigned and why. In your case, the "why" is: cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster. That means the cluster is aware that the shard existed, but now it doesn't exist. It is not normal for shards to suddenly disappear, which is probably why you are not able to reproduce the issue easily. One way to reproduce it would be:

  1. Start a cluster with 3 nodes
  2. Create an index with no replica shards and 1 document
  3. Find the node that has the primary shard for the index (see the Console sketch after this list)
  4. Disconnect that node (cluster will go red).
  5. Re-do step 2 using the same index name
  6. If you bring the offline node back up, clear the data directory first. That ensures the shard is gone, but the reference is still in the ES cluster state.
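
Here is a minimal Console sketch of steps 2 and 3, using a throwaway index called test-red (the name and the single document are only placeholders):

# step 2: one primary shard, no replicas, and a single document
> PUT /test-red
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}

> PUT /test-red/_doc/1
{
  "message": "single document so the shard holds data"
}

# step 3: see which node holds the primary shard
> GET _cat/shards/test-red?v&h=index,shard,prirep,state,node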

The cluster is aware that there was a shard for the index that used to exist and now it doesn't. That is enough for the cluster state to go to "red." But when you try to re-allocate a shard for the same index, ES won't simply allocate it, as that would make it impossible for you to recover the data from a backup.
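
(If you decide the data really is expendable, the explicit way past that safeguard is a reroute command that allocates an empty primary and makes you acknowledge the data loss. A sketch, reusing one of the node names from the output above:)

# forces an empty primary shard onto prod1 and accepts that its data is lost
> POST _cluster/reroute
{
  "commands": [
    {
      "allocate_empty_primary": {
        "index": ".kibana_task_manager",
        "shard": 0,
        "node": "prod1",
        "accept_data_loss": true
      }
    }
  ]
}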

The .kibana_task_manager index is configured to use a single shard, so the cluster can be green even when it has only a single node. It also has the auto_expand_replicas setting set to 0-1, so that a replica shard will be assigned if a second node is available.

> GET /.kibana_task_manager/_settings
{
  ".kibana_task_manager_2" : {
    "settings" : {
      "index" : {
        "number_of_shards" : "1",
        "auto_expand_replicas" : "0-1",
        "provided_name" : ".kibana_task_manager_2",
        "creation_date" : "1603997355585",
        "number_of_replicas" : "0",
        "uuid" : "QDXEqYv9Tb2bevroJ9l8lg",
        "version" : {
          "created" : "7090199",
          "upgraded" : "7090399"
        }
      }
    }
  }
}

The way to reproduce this problem with auto_expand_replicas is to:

  1. Start a cluster with 3 nodes
  2. Allow the .kibana_task_manager's primary and replica shards to allocate
  3. Find BOTH nodes that have the shards for the index (see the _cat/shards call after this list)
  4. Disconnect both nodes.
  5. Restart Kibana
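
For step 3, a call like the following shows which nodes currently hold the primary and the replica:

> GET _cat/shards/.kibana_task_manager*?v&h=index,shard,prirep,state,node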

Thanks for the reply!
I did what you suggested to reproduce the .kibana_task_manager index going red, but it did not work out. Here are the steps I followed.

  1. Started the cluster with 3 nodes: dev-1, dev-2, dev-3.
  2. .kibana_task_manager was initially allocated on 2 nodes: dev-1 (primary) and dev-2 (replica).
  3. Disconnected dev-1 and dev-2 from the cluster (killed the ES processes). At this point I cannot run any commands, because I get a master-not-discovered exception: a 3-node cluster needs at least 2 nodes up and running to elect a master.
  4. Restarted the Kibana server.
  5. Started the dev-2 ES server.

Result: .kibana_task_manager was automatically allocated to the remaining nodes once any 2 nodes were up, instead of the index going red.
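
(One way to double-check that is the per-index health API; once two nodes are back up the status should come back yellow or green rather than red:)

> GET _cluster/health/.kibana_task_manager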

Kindly let me know if I am doing something wrong. Thanks.

As Tim mentioned, the root cause appears to be that your cluster has nodes which go offline. If all copies of a shard disappear, Elasticsearch will be unable to recover it.

I'd suggest you investigate why the primary shard and its replica are both going offline, and try to prevent that from happening.
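
A quick first check (just a sketch; adapt it to your own node names) is whether any node has been restarting or is under memory pressure; a very low uptime would point at a node that keeps going offline:

> GET _cat/nodes?v&h=name,uptime,heap.percent,ram.percent,node.role,master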
