.kibana Index yellow (Shard is UNASSIGNED) - why isn't it recovering?

Hi,

I use ES 5.0.0 with several indices, including .kibana. Because it is the smallest one, I picked it for the demonstration here, but a lot of other indices are also yellow. They are not yellow from the start; after a while in production some of them turn yellow.

I have three master nodes and two data nodes in my environment. The nodes are reachable and the cluster has no special settings. There are also plenty of system resources left (memory, disk, CPU). Occasionally a node gets disconnected for a few seconds due to network issues.

Index output:

health status index   uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   .kibana cZwgRQO9SJa10m0p6aGa8Q   1   1        205            4    406.4kb        406.4kb

and the shards

index   shard prirep state      docs   store ip              node
.kibana 0     p      STARTED     205 406.4kb 138.201.138.161 ZPJ6URQ
.kibana 0     r      UNASSIGNED    

The settings for .kibana are

{
  ".kibana": {
    "settings": {
      "index": {
        "creation_date": "1478249772863",
        "number_of_shards": "1",
        "number_of_replicas": "1",
        "uuid": "cZwgRQO9SJa10m0p6aGa8Q",
        "version": {
          "created": "5000099"
        },
        "provided_name": ".kibana"
      }
    }
  }
}
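
For completeness, the outputs above were taken with the standard cat and settings APIs, roughly like this (localhost:9200 stands in for one of my nodes):

curl -s 'http://localhost:9200/_cat/indices/.kibana?v'
curl -s 'http://localhost:9200/_cat/shards/.kibana?v'
curl -s 'http://localhost:9200/.kibana/_settings?pretty'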

As far as I can tell, there should be one primary shard and one replica of it. Why isn't this working for this .kibana shard? How can I investigate why this is happening?

Have you set minimum_master_nodes to 2 in order to avoid split brain scenarios? What does the output from the _cat/nodes API look like?
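
For reference, with three master-eligible nodes that is usually a quorum of 2, set either in elasticsearch.yml on every master-eligible node or dynamically via the cluster settings API. A minimal sketch (localhost:9200 is a placeholder):

# elasticsearch.yml on each master-eligible node:
#   discovery.zen.minimum_master_nodes: 2
# or dynamically:
curl -s -XPUT 'http://localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{
  "persistent": { "discovery.zen.minimum_master_nodes": 2 }
}'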

Hi, yes. I have three master nodes and set the minimum to two.
I will add the output first thing in the morning.

This is now awkward, but I restarted the ES cluster and the errors are gone (for now). They will probably happen again in a few days; then I will update this topic with the output of _cat/nodes.

In a split-brain scenario, the node will not rejoin the cluster until it is restarted, and whatever data exists only on the restarted server is lost. Please check again for split-brain scenarios and for any data loss in the cluster.

How can I check for this scenario?

What does the output of _cat/nodes look like?
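
If you want to rule out a split brain at the same time, a quick sketch is to ask every node directly and compare who each one reports as elected master (hostnames below are placeholders):

for host in master1 master2 master3 data1 data2; do
  echo "== $host"
  curl -s "http://$host:9200/_cat/master?v"
  curl -s "http://$host:9200/_cat/nodes?v&h=ip,name,node.role,master"
done

If the nodes disagree about the elected master, you have a split brain.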

138.201.138.161  9 99 4 0.22 0.17 0.24 m  - yr41mRF
138.201.138.161 53 94 6 0.43 0.51 0.69 di - W4WL1Cd
138.201.138.161 23 94 4 0.34 0.41 0.60 m  - NZu_qGq
138.201.138.161 52 94 2 0.34 0.41 0.60 di - ZPJ6URQ
138.201.138.161 11 94 5 0.43 0.51 0.69 m  * BUvw64s

Are all the nodes running on a single host? What is the specification of this host? What are the reasons for this setup?

Elasticsearch will avoid placing replicas on the same physical host as the primary, which could explain why your replica is not getting placed.
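
You can also ask the cluster directly why the replica is unassigned with the allocation explain API (available since 5.0). A minimal sketch matching the shard from your output; localhost:9200 is a placeholder, and calling the endpoint with no body explains the first unassigned shard it finds:

curl -s -XGET 'http://localhost:9200/_cluster/allocation/explain?pretty' -H 'Content-Type: application/json' -d '{
  "index": ".kibana",
  "shard": 0,
  "primary": false
}'

The response lists the allocation deciders that rejected each node, which should name the rule (for example the same_shard decider) that is blocking the replica.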

Hm, no, actually they all run on different hosts (at least the masters).
So there seems to be a problem in my setup.

Thanks

Yep, this did the trick. Now it works.
