Security index goes yellow when 1 of 3 masters goes down

Hi Support,

We have a 6-node Elasticsearch cluster. Node 1 is the current master, and Nodes 2 and 6 are also master-eligible.
Nodes 3, 4, 5 and 6 are data nodes. Nodes 1, 2, 3, 4 and 5 have the same rack_id, and Node 6 has a different rack_id.
When disk utilization on the data nodes goes above 85%, we have noticed our .security index going yellow.
If Node 6 gets isolated from the cluster, we cannot log in to Kibana until Node 6 rejoins the cluster.
I believe this is because the .security index goes yellow.
What should I do to make sure we keep access to the cluster when only Node 6 goes down?

Kind regards

Welcome to our community! :smiley:

Why is this happening?

Also, are you doing shard allocation awareness on that rack_id? I hope not, as you only have one node in that second rack and it would get a lot of shards.
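For reference, rack-based allocation awareness is normally configured via a `node.attr` plus the awareness setting in `elasticsearch.yml`. A sketch (the attribute name `rack_id` is from your description; the values are placeholders):

```
# elasticsearch.yml on each node -- the value differs per rack
node.attr.rack_id: rack_one

# tell the allocator to spread shard copies across rack_id values
cluster.routing.allocation.awareness.attributes: rack_id
```

With only one node carrying the second rack_id value, awareness would try to put one full copy of every shard onto that single node, which is why I hope it's not enabled.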

Is Kibana pointed only at Node 6? That would be odd, but it would explain things: if Node 6 is isolated it won't be part of a cluster and won't allow data operations, so Kibana won't like it.

If Kibana uses other nodes, what's in the Kibana logs or error messages about why it won't log in? Otherwise it makes no sense: Kibana shouldn't care if the indices/shards are yellow, as that has no effect on clients or anything else other than data redundancy.

Because Node 6 is located in a different location (rack) and the power on that rack went off, so all servers in that rack went down.

You're right about the shard allocation, and we're OK with it.

Kibana is pointing to all data nodes.

I'll check on the logs.
I believe the .security index being yellow indicates that it has unassigned replica shards. So when Node 6 is down, could the .security index go red, which would mean data loss on the security index? Is my understanding correct?

A single node within a cluster can never function properly on its own as no master can be elected. What you have in place is not a recommended or highly available configuration. The general recommendation is to distribute the cluster evenly across 3 racks or availability zones. Another option is to deploy evenly across 2 racks or zones and have a single dedicated master node in a third rack.

If the index goes red after only one node becomes unavailable, it is possible that your cluster is misconfigured. What is the output of the cluster stats API?
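In Kibana Dev Tools that's simply:

```
GET _cluster/stats
```

And `GET _cluster/allocation/explain` will tell you, for a given unassigned shard, exactly why it cannot be allocated.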


As Christian notes, losing a node should not make things red. With Node 6 down, is the cluster red or yellow? If red, then you do indeed have issues and need to see which index is red, hopefully not .security. I assume all of your indices have at least 1 replica, so everything should be green or yellow; this is the first thing to confirm.

Generally, Node 6 being down should not matter at all (other than a yellow cluster), and for testing I'd just turn it off. You have two master-eligible nodes in rack 1, so I assume the cluster is okay, just status yellow since it's missing some replicas from Node 6; I suggest you confirm that.
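To confirm, with Node 6 switched off, something like this (Dev Tools syntax) shows the overall status and exactly which shards are unassigned:

```
GET _cluster/health

GET _cat/shards?v&h=index,shard,prirep,state,unassigned.reason&s=state
```

If the only UNASSIGNED entries are replicas, the cluster is yellow rather than red, and logins should still work.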

I would also point Kibana at Node 1, or at anything other than Node 6, and make sure your LB (or whatever distributes the Kibana connections) is not stuck or sticky on Node 6 (which would be odd, as it being down should unstick it if you have health checks). The Kibana log will help show whether it's timing out, can't find an index, or something else, and hopefully which node it's talking to.
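For example, in `kibana.yml` (the hostnames here are placeholders):

```
# kibana.yml -- list several nodes so Kibana is not tied to one of them
elasticsearch.hosts:
  - "http://node1:9200"
  - "http://node2:9200"
  - "http://node3:9200"
```

Kibana will work through the listed hosts, so losing a single listed node shouldn't lock you out.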

Something still weird here.

Do you have any shard allocation awareness configured? Are all nodes running exactly the same version of Elasticsearch?
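Both are quick to check in Dev Tools (standard API calls, nothing specific to your cluster):

```
GET _cluster/settings?flat_settings=true

GET _cat/nodes?v&h=name,version,node.role,master
```

The first shows any `cluster.routing.allocation.awareness.*` settings you have applied; the second lists each node's version and roles.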

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.