How to determine if one of the master nodes is gone from the cluster?

Hi! I have a clean and simple installation of ES:
9 nodes: 6 data nodes and 3 dedicated master nodes (without data).

I want to detect the situation when one of the masters is gone for some reason (e.g. powered off).
How can I do that on ES 7? I looked at the /_cluster/health and /_nodes endpoints and the documentation, but failed to find an answer.
The endpoints above only show the available nodes and do not show powered-off nodes.

On the previous version of ES (6) I could do it easily:

  1. Get the minimum_master_nodes property from /_settings.
  2. Get all nodes, count the masters, and compare the count with minimum_master_nodes.
    If the count of masters returned by the API is <= minimum_master_nodes, then some master nodes are down (see the sketch below).
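
Roughly, the check looked like this. A minimal sketch in Python, assuming a local unauthenticated cluster on localhost:9200, the requests library, and that minimum_master_nodes is readable via the cluster settings API (the host and the exact comparison are illustrative, not a definitive implementation):

import requests

ES = "http://localhost:9200"   # assumption: local, unauthenticated cluster

# ES 6: read discovery.zen.minimum_master_nodes from the cluster settings
# (assumes the setting was configured explicitly somewhere)
settings = requests.get(
    f"{ES}/_cluster/settings",
    params={"include_defaults": "true", "flat_settings": "true"},
).json()
minimum = None
for section in ("defaults", "persistent", "transient"):   # transient wins last
    value = settings.get(section, {}).get("discovery.zen.minimum_master_nodes")
    if value is not None:
        minimum = int(value)

# count the master-eligible nodes the cluster currently knows about
nodes = requests.get(f"{ES}/_nodes").json()["nodes"]
masters = sum(1 for n in nodes.values() if "master" in n.get("roles", []))

if minimum is not None and masters <= minimum:
    print(f"only {masters} master(s) visible, minimum_master_nodes is {minimum}: some masters are down")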

Since version 7 the discovery.zen.minimum_master_nodes option no longer exists.
How can I do something similar?

P.S. It would be great to do this check without hardcoding the master nodes (or their count) somewhere. And sorry for my English :slight_smile:

Thank you!

You could use _cat/nodes; it shows what roles each node has and which one is the elected master.
But you'd need to maintain the expected state yourself, as the APIs do not do that for you.
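
Something along these lines could work. A sketch in Python, assuming a local unauthenticated cluster on localhost:9200 and a roster of expected master names that you maintain yourself (the names below are placeholders):

import requests

ES = "http://localhost:9200"   # assumption: local, unauthenticated cluster
EXPECTED_MASTERS = {"master1", "master2", "master3"}   # assumption: roster maintained by you

rows = requests.get(
    f"{ES}/_cat/nodes",
    params={"h": "name,node.role,master", "format": "json"},
).json()

# the 'master' column is '*' for the currently elected master, '-' otherwise
elected = [r["name"] for r in rows if r["master"] == "*"]
seen_masters = {r["name"] for r in rows if "m" in r["node.role"]}

print("elected master:", elected)
missing = EXPECTED_MASTERS - seen_masters
if missing:
    print("master-eligible nodes missing from the cluster:", missing)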

What's the driver behind wanting this info?

@warkolm, thank you for the answer!

I'll try to explain our case.

We use a cloud in production (for simplicity, assume it is Kubernetes).
There is a balancing mechanism in the cloud that can move instances (containers) between hardware servers (even if they are stateful).

Our goal is to prevent a situation where the cloud decides to move two of the three masters at the same time.

Each instance reports on itself to the cloud (by running a local health check). The cloud looks at the report and decides whether the instance can be stopped or moved. To do this, each instance must have information about the state of the cluster (about the availability of the other masters).
For example, master1 must know that master2 and master3 are available before it allows the cloud to stop or move it.
Without this information, we can end up in a situation where two of the three master instances are stopped by the cloud and we lose the ES cluster.

Unfortunately, /_cluster/health and the other endpoints do not give any information when one of the masters is gone.

So, the goal: each node must know the current cluster state.

Yes, we could use an external service like Consul, or hardcode the cluster nodes, but it would be great if the ES cluster could report the state of its master nodes (if I understand correctly, each master already stores information about the other masters, but that information is not available from the API).

Sorry for the long and possibly confusing explanation.


If you are using K8s, have you looked at a pod disruption budget (PDB)?

@Vinayak_Sapre, hi! No, it's not Kubernetes, but in Kubernetes terminology we have three totally independent Kubernetes clusters, with one of the three ES masters in each cluster. So we have the same problem: one Kubernetes cluster doesn't know anything about the other clusters and applications. But the application (ES) knows about itself.

Can you check the cluster status? If it is green then you are good to move.
Anything else and you can't move (this will only be true for a three-node cluster).

If you have a five-node cluster then two nodes can go out and the cluster status will still be green.

GET /_cat/health?pretty
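
For example, a minimal sketch of that check in Python, assuming localhost:9200 and the requests library (note it only looks at the status colour, which reflects shard allocation):

import requests

ES = "http://localhost:9200"   # assumption: local, unauthenticated cluster

health = requests.get(f"{ES}/_cat/health", params={"format": "json"}).json()[0]

# 'status' reflects shard allocation: green / yellow / red
if health["status"] == "green":
    print("cluster is green, OK to move this instance")
else:
    print("cluster is", health["status"], "- do not move")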

@elasticforme, hi! Thank you for the advice.
But according to the documentation, green means that all shards are allocated:
https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-health.html
So, since we have 3 dedicated masters (without data), /_cat/health will still show a green status when one of the masters goes down.

Can you do a hard check? For example, if you have three nodes

and GET /_cat/health returns a number_of_nodes of less than 3, don't move.

This might not be the best option though.
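
For example, a sketch of that node-count check in Python, using /_cluster/health (which exposes number_of_nodes) and assuming a local unauthenticated cluster with the 9-node layout from the first post:

import requests

ES = "http://localhost:9200"   # assumption: local, unauthenticated cluster
EXPECTED_NODES = 9             # assumption: 6 data + 3 master nodes, per the first post

health = requests.get(f"{ES}/_cluster/health").json()

if health["number_of_nodes"] < EXPECTED_NODES:
    print("only", health["number_of_nodes"], "of", EXPECTED_NODES, "nodes are in the cluster - do not move")
else:
    print("all nodes present, OK to move")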

So, to state it another way, you want to be alerted when one of your masters is no longer in the cluster. Since shards only sit on data nodes, health will still show as green.
Any other constraints you want to highlight here?

@Ayush_Mathur, yes, you understood correctly. And no other constraints.

Another option is:

GET /_nodes/_all,master:true

and if failed is > 0, then don't move.

"_nodes" : {
  "total" : 3,
  "successful" : 3,
  "failed" : 0
},
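
For example, a sketch of that check in Python, assuming a local unauthenticated cluster and the 3 dedicated masters from the first post:

import requests

ES = "http://localhost:9200"   # assumption: local, unauthenticated cluster
EXPECTED_MASTERS = 3           # assumption: 3 dedicated masters, per the first post

stats = requests.get(f"{ES}/_nodes/_all,master:true").json()["_nodes"]

# 'failed' counts targeted nodes that did not respond; 'total' is how many
# master-eligible nodes the cluster currently knows about at all
if stats["failed"] > 0 or stats["total"] < EXPECTED_MASTERS:
    print("master-eligible node problem:", stats, "- do not move")
else:
    print("all", EXPECTED_MASTERS, "masters responded, OK to move")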

That is a good way, though I wonder when a master will drop off that list entirely, i.e. no longer show up as failed.

You can also do it like this: count the nodes with 'm' in their role and compare to your target of 3; if there are only 2, you've lost a master.
GET /_cat/nodes?h=name,node.role
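
A sketch of that count in Python, assuming localhost:9200 and a target of 3 master-eligible nodes:

import requests

ES = "http://localhost:9200"   # assumption: local, unauthenticated cluster
EXPECTED_MASTERS = 3           # assumption: 3 dedicated masters, per the first post

rows = requests.get(f"{ES}/_cat/nodes",
                    params={"h": "name,node.role", "format": "json"}).json()

masters = sum(1 for r in rows if "m" in r["node.role"])
if masters < EXPECTED_MASTERS:
    print(f"only {masters} of {EXPECTED_MASTERS} master-eligible nodes visible - lost a master")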

You can just set up a Watcher with an http(s) input that makes a call to the _nodes/master:true endpoint. In the condition, check whether ctx.payload._nodes.total < expected_masters; if it returns true, trigger an action of your liking: mail, Slack, adding a doc to an alerts index, etc.
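
For example, something along these lines. A sketch that registers such a watch via the REST API in Python, assuming Watcher (X-Pack) is available; the watch id, the 1-minute interval, and the logging action (standing in for mail or Slack) are illustrative:

import requests

ES = "http://localhost:9200"   # assumption: local cluster with Watcher (X-Pack) enabled

watch = {
    "trigger": {"schedule": {"interval": "1m"}},
    "input": {
        "http": {
            "request": {
                "scheme": "http",
                "host": "localhost",   # assumption: the watch calls the local node
                "port": 9200,
                "path": "/_nodes/master:true",
            }
        }
    },
    # fire when fewer than 3 master-eligible nodes are reported
    "condition": {"compare": {"ctx.payload._nodes.total": {"lt": 3}}},
    "actions": {
        "log_master_loss": {
            "logging": {"text": "Only {{ctx.payload._nodes.total}} master-eligible nodes visible!"}
        }
    },
}

resp = requests.put(f"{ES}/_watcher/watch/master_count_check", json=watch)
print(resp.json())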
