Three questions about master/data node failover

Hi,

can you answer the following three questions about master/data node failover?

  1. If the master goes down and one of the master-eligible data nodes becomes the new master, will this new master keep working as both a data and a master node, or will it stop acting as a data node and work only as master?

  2. If the master goes down and a master-eligible data node becomes the master, doesn't the new master need any data from the fallen master in order to function properly? Where does it get all the metadata it needs about shards, indices, etc.?

  3. Let's say the master goes down and I have configured all other nodes as data-only or client nodes, none of them master-eligible. The cluster will then wait until I bring up a new master node. The question is whether the new master node will be able to function straight away without any metadata about the cluster it is joining, or whether it will first need the metadata from the previous master.

Thanks in advance!

  1. If the new master has been a data node, it will continue to be data node.
  2. No, the cluster metadata is always synced among master-eligible nodes. It's OK for the master node to go down as long as you have other master-eligible nodes around and meet the discovery.zen.minimum_master_nodes requirement.
  3. Not sure, but you should not be running a production cluster with only one master-eligible node in the first place. I believe the new master node will work right away without the metadata from the previous master.
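For reference, Zen discovery (Elasticsearch before 7.x) expects discovery.zen.minimum_master_nodes to be a strict majority of the master-eligible nodes, which is also what decides how many of them can fail while a new master can still be elected. The sketch below just does that arithmetic; the function names are hypothetical, not part of any Elasticsearch API.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum for discovery.zen.minimum_master_nodes: a strict majority."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def tolerated_master_failures(master_eligible: int) -> int:
    """Master-eligible nodes that can be lost while a quorum can still elect a master."""
    return master_eligible - minimum_master_nodes(master_eligible)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"{n} master-eligible -> minimum_master_nodes={minimum_master_nodes(n)}, "
              f"tolerates {tolerated_master_failures(n)} failure(s)")
```

With one or two master-eligible nodes no master failure is tolerated; three is the smallest count that survives losing one.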

I plan to use the Amazon ECS service, which will automatically start a new master if the current master fails. In this case it is important for me to know for sure whether the master will be able to work with the cluster without having any metadata from the previous master. Also, I won't need any other master-eligible nodes, as a new master will be started automatically by ECS.

Also, let's assume that a master fails and a master-eligible data node becomes both a master and a data node. If I bring up a new master-only node, is it then possible to make the master/data node function again only as a data node? I don't want to have nodes functioning as both master and data nodes.

How much data do you need to store? I would recommend running 3 master-eligible nodes in a cluster. If you want to save cost, just run both master and data roles on 3 nodes; the rest can be dedicated data nodes. Don't rely on ECS to start a new master node, because your cluster will be down during that time.
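The layout suggested above can be expressed as per-node elasticsearch.yml settings. This is a minimal sketch assuming the pre-7.x role settings node.master and node.data; the helper name and the es-N node names are hypothetical.

```python
from typing import Any, Dict, List


def cluster_node_settings(total_nodes: int, master_eligible: int = 3) -> List[Dict[str, Any]]:
    """Hypothetical helper: first `master_eligible` nodes get master+data roles,
    the remaining nodes are dedicated data nodes."""
    if master_eligible < 1 or total_nodes < master_eligible:
        raise ValueError("need 1 <= master_eligible <= total_nodes")
    quorum = master_eligible // 2 + 1  # strict majority of master-eligible nodes
    nodes = []
    for i in range(total_nodes):
        nodes.append({
            "node.name": f"es-{i + 1}",          # hypothetical naming scheme
            "node.master": i < master_eligible,  # only the first few nodes can be elected
            "node.data": True,                   # every node holds shards
            "discovery.zen.minimum_master_nodes": quorum,
        })
    return nodes


if __name__ == "__main__":
    for node in cluster_node_settings(total_nodes=5):
        print(node)
```

Each dict maps onto one node's elasticsearch.yml; growing the cluster only adds data-only entries, so the set of master-eligible nodes and the quorum stay fixed.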

Data will be around 200 GB a month. The cluster being down is fine with me, as I plan to use a broker in front of Logstash to keep the messages until the master comes back up. However, I don't want to end up with nodes running as both master and data nodes.

At 200 GB per month, it's OK to run both master and data roles on the same node. Actually, you don't have to spin up more servers for master nodes. You can run more than one ES instance on a single server/VM, and that's what I prefer, since each instance runs in its own JVM.

Thanks for the suggestion, I think it is a valid one.

However, I plan to run ES in AWS on the EC2 Container Service (ECS). I plan to create separate ECS clusters for ES master and ES data nodes. In the first I will run only one ES master, and in the second 2 or 3 data nodes. The master node will be the only master-eligible node in the cluster.

When a Docker container fails, ECS will spawn it again on top of the data left by the previous container. Failure of the master EC2 instance will be handled by restoring from an EBS snapshot of the master's volume, which I will take daily.

In the worst case the cluster will be read-only when the master fails, but once the master is back the cluster will be up and running as before. As I will use a broker in front of Logstash, no messages will be lost.
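Whether "no messages will be lost" holds depends on the broker having room for everything that arrives while writes are blocked (with Zen discovery's default discovery.zen.no_master_block: write, a cluster without a master rejects writes). A rough sizing sketch, assuming ingest is spread evenly over a 30-day month; the function name is hypothetical.

```python
def broker_backlog_gb(monthly_gb: float, outage_hours: float,
                      days_per_month: float = 30.0) -> float:
    """Rough amount of log data the broker must hold while ES rejects writes.

    Assumes a uniform ingest rate; real traffic has peaks, so treat the
    result as a lower bound when sizing the broker.
    """
    if monthly_gb < 0 or outage_hours < 0:
        raise ValueError("inputs must be non-negative")
    hourly_gb = monthly_gb / (days_per_month * 24)
    return hourly_gb * outage_hours


if __name__ == "__main__":
    for hours in (1, 4, 24):
        print(f"{hours:>2} h outage -> ~{broker_backlog_gb(200, hours):.2f} GB buffered")
```

At 200 GB a month this works out to roughly 6.7 GB per day of outage, which is worth comparing against how long a restore from the daily EBS snapshot actually takes.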

The advantages I see in this setup are, first, that I won't end up in a situation in which the master has failed and one of the data nodes has taken over the master role. There is no way to go back to the original setup of roles if this happens.
Second, I won't have to pay for a spare master-only instance, which is needed in order to prevent a data node from becoming both a data and a master node.
Third, I will be able to use different types of EC2 instances for master and data nodes (e.g. more disk for data).

All of this goes against the design of ES, in which a master-eligible data node takes over from a fallen master, but when using ES with AWS ECS I think my planned setup makes more sense.

Feel free to comment or correct me.

This may not be true.

I'm still not sure why you don't want to run both data and master roles on the same node with your amount of data. I think it only matters when you have tens of TB of data in the cluster.

The data volume will start at 200 GB, but will most likely increase with time. It may not reach tens of TB, but I would still like an ES cluster that can scale easily and accommodate much more data than I plan to have. It should also keep its original architecture after failures.

In other words I want to build it the right way from the start, so that I won't have to make changes in the future.
