Replace ES master node

Hi, we use a 3-node master-eligible ES6 cluster, and we need to replace one of the nodes. I'm trying to follow the official doc here: https://www.elastic.co/guide/en/elasticsearch/reference/master/modules-discovery-adding-removing-nodes.html and I have a few questions:

  1. Is it better to bring the existing node down and then add the new one, or the other way around?
  2. Does the application need to start using the new node as soon as possible, or can it keep using its current list of nodes, which does not yet include the new one?
  3. Is it possible to run versions 6.8.3 & 6.8.6 in the same cluster?

Thanks!

Yes, it's better to take the old node down first if you have an odd number of nodes, because otherwise you have to increase discovery.zen.minimum_master_nodes to 3 before starting the 4th master-eligible node, and then reduce it back to 2 again afterwards.

In 7.x this setting is ignored and the order doesn't matter.
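
To make the 2 and 3 above concrete, here is a minimal sketch of the usual quorum rule for 6.x, (master-eligible nodes / 2) + 1, together with the request body you would send to PUT _cluster/settings to change the value on a running cluster. The function names are my own, made up for illustration, and I have not run this against a live cluster.

```python
import json


def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum of master-eligible nodes: floor(n / 2) + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def settings_body(master_eligible: int) -> str:
    """JSON body for PUT _cluster/settings (hypothetical helper name)."""
    value = minimum_master_nodes(master_eligible)
    return json.dumps(
        {"persistent": {"discovery.zen.minimum_master_nodes": value}}
    )


if __name__ == "__main__":
    # 3 nodes today, 4 while the replacement node runs next to the old one.
    for n in (3, 4):
        print(n, "master-eligible nodes ->", settings_body(n))
```

With 4 master-eligible nodes the quorum is 3, which is why the setting has to go up before the 4th node starts and back down to 2 once the old node is gone.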

I don't think it matters, assuming you only have 3 nodes. If you have more nodes then your application should not be using the master-eligible ones at all.

Yes, but only while you are upgrading everything to 6.8.6. You shouldn't run a mixed cluster for an extended period of time.

Yes, but only while you are upgrading everything to 6.8.6. You shouldn't run a mixed cluster for an extended period of time.

I think I read somewhere that minor versions, such as 6.8.3 & 6.8.6, are compatible enough to run fine alongside each other? The problem is that the other machines run older FreeBSD versions where updates aren't supported, so it will take some time to update them all, maybe a few weeks or even months.

Even minor versions may ship different Lucene versions, and an index created on a newer node might not be replicable to older nodes, which can cause problems. As far as I know, this is the main reason all nodes should run the same version once the migration has completed.
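
If you want to keep an eye on how mixed the cluster is during the migration, something like the sketch below could help. It assumes you fetch GET _cat/nodes?h=name,version&format=json yourself and pass the response text in; the function name and the sample data are made up for illustration.

```python
import json


def nodes_by_version(cat_nodes_json: str) -> dict:
    """Group node names by the Elasticsearch version they report."""
    grouped = {}
    for node in json.loads(cat_nodes_json):
        grouped.setdefault(node["version"], []).append(node["name"])
    return grouped


if __name__ == "__main__":
    # Made-up sample of what the _cat/nodes call might return.
    sample = json.dumps([
        {"name": "A", "version": "6.8.3"},
        {"name": "B", "version": "6.8.3"},
        {"name": "D", "version": "6.8.6"},
    ])
    grouped = nodes_by_version(sample)
    if len(grouped) > 1:
        print("mixed cluster:", grouped)
```

More than one key in the result means the cluster is still running mixed versions.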

Ok, so if we have 3 master-eligible nodes A, B, C, and decide to add another node D (which is going to replace C), and ensure that discovery.zen.minimum_master_nodes is set to 3 before adding the server, will the client application still work with the old list A,B,C for the duration of the switch?

Then, if we change the list of servers the client application uses to A,B,D and reload it, will the app still work? And then we would be free to just shut C down?

Yes, that sounds like it'll work.

Thanks. There's another weird problem. With 3 master-eligible nodes A,B,C and minimum_master_nodes=2, if any of the nodes goes down, the app fails just as it would with a single-node cluster, with these errors:

[2020-03-20T23:55:07,781][DEBUG][o.e.a.s.TransportSearchAction] [sun.example.com] All shards failed for phase: [query]
org.elasticsearch.index.query.QueryShardException: failed to create query: {

Would you know what the reason could be?

It's hard to say from this tiny fragment of a single DEBUG log message. Maybe this was a search that was ongoing on the node that went down, and you need to retry on a different node.
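
By "retry on a different node" I mean roughly the following. This is only a sketch of the idea, not what any official client does internally: `send` stands in for whatever your application uses to issue the search against one host, and both names are made up. It only covers connection-level failures (OSError and its subclasses), so whether it applies to your error depends on what the full log says.

```python
from typing import Callable, Sequence


def search_with_failover(hosts: Sequence[str],
                         send: Callable[[str], dict]) -> dict:
    """Try each host in order and return the first successful response."""
    last_error = None
    for host in hosts:
        try:
            return send(host)
        except OSError as err:  # connection refused, timeout, reset, ...
            last_error = err
    raise ConnectionError(f"all {len(hosts)} hosts failed") from last_error


if __name__ == "__main__":
    def fake_send(host: str) -> dict:
        # Pretend node C is down and the others answer.
        if host == "C":
            raise ConnectionRefusedError(host)
        return {"answered_by": host}

    print(search_with_failover(["C", "A", "B"], fake_send))
```

If the app fails with a node down even though the other two are reachable, it is worth checking whether it is configured with all three hosts or just one.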