cluster.initial_master_nodes for Elasticsearch on EC2 with Auto Scaling

I'm trying to upgrade Elasticsearch from 6.8 to 8.9 on AWS EC2 instances.
The problem is cluster.initial_master_nodes: as far as I can see, it requires fixed values for node.name. However, the instances are created by an Auto Scaling group (Terraform configs), there is one ES template per group, and I don't see a way to pass each instance its index within the group.

We need a scaling group to ensure the same number of instances is always running, and it looks like there’s no other way to achieve that.

So it looks like there's no way to use ES 7+ with an AWS Auto Scaling group. Previously, cluster.initial_master_nodes wasn't needed and node.name was simply set to ${HOSTNAME}.

Any ideas?

You only need cluster.initial_master_nodes the first time the cluster starts. You can get this to work by starting a "seed" node outside the ASG with a fixed node name and cluster.initial_master_nodes, then starting the rest of the cluster and having it join the seed node, and finally shutting down the seed node.
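As a rough illustration of the seed-node approach, the two sets of settings might differ as in the sketch below. The cluster name, the node name `seed-0`, the host address, and both helper functions are hypothetical placeholders invented for this example; only the setting names (`cluster.name`, `node.name`, `discovery.seed_hosts`, `cluster.initial_master_nodes`) are real Elasticsearch settings.

```python
import os


def build_settings(node_name, seed_hosts, bootstrap_nodes=None):
    """Return elasticsearch.yml settings as a dict (illustrative helper)."""
    settings = {
        "cluster.name": "logs-cluster",  # hypothetical cluster name
        "node.name": node_name,
        "discovery.seed_hosts": list(seed_hosts),
    }
    # Only the node that bootstraps a brand-new cluster gets this setting.
    if bootstrap_nodes:
        settings["cluster.initial_master_nodes"] = list(bootstrap_nodes)
    return settings


def to_yaml(settings):
    """Render the flat settings dict as simple YAML lines."""
    lines = []
    for key, value in settings.items():
        if isinstance(value, list):
            lines.append(f"{key}:")
            lines.extend(f"  - {item}" for item in value)
        else:
            lines.append(f"{key}: {value}")
    return "\n".join(lines)


SEED_HOST = "10.0.0.10"  # hypothetical fixed address of the seed node

# Seed node, started outside the ASG with a fixed name.
seed = build_settings("seed-0", [SEED_HOST], bootstrap_nodes=["seed-0"])

# ASG node: name taken from the hostname, no bootstrap setting at all.
asg = build_settings(os.environ.get("HOSTNAME", "asg-node"), [SEED_HOST])

print(to_yaml(seed))
print("---")
print(to_yaml(asg))
```

Note that this sketch only covers the initial bootstrap. Once the seed node is shut down, discovery.seed_hosts on the ASG nodes would have to point at surviving master-eligible nodes (or use a seed provider), which is not shown here.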

However, note that auto-scaling groups are designed more for stateless services; they don't work particularly well with stateful services like ES.

Sorry, this looks like a big hack.

Still looking for solutions...

Can you provide a little more context of what you are trying to do?

What do you want to auto scale? Master nodes? Data nodes? Ingest nodes? And what would trigger the scale-out? Would you scale in as well?

I don't think auto scaling would work well for Elasticsearch. When you add a new node, the cluster starts rebalancing shards, which can take some time depending on the size and number of shards. To scale in safely, you would need to empty the node before removing it, and this can also take some time.
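For reference, emptying a node before removal is usually done with an allocation exclusion through the cluster settings API. The sketch below only builds the request payloads and does not send anything; the node name `data-node-3` and the helper functions are hypothetical, while `cluster.routing.allocation.exclude._name` is the real setting.

```python
import json


def drain_request(node_name):
    """Build a cluster settings update that moves shards off one node."""
    return {
        "method": "PUT",
        "path": "/_cluster/settings",
        "body": {
            "persistent": {
                "cluster.routing.allocation.exclude._name": node_name
            }
        },
    }


def undrain_request():
    """Build the update that clears the exclusion (null resets a setting)."""
    return {
        "method": "PUT",
        "path": "/_cluster/settings",
        "body": {
            "persistent": {
                "cluster.routing.allocation.exclude._name": None
            }
        },
    }


# Hypothetical node name; in practice this is the node.name of the
# instance that is about to be terminated.
request = drain_request("data-node-3")
print(request["method"], request["path"])
print(json.dumps(request["body"], indent=2))
```

The node should only be terminated once no shards remain on it, which is the part that can take a while on a large cluster.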

Can you give an example of how you would use auto-scaling with ES and what this would solve?

Another question: are you upgrading or creating a new cluster? If you are upgrading, you would not need cluster.initial_master_nodes, because the cluster has already been formed. Also, I'm not sure what upgrade path you are using, but you can't go directly from 6.8 to 8.9; you need to go to 7.17 first.

Also, it is not clear how this is related to auto scaling and what you mean by index count. Are you talking about number of shards and replicas?

The Auto Scaling group is used only for restarts: I want failed nodes to be created again. Nothing is really scaled there.
I have the same strategy for master and data nodes.

My upgrade strategy is to destroy the cluster and start a new one. This ES is used for logs.

This works fine for stateless services, but unfortunately you can't rely on it for stateful systems like ES. If two nodes fail at around the same time, and those are the two of the three with the latest cluster state, then creating two new nodes won't bring the cluster back. This might have seemed to work in 6.8 and earlier, but in fact that was a bug that could lead to data loss and was fixed in 7.0.

Unfortunately there is no way to do this that doesn't "look like a big hack" :wink: ASGs don't do what you need them to here.
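The two-node failure case can be shown with a deliberately simplified model of majority voting. This is only a sketch: the node names and the `has_quorum` helper are hypothetical, and real Elasticsearch adjusts its voting configuration dynamically as nodes join and leave, which the model ignores.

```python
def has_quorum(voting_config, live_nodes):
    """True if more than half of the voting configuration is alive."""
    alive_voters = set(voting_config) & set(live_nodes)
    return len(alive_voters) > len(voting_config) // 2


# Three master-eligible nodes formed the cluster (hypothetical names).
voting_config = {"m1", "m2", "m3"}

# One node fails and the ASG replaces it: two original voters remain.
one_failure = {"m2", "m3", "new-1"}

# Two nodes fail at once and the ASG replaces both: the fresh nodes
# are not in the voting configuration, so only one voter remains.
two_failures = {"m3", "new-1", "new-2"}

print(has_quorum(voting_config, one_failure))
print(has_quorum(voting_config, two_failures))
```

In this model the replacement instances keep the node count at three, but they cannot stand in for the voters that were lost, so the second case has no majority to elect a master.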

Thank you for the information. "This might have seemed to work in 6.8 and earlier, but in fact that was a bug that could lead to data loss and was fixed in 7.0." :sweat_smile:
Alright, we've decided to leave autoscaling behind with 6.8 and move on to the next chapter without it.
