Removing node from cluster and replacing it with a new one


My setup contains 3 master nodes and 9 data nodes.
Each index has 9 primary shards and 1 replica.

The ES cluster is set up in AWS. I wanted to upgrade from Ubuntu 14.04 to 16.04,
which means I have to delete each instance and create a new one.
The cluster holds 24TB of data, and each node has a 5TB external EBS volume mounted.

I'm using Terraform for Infrastructure as Code plus Ansible for provisioning, so I'm trying to avoid using the AWS Console.

I started with
'{"transient": {"cluster.routing.allocation.enable": "none"}}'

Then I removed the instance and started a new one. Once it had joined the cluster I set:
'{"transient": {"cluster.routing.allocation.enable": null}}'
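For reference, a sketch of how these two settings can be applied via the cluster settings API (assuming the cluster is reachable on `localhost:9200`):

```shell
# Disable shard allocation before taking the node down
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{"transient": {"cluster.routing.allocation.enable": "none"}}
'

# Reset the setting to its default once the new node has joined
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{"transient": {"cluster.routing.allocation.enable": null}}
'
```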

Then the problem appeared. The new instance started rebuilding indices, but:

  • It took almost 21h (~4TB).
  • The latest indices were set to a read-only state, so no recent logs were available.
  • After 21h I had to change all indices from read-only back to read-write.
  • The Logstash and log-courier instances had to be restarted several times to get them running again.

I found a setting for speeding up the transfer:
{ "indices.recovery.max_bytes_per_sec" : "800mb"}
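This is a cluster-wide setting, so it can be applied on the fly in the same way (host is a placeholder; using `transient` so it reverts after a full cluster restart):

```shell
# Raise the per-node recovery throttle; the default is much lower (40mb
# in recent Elasticsearch versions), so this speeds up shard recovery
# at the cost of more I/O and network load
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{"transient": {"indices.recovery.max_bytes_per_sec": "800mb"}}
'
```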

But the problem was that we were blind for a long time.

What is the official way of doing the above procedure without losing monitoring information while indices are rebuilt on the new node?

Thanks for any help.

Not sure about an official way, but the way I've done this in the past is with shard allocation filtering. Exclude a node, wait for the node to be empty and replicas to be established, decommission the node, bring on a new node, wait for replicas to be established, and so on. You could also do this in one big step by bringing up 9 new data nodes, excluding the original 9, waiting for green, and shutting them down. Either way, expect to wait a while for 24TB to move from one set of EBS volumes to another, even with provisioned IOPS.
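A rough sketch of the allocation-filtering steps above (the IP and host are placeholders for the node being drained and the cluster endpoint):

```shell
# Exclude a node by IP so its shards drain to the rest of the cluster
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{"transient": {"cluster.routing.allocation.exclude._ip": "10.0.0.1"}}
'

# Watch shard movement; the node is safe to shut down once it holds no shards
curl "localhost:9200/_cat/allocation?v"

# Clear the exclusion after the replacement node has joined
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{"transient": {"cluster.routing.allocation.exclude._ip": null}}
'
```

Nodes can also be excluded by `_name` or `_host` instead of `_ip`, whichever is easier to drive from Terraform/Ansible.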

I do not understand why indices were created in a read-only state during recovery. This isn't what I would expect. Can you share any logs that show what was going on here?

I think you can treat this as a rolling upgrade: shut down each node, move the EBS volume to the new instance, and then start a node on that new instance pointing at the EBS volume, and the node should come back as if it were simply restarting. Any shards that were sync-flushed should recover straight away, and any others will go through the normal recovery process.
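Before each node shutdown in that rolling procedure, a synced flush helps idle shards recover instantly from their local copies (API available in the 6.x versions current at the time of this thread; host is a placeholder):

```shell
# Mark idle shards with a sync ID so they can skip the file-comparison
# phase of recovery when the node restarts
curl -X POST "localhost:9200/_flush/synced"
```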


Logstash logs:
[2018-10-18T09:49:18,297][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 403 ({"type"=>"cluster_block_exception", "reason"=>"blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"})

This is all I have. No other information in logs on nodes or masters.
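The `FORBIDDEN/12/index read-only / allow delete` block in that 403 corresponds to the `index.blocks.read_only_allow_delete` setting, which Elasticsearch typically applies when the flood-stage disk watermark is exceeded. A sketch of clearing it across all indices once disk space is available again (assuming `localhost:9200`):

```shell
# Remove the read-only-allow-delete block from every index so
# Logstash can resume writing
curl -X PUT "localhost:9200/_all/_settings" -H 'Content-Type: application/json' -d'
{"index.blocks.read_only_allow_delete": null}
'
```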

@loren This looks like the best solution.
@DavidTurner In this scenario I would have to do a lot of manual work in the AWS Console, which I would like to avoid.

Also, I have a question regarding the number of nodes.
Let's assume one of the EBS volumes fails; the same situation would happen, meaning logs would be inaccessible for some period of time.
Currently the 9 nodes are c4.4xlarge instances. Would it be a good idea to have more, weaker nodes instead, e.g. 24 c5.xlarge instances? That would speed up cluster recovery, since less data would be stored on each node.

What is your opinion?

I would not use EBS in the first place for a busy production cluster. On EC2, I like the i3's with local NVMe drives.

I would normally expect this to happen if you exceed the flood-stage disk watermark (i.e. you nearly ran out of disk space) but this should yield messages in the Elasticsearch logs if so. Are you sure there are no such messages?

You can do all of this process via Elasticsearch APIs or the aws command-line tool.
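For the EBS-move step, a minimal sketch with the aws CLI (volume ID, instance ID, and device name are placeholders):

```shell
# Detach the data volume from the old instance...
aws ec2 detach-volume --volume-id vol-0123456789abcdef0

# ...and attach it to the replacement instance
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
  --instance-id i-0123456789abcdef0 --device /dev/sdf
```

Both commands are scriptable, so the whole move can be driven from the same automation as the rest of the provisioning.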

Smaller nodes should indeed recover more quickly.

I think the problem was that I started this scenario as I would for a rolling upgrade, but deleted all the data from one node.
Once I set '{"transient": {"cluster.routing.allocation.enable": "none"}}' I disabled the cluster's automatic shard reallocation. That's why it failed. If I just remove one node, add a new one, and wait for the shards and replicas to rebalance, everything should be fine. That is the classic ES failover scenario.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.