Temporary downtime during node migration (and order)

Hello!

We have a 3-node cluster running ES 5.6.16 in VMware VMs, and we will soon migrate the nodes to another VMware cluster. I'm planning to move it one node at a time, but how do I do that as efficiently as possible with respect to shard allocation, and in what order would you recommend doing it? Performance is not critical during the move, but I'd prefer to avoid downtime! Also, what impact do writes have while moving each individual node? Should they be avoided, or can ES handle that?

Specs below!

Thanks!

{
  "cluster_name": "ELK",
  "status": "green",
  "timed_out": false,
  "number_of_nodes": 3,
  "number_of_data_nodes": 2,
  "active_primary_shards": 1053,
  "active_shards": 2106,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 0,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks": 0,
  "number_of_in_flight_fetch": 0,
  "task_max_waiting_in_queue_millis": 0,
  "active_shards_percent_as_number": 100
}

{
  "name": "elk01",
  "roles": [
    "master"
  ]
}
{
  "name": "elk02",
  "roles": [
    "master",
    "data",
    "ingest"
  ]
}
{
  "name": "elk03",
  "roles": [
    "master",
    "data",
    "ingest"
  ]
}

Assuming the three new master/data nodes are in the same network segment etc., with the same ELK version: if yes, then the simple thing is to add them to the existing cluster.
Once you have a six-node cluster, start removing the old nodes one at a time.

No new nodes will be used.

The in-use nodes will be taken offline one at a time, moved, then brought back online.

Then that is OK as well.
Just take one node out of production at a time.

Remove the node by excluding it by name:

PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._name": "node_name"
  }
}

Or remove it by excluding its IP address:

PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._ip": "ip_addr"
  }
}

That will start moving all shards from this node to the other nodes.

GET /_cat/shards

Once all shards are gone from this node, migrate it and add it back. Do the same for each of the other nodes.
If your node is not changing its name/IP and storage, but is just moving to a different VMware cluster, then you don't have to do anything.
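
To confirm that nothing is left on the excluded node before shutting it down, the cat APIs should be enough (a sketch; the second call shows the shard count per node at a glance):

GET /_cat/shards?v
GET /_cat/allocation?v

And once the node is back, the exclusion presumably needs to be cleared again, for example by resetting the filter to null:

PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._name": null
  }
}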

Sorry but if I exclude a data-node, won't that instigate a massive flood of data transfers as the master makes sure all primaries and replicas exist on the single remaining data-node? That can hardly be efficient considering the data-node in maintenance will be back shortly, right?

I was thinking about trying with "cluster.routing.allocation.enable" set to "primaries". What's your take on that?
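
For reference, what I have in mind is something like this (just a sketch of the cluster settings call; I haven't applied it yet):

PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.enable": "primaries"
  }
}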

I have done that many times as well, mainly during upgrades, because each data node is down for at most 5-10 minutes.

If you think your node is not going to be down for long, you should do that.

Well, in our case it'll be more like 1-2 hours.

It should still be fine as long as you don't have data being ingested.
If that is doable, then just stop the ingestion process and do it.
You only have two data nodes, hence half of the primary shards will be gone.
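
If stopping the shippers themselves isn't practical, one possible alternative on the Elasticsearch side (not discussed above; "logstash-*" is just a placeholder index pattern) would be a temporary write block on the indices being written to:

PUT /logstash-*/_settings
{
  "index.blocks.write": true
}

and lifting it again once the node is back:

PUT /logstash-*/_settings
{
  "index.blocks.write": false
}

Note that indexing clients will get errors while the block is in place, so this only helps if they can buffer or retry.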

As you have 3 master eligible nodes, you can perform a rolling migration.

Ensure you have at least one replica for all the indices.
Ensure the remaining 2 nodes can host all the primary shards of the cluster in case something goes wrong.
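
A quick way to check both points could be the cat APIs (a sketch; the requested columns show the primary/replica counts and size per index, and the shard count and disk usage per node):

GET /_cat/indices?v&h=index,pri,rep,store.size
GET /_cat/allocation?v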

The procedure for a rolling restart is detailed here.

If you're moving the VM (meaning you're keeping the data and all the settings), it should be enough to perform the rolling restart.

If you're instead rebuilding the VM and the data is not kept on disk, you need to follow the suggestion of @elasticforme, using the allocation exclusion.

Why? If primary allocations are allowed and the indices have at least one replica, the replicas on the remaining node will be promoted to primaries.
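
With allocation limited to primaries (so replicas are not reshuffled while the node is away), the move of one node could then look roughly like this (a condensed sketch of the usual rolling-restart steps; the synced flush is optional and best-effort):

POST /_flush/synced

Stop the node, move the VM, start it again, and check that it has rejoined:

GET /_cat/nodes?v

Then re-enable allocation and wait for the cluster to go green before touching the next node:

PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.enable": "all"
  }
}

GET /_cluster/health?wait_for_status=green&timeout=120s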

OK, I didn't know that a replica can be promoted if the primary is out.

Your point is correct as primary and replica must be in sync.

@Luca_Belluccini: please note that I have 3 master-eligible nodes but only two data-nodes. I'm thinking that when shutting down the first of the data-nodes, the other data-node will be instructed to promote the replicas to primaries (which is expected and fine) but will the "cluster.routing.allocation.enable" set to "primaries" avoid creating new replicas (which is what I'm aiming at)?

Also, would it not be better to make the change persistent instead of transient, in case I'm shutting down the active master, in order to preserve that setting?

Ok, if you have one replica for all the indices, you can still ensure you have all the data available at a given time (the cluster becomes yellow if one of the 2 data nodes goes down, but it will still be functional).

Yes, it also doesn't allow creating NEW primaries. So avoid doing this when you are potentially creating new indices, as the requests will be rejected.
See the documentation.

You can, but as soon as you shut down the active master, an election will take place.
A Transient setting is a setting which is not persisted after a full cluster restart (which is not your case).
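
If you prefer, the same setting can also be applied as persistent (a sketch; a persistent setting survives even a full cluster restart, so remember to reset it afterwards, e.g. with null):

PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": "primaries"
  }
}

PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": null
  }
}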


I think you can perform a rehearsal of this just by performing the rolling restart without actually moving the VM (or by putting a test environment in place).

Yes, it also doesn't allow creating NEW primaries. So avoid doing this when you are potentially creating new indices, as the requests will be rejected.
See the documentation.

Oh ok! I kind of hoped that "new_primaries" would be included in the scope of "primaries" as I don't think that's made clear in the doc. Is there no way of specifying <any type of primaries, new or old>?

You can, but as soon as you shut down the active master, an election will take place.
A Transient setting is a setting which is not persisted after a full cluster restart (which is not your case).

Oh I see! So even in case of an election and another node becoming master, any transient settings are indeed kept? That's good to know!

I think you can perform a rehearsal of this just by performing the rolling restart without actually moving the VM (or by putting a test environment in place).

I actually did a lab yesterday using the same versions and the same role-setup and I did discover something potentially weird.

As I shut down the first datanode A (after I configured cluster.routing.allocation.enable to "primaries"), the cluster performed as expected: it turned yellow and promoted all replicas on the remaining datanode B to primaries. So far so good. But then I turned datanode A back on and it only got replicas. I don't have any special configuration for rebalancing, so I guess that's not really strange, but doesn't that put extra load on datanode A when I shut down B and it has to promote all those replicas to primaries? We only have a few hundred indices, but is that a problem at all, and if so, can it be mitigated?
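
To keep an eye on how the primaries end up distributed after each node rejoins, I guess the shards cat API is enough (a sketch; the prirep column shows p for primary and r for replica):

GET /_cat/shards?v&h=index,shard,prirep,node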

Thanks!

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.