We have a 3-node cluster running ES 5.6.16 in VMware VMs, and we will soon migrate the nodes to another VMware cluster. I'm planning to move one node at a time, but how do I do this as efficiently as possible with respect to shard allocation, and in what order would you recommend? Performance is not critical during the move, but I'd prefer to avoid downtime! Also, what impact do writes have while each individual node is being moved? Should they be avoided, or can ES handle that?
Assuming the three new master/data nodes are in the same network segment etc. and on the same ELK version: if so, the simplest thing is to add them to the existing cluster. Once you have a six-node cluster, start removing the old nodes one at a time.
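For the new nodes to join the existing cluster, their elasticsearch.yml just needs to point at the current nodes. Something along these lines (a sketch with placeholder values; 5.x uses zen discovery):

cluster.name: my-cluster                      # placeholder: must match the existing cluster name
node.master: true
node.data: true
discovery.zen.ping.unicast.hosts: ["old-node-1", "old-node-2", "old-node-3"]   # placeholder hostnames
discovery.zen.minimum_master_nodes: 4         # (master-eligible nodes / 2) + 1 while six nodes are master-eligible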
If adding new nodes first is not an option, that is OK as well; just take one node out of production at a time.
Remove the node by name:
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._name": "node_name"
  }
}
Or remove the node by IP address:
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._ip": "ip_addr"
  }
}
That will start moving all shards from this node to the other two nodes. You can monitor the shard movement with:
GET /_cat/shards
Once all shards are gone from this node, migrate it and add it back. Do the same for each of the other nodes.
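Once the migrated node is back in the cluster, clear the exclusion so shards can be allocated to it again; setting the value to null should reset it:

PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._name": null
  }
}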
If your nodes are not changing name/IP or storage and are just moving to a different VMware cluster, then you don't have to do anything.
Sorry but if I exclude a data-node, won't that instigate a massive flood of data transfers as the master makes sure all primaries and replicas exist on the single remaining data-node? That can hardly be efficient considering the data-node in maintenance will be back shortly, right?
I was thinking about trying with "cluster.routing.allocation.enable" set to "primaries" instead. What's your take on that?
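Roughly what I have in mind (just a sketch; I'd set it back once the node is up again):

PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.enable": "primaries"
  }
}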
It should still be fine as long as no data is being ingested.
If that is doable, then just stop the ingestion process and do it.
You only have two data nodes, hence roughly half of the primary shards will be gone while one of them is down.
As you have 3 master-eligible nodes, you can perform a rolling migration.
Ensure you have at least one replica for all the indices.
Ensure the remaining 2 nodes can host all the primary shards of the cluster in case something goes wrong.
The procedure for a rolling restart is detailed here.
If you're moving the VM (meaning you're keeping the data and all the settings), it should be enough to perform the rolling restart.
If you're instead rebuilding the VM and the data is not kept on disk, you need to follow the suggestion of @elasticforme, using the allocation exclusion.
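For reference, the 5.x rolling-restart sequence is roughly the following (a sketch of the documented procedure; the docs use "none" for the allocation setting, "primaries" being the variant discussed elsewhere in this thread):

# 1. Disable shard allocation so shards are not reallocated while the node is down
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.enable": "none"
  }
}

# 2. Optionally perform a synced flush to speed up shard recovery
POST /_flush/synced

# 3. Stop the node, perform the migration, start Elasticsearch again

# 4. Confirm the node has rejoined the cluster
GET /_cat/nodes

# 5. Re-enable allocation and wait for the cluster to go green
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.enable": "all"
  }
}
GET /_cat/health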
@Luca_Belluccini: please note that I have 3 master-eligible nodes but only two data nodes. I'm thinking that when I shut down the first of the data nodes, the other data node will be instructed to promote its replicas to primaries (which is expected and fine), but will setting "cluster.routing.allocation.enable" to "primaries" avoid creating new replicas (which is what I'm aiming at)?
Also, would it not be better to make the change persistent instead of transient, in case I'm shutting down the active master, so that the setting is preserved?
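I.e. something along these lines (just a sketch of what I mean):

PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": "primaries"
  }
}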
Ok, if you have one replica for all the indices, you can still ensure all the data is available at any given time (the cluster becomes yellow if one of the 2 data nodes goes down, but it will still be functional).
Yes, but it also doesn't allow creating NEW primaries. So avoid doing this while you might be creating new indices, as those requests will be rejected.
See documentation.
You can, but as soon as you shut down the active master, an election will take place.
A transient setting is a setting which is not preserved after a full cluster restart (which is not your case here).
I think you can rehearse this by just performing the rolling restart without actually moving the VM (or by putting a test environment in place).
> Yes, but it also doesn't allow creating NEW primaries. So avoid doing this while you might be creating new indices, as those requests will be rejected.
> See documentation.
Oh ok! I kind of hoped that "new_primaries" would be included in the scope of "primaries" as I don't think that's made clear in the doc. Is there no way of specifying <any type of primaries, new or old>?
> You can, but as soon as you shut down the active master, an election will take place.
> A transient setting is a setting which is not preserved after a full cluster restart (which is not your case here).
Oh I see! So even in case of an election and another node becoming master, any transient settings are indeed kept? That's good to know!
> I think you can rehearse this by just performing the rolling restart without actually moving the VM (or by putting a test environment in place).
I actually did a lab yesterday using the same versions and the same role-setup and I did discover something potentially weird.
When I shut down the first data node A (after configuring cluster.routing.allocation.enable to "primaries"), the cluster performed as expected: it turned yellow and promoted all replicas on the remaining data node B to primaries. So far so good. But then I turned data node A back on, and it only received replicas. I don't have any special configuration for rebalancing, so I guess that's not really strange, but doesn't that put extra load on data node A when I later shut down B and A has to convert all of those replicas to primaries? We only have a few hundred indices, but is that a problem at all, and if so, can it be mitigated?
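(In case it helps, I checked the primary/replica distribution per node with something along these lines:)

GET /_cat/shards?v&h=index,shard,prirep,node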