How to change the data directory of a node in a cluster?

Wpq · March 15, 2023, 5:33pm

I have a cluster of 8 nodes. This is an "under development but with real data" kind of project (the worst type of project), and I can see that I will soon run into disk space problems with the default data path:

path:
  data: /var/data/elasticsearch

I would like to mount a new disk on each of my nodes (one extra disk per node) without disrupting Elasticsearch's operation, ideally by relying on its replication capabilities.

Plan A

On each node, one by one:

change path.data in the configuration to /elasticsearch-data
restart ES

The idea is that the node starts with an empty data store and requests the data from the other nodes.

Pros:

easy and relatively quick

Cons:

I have no idea if ES works the way I imagine for the case of an empty data store

Plan B

On each node, one by one

remove the node from the cluster
change path.data in the configuration to /elasticsearch-data
bring the node back into the cluster

Pros:

sounds less hacky than Plan A

Cons:

I still do not know if this will work as I imagine it, but it sounds rather clean
takes more time

I obviously prefer Plan A, but Plan B is fine too. Ultimately I would like to end up with a working cluster, so any feedback, hints, advice, or warnings are welcome.

leandrojmp · March 15, 2023, 7:36pm

Your Plan A will basically create new nodes, and if you do not have dedicated master nodes I think this can be an issue.

If you change path.data in the configuration and start the node, it will see that the data directory is empty and join the cluster as a completely new node with a new node ID. I'm not sure whether this causes any issues, since I have not run into this scenario, but it will be a new node.

If this node holds any primary shard that has no replica, your cluster health will turn RED and that data will be lost.
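If you do go with Plan A, it may be worth checking first whether any index is configured with zero replicas. The following is a minimal Python sketch that assumes you have already fetched the response of GET _cat/indices?format=json&h=index,rep (the HTTP call itself is left out); the sample index names are made up. It only looks at the configured replica count, not at whether those replicas are currently assigned.

```python
import json


def indices_without_replicas(cat_indices_json: str) -> list[str]:
    """Return the names of indices whose configured replica count is 0.

    Expects the body of GET _cat/indices?format=json&h=index,rep.
    Cat APIs return numeric columns as strings, so "rep" is converted with int().
    """
    rows = json.loads(cat_indices_json)
    return sorted(row["index"] for row in rows if int(row["rep"]) == 0)


if __name__ == "__main__":
    # Hypothetical sample response; the index names are made up.
    sample = '[{"index": "logs-2023.03", "rep": "1"}, {"index": "scratch", "rep": "0"}]'
    print(indices_without_replicas(sample))  # ['scratch']
```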

The correct approach is Plan B. Use shard allocation filtering (set through the cluster settings API) to exclude the node, so that its shards are moved to the other nodes. Once that has finished, shut the node down, copy the contents of the current data directory to the new data directory, update path.data, and restart the node.
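A rough Python sketch of the two API-related pieces of this step, assuming a hypothetical node name es-node-3: the body to send with PUT _cluster/settings to exclude the node, and a helper that counts how many shard copies are still on it, given the JSON from GET _cat/shards?format=json. The node should be safe to shut down once the count reaches 0. Sending the requests is left to whatever client you already use.

```python
import json


def exclude_node_body(node_name: str) -> dict:
    """Body for PUT _cluster/settings that moves all shards off node_name.

    Shard allocation filtering also accepts exclude._ip and exclude._host.
    """
    return {"persistent": {"cluster.routing.allocation.exclude._name": node_name}}


def shards_left_on_node(cat_shards_json: str, node_name: str) -> int:
    """Count shard copies still on node_name, given GET _cat/shards?format=json."""
    count = 0
    for row in json.loads(cat_shards_json):
        node = row.get("node")  # None for unassigned shards
        if node is None:
            continue
        # A relocating shard is reported as "source -> ip id target", so a shard
        # that is still moving away from node_name is counted as well.
        if node == node_name or node.startswith(node_name + " -> "):
            count += 1
    return count


if __name__ == "__main__":
    # "es-node-3" is a hypothetical node name.
    print(json.dumps(exclude_node_body("es-node-3")))
    # {"persistent": {"cluster.routing.allocation.exclude._name": "es-node-3"}}
```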

Once the node is back online and has rejoined the cluster (it will be the same node, since its metadata was copied), change the allocation settings again to allow this node to receive shards, and wait for the cluster to finish rebalancing. After the rebalancing is done, you can repeat the process on the next node.
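The reverse step, again as a hedged Python sketch: setting the exclusion to null removes it, and the helper below treats rebalancing as finished once GET _cluster/health reports a green status with no relocating, initializing, or unassigned shards. This is one reasonable definition of "done", not the only one.

```python
import json


def clear_exclusion_body() -> dict:
    """Body for PUT _cluster/settings that removes the exclusion (null resets a setting)."""
    return {"persistent": {"cluster.routing.allocation.exclude._name": None}}


def rebalancing_finished(health: dict) -> bool:
    """True once a GET _cluster/health response shows green and no shards moving."""
    return (
        health.get("status") == "green"
        and health.get("relocating_shards", 0) == 0
        and health.get("initializing_shards", 0) == 0
        and health.get("unassigned_shards", 0) == 0
    )


if __name__ == "__main__":
    print(json.dumps(clear_exclusion_body()))
    # {"persistent": {"cluster.routing.allocation.exclude._name": null}}
```

Instead of polling, the cluster health API can also block for you via its wait_for_status=green and wait_for_no_relocating_shards=true parameters, combined with a timeout.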

This approach is safe, but depending on the amount of data on each node, it can take some time to drain a node and rebalance it again.

Another, slightly riskier approach is to stop Elasticsearch on the node, rsync the current data directory into the new one, change the path in the configuration, and start the node again.
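To make the order of operations explicit, here is a dry-run Python sketch that only builds the commands rather than running them. It assumes a systemd-managed service named elasticsearch and uses the two paths from the question; the path.data edit is emitted as a shell comment because the location of elasticsearch.yml depends on how Elasticsearch was installed.

```python
import shlex


def migration_commands(old_path: str, new_path: str, service: str = "elasticsearch") -> list[str]:
    """Build (but do not run) the stop / rsync / reconfigure / start sequence.

    Assumes a systemd unit called `service`; adjust for other setups.
    """
    src = shlex.quote(old_path.rstrip("/") + "/")  # trailing slash: copy the directory contents
    dst = shlex.quote(new_path.rstrip("/") + "/")
    return [
        f"systemctl stop {shlex.quote(service)}",
        # -a preserves permissions, timestamps and (when run as root) ownership.
        f"rsync -a {src} {dst}",
        # Manual step, emitted as a shell comment: the config file location varies by install.
        f"# now set path.data to {new_path} in elasticsearch.yml",
        f"systemctl start {shlex.quote(service)}",
    ]


if __name__ == "__main__":
    # Paths taken from the question; review the output before running anything.
    for cmd in migration_commands("/var/data/elasticsearch", "/elasticsearch-data"):
        print(cmd)
```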

DavidTurner · March 15, 2023, 10:27pm

Elasticsearch doesn't read the config file after startup, so I think your two plans are logically identical. I would expect them to work, although there will be reduced availability during the process.

Alternatively you can shut a node down, move the data path over, adjust the config, and start the node back up again.

system · April 12, 2023, 10:28pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
