How to change the data directory of a node in a cluster?

I have a cluster of 8 nodes. This is an "under development but with real data" kind of project (the worst type of project) and I see that I will soon have problems with disk space of the default data store.

path:
  data: /var/data/elasticsearch

I would like to mount a new disk on each of my nodes (one extra disk per node) without disrupting ES's operations, ideally using its replication capabilities.

Plan A

On each node, one by one:

  • change data in the configuration to /elasticsearch-data
  • restart ES

The idea is that the node starts with an empty data store and requests the data from the other nodes.

Pros:

  • easy and relatively quick

Cons:

  • I have no idea whether ES behaves the way I imagine when a node starts with an empty data store

Plan B

On each node, one by one:

  • remove the node from the cluster
  • change data in the configuration to /elasticsearch-data
  • bring back the node to the cluster

Pros:

  • sounds less hacky than Plan A

Cons:

  • I still do not know if this will work as I imagine it, but it sounds rather clean
  • takes more time

I obviously have a preference for Plan A, but Plan B is fine too - ultimately I would like to end up with a working cluster so any feedback/hint/advice/warning is welcome

Your Plan A will basically create new nodes, and if you do not have dedicated master nodes I think this can be an issue.

If you change the data path in the configuration and start the node, it will see that the data directory is empty and join the cluster as a completely new node with a new node ID. I'm not sure whether this causes any issues, since I have not run into this scenario myself, but it will be a new node.

If this node holds any primary shard that has no replica, your cluster health will be RED and that data will be lost.
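
One way to check for this before touching a node is to look at the output of GET _cat/shards?format=json and list the shards whose only started copy lives on that node. This is a minimal sketch, assuming the default column names (index, shard, prirep, state, node); the function name is made up for illustration, and fetching the JSON is left to whatever HTTP client you already use:

```python
from collections import defaultdict


def unreplicated_primaries(shards, node_name):
    """Return sorted (index, shard) pairs whose only STARTED copy is on node_name.

    `shards` is assumed to be the parsed JSON from GET _cat/shards?format=json,
    i.e. a list of dicts with "index", "shard", "prirep", "state" and "node".
    Only STARTED copies are counted, so let any ongoing relocation settle first.
    """
    copies = defaultdict(list)
    for row in shards:
        if row.get("state") == "STARTED":
            copies[(row["index"], row["shard"])].append(row)

    at_risk = []
    for key, rows in copies.items():
        on_node = [r for r in rows if r.get("node") == node_name]
        elsewhere = [r for r in rows if r.get("node") != node_name]
        # A shard is at risk when this node holds a copy and nobody else does.
        if on_node and not elsewhere:
            at_risk.append(key)
    return sorted(at_risk)
```

If this returns anything for the node you are about to touch, add replicas (or drain the node first) before emptying its data path.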

The correct approach is Plan B. Use the cluster settings API (shard allocation filtering) to exclude the node from allocation; the shards on this node will then be moved to the other nodes. Once that has finished, shut the node down, copy the contents of the current data directory to the new data directory, point the data path in the configuration at the new directory, and restart the node.
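
To make the exclusion step concrete, here is a rough sketch of the request body for PUT _cluster/settings and a check that the node has been emptied. The setting name cluster.routing.allocation.exclude._name is the standard allocation filter; the function names are hypothetical, and the drain check assumes that relocating rows in GET _cat/shards?format=json still mention the source node name in the node column:

```python
def exclude_node_body(node_name):
    """Body for PUT _cluster/settings that moves all shards off node_name."""
    return {
        "persistent": {
            "cluster.routing.allocation.exclude._name": node_name,
        }
    }


def node_is_drained(shards, node_name):
    """True when no row of the parsed _cat/shards JSON mentions node_name.

    Assumption: a relocating shard shows "source -> ip id target" in the
    node column, so matching on whitespace-separated tokens also catches
    shards that are still on their way out.
    """
    for row in shards:
        node = row.get("node") or ""  # unassigned shards have no node
        if node_name in node.split():
            return False
    return True
```

Only shut the node down once the drain check passes; until then the node still holds copies that the cluster may need.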

After the node is back online and has rejoined the cluster (it will be the same node, because the metadata was copied), remove the allocation exclusion so that this node can receive shards again, and wait for the cluster to finish rebalancing. Once rebalancing is done, you can repeat the procedure on the next node.
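
For the "allow shards again and wait" step, a sketch along these lines may help. Setting the filter to null (None in Python) in PUT _cluster/settings removes it, and the fields checked below are ones returned by GET _cluster/health. The function names and the expected node count parameter are my own, not anything official:

```python
import json


def clear_exclusion_body():
    """Body for PUT _cluster/settings that removes the name-based exclusion."""
    return {
        "persistent": {
            "cluster.routing.allocation.exclude._name": None,  # null resets it
        }
    }


def cluster_settled(health, expected_nodes):
    """True when the parsed GET _cluster/health response looks quiet.

    "Quiet" here means: green status, all expected nodes present, and no
    shards relocating, initializing or unassigned.
    """
    return (
        health.get("status") == "green"
        and health.get("number_of_nodes") == expected_nodes
        and health.get("relocating_shards") == 0
        and health.get("initializing_shards") == 0
        and health.get("unassigned_shards") == 0
    )


def as_json(body):
    """Serialize a settings body the way it would be sent over HTTP."""
    return json.dumps(body)
```

With the 8 nodes from the question, you would poll the health endpoint until cluster_settled(health, 8) returns True before moving on to the next node.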

This approach is safe, but depending on the amount of data in each node, it can take some time to empty a node and rebalance it again.

Another approach, which is a little riskier, is to stop Elasticsearch on the node, rsync the current data directory into the new one, change the path in the configuration, and start the node again.
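
A small sketch of the copy step, using the two paths from the question. It only builds the rsync command line; actually running it (for example with subprocess.run, while Elasticsearch is stopped on that node) is up to you. The helper name is made up, and -a (archive mode) is the only rsync flag assumed here:

```python
def rsync_command(src, dst):
    """Build an rsync invocation that copies the contents of src into dst.

    -a is archive mode: recursive, and it keeps permissions and timestamps
    (ownership too when run as root). The trailing slash on the source
    means "the contents of this directory", not the directory itself.
    """
    return ["rsync", "-a", src.rstrip("/") + "/", dst.rstrip("/") + "/"]


# Paths taken from the original question.
OLD_DATA = "/var/data/elasticsearch"
NEW_DATA = "/elasticsearch-data"
COMMAND = rsync_command(OLD_DATA, NEW_DATA)
```

After the copy, make sure the new directory is owned by the user Elasticsearch runs as, set path.data to the new location, and only then start the node.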

Elasticsearch doesn't read the config file after startup, so I think your two plans are logically identical. I would expect them to work, although there will be reduced availability during the process.

Alternatively you can shut a node down, move the data path over, adjust the config, and start the node back up again.
