When we spun up our newest search cluster we way over-provisioned disk capacity on the Azure VMs. Each node hosts about 78 GB of data but has 15 x 1 TB drives (roughly 0.5% disk utilization). The drives are Azure Premium SSDs and I want to remove all but one of them.
My questions:
Is there a way to determine, for a given node, which drives are hosting shards? There are only a few shards per node, and I don't think a shard can span drives, so there must be drives that are not hosting data.
If I want to remove drives that are hosting shards, would it be best to:
a) Just remove the drive (one at a time) and let ES rebalance.
b) Disable reallocation, stop the node, try to manually move the data, update the drives, then reverse the steps.
c) The same as (b), but without trying to move the data.
Depends on how you have it set up, but Elasticsearch will use any and all disks available to it.
Your best option would be to provision a new node, use allocation filtering to exclude one of the existing nodes, wait for that node to hold no data (i.e. relocation is complete), and then remove it.
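For reference, that exclusion can be applied with a transient cluster setting via `cluster.routing.allocation.exclude._name`. A sketch (the node name `node-old-1` is a placeholder for your actual node name):

```
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._name": "node-old-1"
  }
}
```

Once the node shows no shards, clear the setting by putting it back to `null`.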
"...Elasticsearch will use any and all disks available to it."
That's true, but it can be misleading. While all 15 drives are configured and available to ES, a shard never spans disks, so ES will use at most as many drives as there are shards on the node. For example, one of my nodes hosts 4 shards, which means it can be holding data on at most four of the drives (fewer if multiple shards land on the same drive). That leaves at least 10 drives that are not being used in any way. I could just remove the number of drives I want and let ES fix things up, but it would be great if I could identify the drives that are not hosting shards and remove those first.
So my question, is there any api call I can make that would identify which drive a given shard is hosted on?
I think this can be deduced from the indices stats API if you add the ?level=shards query parameter. However, this only tells you the current state, and there is definitely a risk that a shard moves, rendering the stats output stale, before you can act on it.
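As a sketch of how you might read that output: the shard-level stats include a `shard_path` object with a `data_path` field, which tells you which drive the shard copy lives on. The response below is a trimmed, hypothetical example of that shape (index name, node id, and paths are made up):

```python
# Hypothetical, trimmed response from GET /_stats?level=shards.
# The "shard_path" object per shard copy is the part of interest;
# its exact shape should be verified against your ES version.
stats = {
    "indices": {
        "my-index": {
            "shards": {
                "0": [
                    {
                        "routing": {"node": "abc123", "primary": True},
                        "shard_path": {
                            "state_path": "/datadisks/disk3/elasticsearch/data",
                            "data_path": "/datadisks/disk3/elasticsearch/data",
                        },
                    }
                ],
            }
        }
    }
}

def shard_drives(stats):
    """Yield (index, shard_number, data_path) for every shard copy."""
    for index, index_stats in stats["indices"].items():
        for shard_num, copies in index_stats["shards"].items():
            for copy in copies:
                yield index, shard_num, copy["shard_path"]["data_path"]

for index, shard, path in shard_drives(stats):
    print(f"{index} shard {shard} -> {path}")
```

Any drive whose mount point never appears in the output is, at that instant, not hosting shard data; as noted above, a shard can relocate between checking and acting.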
All of the options that you suggest sound somewhat risky.
Elasticsearch won't know what to do if you remove a drive while it's running and will, I think, be deeply unhappy about the situation. If the node doesn't fall over itself you'll probably need to restart it.
Manually rearranging the contents of the data path is wholly unsupported.
Removing data paths from an Elasticsearch node while it's shut down is questionable. It might be ok with it, although I don't see anything in the test suite that checks this. Make sure you have replicas and/or snapshots before trying it.
The safest path forward is what @warkolm suggests: start new nodes, each pointing at a single drive, and allow Elasticsearch to move the data itself, then decommission the old nodes.
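Before decommissioning an old node, you can confirm it holds no shards; the cat allocation API shows per-node shard counts and disk usage:

```
GET _cat/allocation?v
```

The excluded node should report zero shards before you shut it down.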