Is there a way to place one or more elasticsearch shards on a separate disk?

I have a large Elasticsearch index which is stored on an external blob storage device attached to my server, but I also have a large amount of free space on the main hard disk. I'd like to put that space to use, if possible.

In order to more efficiently use resources, I thought that it might work to move one of the shards to the main hard disk. Is there a way to instruct ES, for instance, to put one shard on the main hard disk and two additional shards on the blob storage?

So far, I tried to do this manually by moving one shard’s directory to the main SSD and pointing the shard directory to the new location with a symlink, but ran into permissions issues. I stopped ES, copied the data with all permissions using cp -R --preserve=all <index_directory>/0 <new_location_of_shard_directory_on_main_disk>, and then pointed the 0 shard directory to the new location using a symlink: ln -s 0 <new_location_of_shard_directory_on_main_disk>/0.

This mostly worked, except that when I restarted Elasticsearch there were permissions issues: the log showed an exception like the one below, the key point being that ES can't access the folder. I ensured that it was owned by elasticsearch:elasticsearch and that it had the same permissions as before.

Caused by: java.security.AccessControlException: access denied ("java.io.FilePermission" "/elastic-data/r6FYn6kPR1CuMwz49YQg_Q/0/_state/state-0.st" "read")
	at java.security.AccessControlContext.checkPermission(AccessControlContext.java:485) ~[?:?]
	at java.security.AccessController.checkPermission(AccessController.java:1068) ~[?:?]
	at java.lang.SecurityManager.checkPermission(SecurityManager.java:411) ~[?:?]
	at java.lang.SecurityManager.checkRead(SecurityManager.java:751) ~[?:?]
	at sun.nio.fs.UnixChannelFactory.open(UnixChannelFactory.java:246) ~[?:?]
	at sun.nio.fs.UnixChannelFactory.newFileChannel(UnixChannelFactory.java:133) ~[?:?]
	at sun.nio.fs.UnixChannelFactory.newFileChannel(UnixChannelFactory.java:146) ~[?:?]
	at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:179) ~[?:?]
	at java.nio.channels.FileChannel.open(FileChannel.java:298) ~[?:?]
	at java.nio.channels.FileChannel.open(FileChannel.java:357) ~[?:?]
	at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:78) ~[lucene-core-9.1.0.jar:9.1.0 5b522487ba8e0f1002b50a136817ca037aec9686 - jtibs - 2022-03-16 10:32:40]
	at org.elasticsearch.gateway.MetadataStateFormat.read(MetadataStateFormat.java:303) ~[elasticsearch-8.2.2.jar:8.2.2]
	at org.elasticsearch.gateway.MetadataStateFormat.loadGeneration(MetadataStateFormat.java:428) ~[elasticsearch-8.2.2.jar:8.2.2]
	at org.elasticsearch.gateway.MetadataStateFormat.loadLatestStateWithGeneration(MetadataStateFormat.java:460) ~[elasticsearch-8.2.2.jar:8.2.2]
	at org.elasticsearch.gateway.MetadataStateFormat.loadLatestState(MetadataStateFormat.java:485) ~[elasticsearch-8.2.2.jar:8.2.2]
	at org.elasticsearch.gateway.TransportNodesListGatewayStartedShards.nodeOperation(TransportNodesListGatewayStartedShards.java:120) ~[elasticsearch-8.2.2.jar:8.2.2]
	at org.elasticsearch.gateway.TransportNodesListGatewayStartedShards.nodeOperation(TransportNodesListGatewayStartedShards.java:52) ~[elasticsearch-8.2.2.jar:8.2.2]
	at org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:334) ~[elasticsearch-8.2.2.jar:8.2.2]
	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:67) ~[elasticsearch-8.2.2.jar:8.2.2]
	at org.elasticsearch.transport.TransportService$6.doRun(TransportService.java:918) ~[elasticsearch-8.2.2.jar:8.2.2]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:773) ~[elasticsearch-8.2.2.jar:8.2.2]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) ~[elasticsearch-8.2.2.jar:8.2.2]
	... 3 more

No, it is not possible. All the shards for a node need to reside in the same path.data that is configured in elasticsearch.yml.

Also, you should not make any changes to the underlying files in the data directory. You risk corrupting your data, and losing it if you do not have a snapshot.

@leandrojmp I understand that it’s not recommended, but so I understand correctly, if all the data is there and the directory for the shard is just a symlink that points to another location, then it theoretically shouldn’t make a difference to Elasticsearch, right? ES would see data-location/index-name/indexes/0 and it would just contain all the same data, but in another location.

In theory it should have worked. Elasticsearch supports symlinks for the path.data, but I'm not sure how this would behave if you changed just one of the shards.

I don't think this is even tested, because Elastic already strongly advises against manually changing the data files.

If you have already configured all the permissions and it still doesn't work, you will need to dig into the code to find the issue, or wait to see if someone from Elastic can shed some light on this, because it is a very particular use case.

Also, which version are you on? There was a bug related to path.data and symlinks in version 8, according to this GitHub issue.

@leandrojmp I’m currently on 8.2.

Replacing a directory in the data path with a symlink is definitely not supported - see e.g. these docs:

WARNING: Don’t modify anything within the data directory or run processes that might interfere with its contents. If something other than Elasticsearch modifies the contents of the data directory, then Elasticsearch may fail, reporting corruption or other data inconsistencies, or may appear to work correctly having silently lost some of your data.

If you want to allocate different shards to different locations, run multiple nodes each with its own path.data and use allocation filtering to control which shards can be allocated to each node.
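As a rough illustration of the multi-node approach described above, here is a minimal Python sketch that only builds the configuration text and the request body; it does not contact a cluster. The node names, data paths, the attribute name "disk" and its values are all hypothetical placeholders. It assumes the documented node.attr.<name> setting in elasticsearch.yml and the index-level index.routing.allocation.require.<name> setting. Note that an index-level filter applies to every shard of that index, so this sketch pins a whole index to one node rather than splitting a single index's shards.

```python
import json


def node_config(node_name: str, data_path: str, disk_attr: str) -> str:
    """Return elasticsearch.yml lines for one node.

    Each node gets its own path.data plus a custom attribute
    ("disk" is a made-up attribute name used only in this sketch).
    """
    return "\n".join(
        [
            f"node.name: {node_name}",
            f"path.data: {data_path}",
            f"node.attr.disk: {disk_attr}",
        ]
    )


def allocation_filter(disk_attr: str) -> dict:
    """Return a body for PUT /<index>/_settings.

    Requires the index's shards to live on nodes whose hypothetical
    "disk" attribute equals disk_attr.
    """
    return {"index.routing.allocation.require.disk": disk_attr}


if __name__ == "__main__":
    # Two nodes on the same server, one per storage device (paths are placeholders).
    print(node_config("node-main", "/var/lib/es-main", "main"))
    print()
    print(node_config("node-blob", "/elastic-data", "blob"))
    print()
    # Settings body that would pin an index to the node on the main disk.
    print(json.dumps(allocation_filter("main"), indent=2))
```

The printed settings body would be sent to the index settings API for whichever index should live on the main disk; the two configuration snippets would go into each node's own elasticsearch.yml.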
