Is there a way to place one or more elasticsearch shards on a separate disk?

jalustig · July 16, 2022, 4:07pm

I have a large Elasticsearch index which is stored on an external blob storage device attached to my server, but I also have a large amount of free space on the main hard disk. I'd like to put that space to use, if possible.

In order to more efficiently use resources, I thought that it might work to move one of the shards to the main hard disk. Is there a way to instruct ES, for instance, to put one shard on the main hard disk and two additional shards on the blob storage?

So far, I tried to do this manually by moving one shard’s directory to the main SSD and pointing the shard directory to the new location with a symlink, but ran into permissions issues. I stopped ES, then copied the data with all permissions using cp -R --preserve=all <index_directory>/0 <new_location_of_shard_directory_on_main_disk>, and then pointed the 0 shard directory to the new location using a symlink: ln -s 0 <new_location_of_shard_directory_on_main_disk>/0.

This mostly worked, except that when I restarted Elasticsearch, there were permissions issues: the log showed an exception like the one below, the key point being that ES can't access the folder. I ensured that it was owned by elasticsearch:elasticsearch and that it had the same permissions as before.

Caused by: java.security.AccessControlException: access denied ("java.io.FilePermission" "/elastic-data/r6FYn6kPR1CuMwz49YQg_Q/0/_state/state-0.st" "read")
	at java.security.AccessControlContext.checkPermission(AccessControlContext.java:485) ~[?:?]
	at java.security.AccessController.checkPermission(AccessController.java:1068) ~[?:?]
	at java.lang.SecurityManager.checkPermission(SecurityManager.java:411) ~[?:?]
	at java.lang.SecurityManager.checkRead(SecurityManager.java:751) ~[?:?]
	at sun.nio.fs.UnixChannelFactory.open(UnixChannelFactory.java:246) ~[?:?]
	at sun.nio.fs.UnixChannelFactory.newFileChannel(UnixChannelFactory.java:133) ~[?:?]
	at sun.nio.fs.UnixChannelFactory.newFileChannel(UnixChannelFactory.java:146) ~[?:?]
	at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:179) ~[?:?]
	at java.nio.channels.FileChannel.open(FileChannel.java:298) ~[?:?]
	at java.nio.channels.FileChannel.open(FileChannel.java:357) ~[?:?]
	at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:78) ~[lucene-core-9.1.0.jar:9.1.0 5b522487ba8e0f1002b50a136817ca037aec9686 - jtibs - 2022-03-16 10:32:40]
	at org.elasticsearch.gateway.MetadataStateFormat.read(MetadataStateFormat.java:303) ~[elasticsearch-8.2.2.jar:8.2.2]
	at org.elasticsearch.gateway.MetadataStateFormat.loadGeneration(MetadataStateFormat.java:428) ~[elasticsearch-8.2.2.jar:8.2.2]
	at org.elasticsearch.gateway.MetadataStateFormat.loadLatestStateWithGeneration(MetadataStateFormat.java:460) ~[elasticsearch-8.2.2.jar:8.2.2]
	at org.elasticsearch.gateway.MetadataStateFormat.loadLatestState(MetadataStateFormat.java:485) ~[elasticsearch-8.2.2.jar:8.2.2]
	at org.elasticsearch.gateway.TransportNodesListGatewayStartedShards.nodeOperation(TransportNodesListGatewayStartedShards.java:120) ~[elasticsearch-8.2.2.jar:8.2.2]
	at org.elasticsearch.gateway.TransportNodesListGatewayStartedShards.nodeOperation(TransportNodesListGatewayStartedShards.java:52) ~[elasticsearch-8.2.2.jar:8.2.2]
	at org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:334) ~[elasticsearch-8.2.2.jar:8.2.2]
	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:67) ~[elasticsearch-8.2.2.jar:8.2.2]
	at org.elasticsearch.transport.TransportService$6.doRun(TransportService.java:918) ~[elasticsearch-8.2.2.jar:8.2.2]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:773) ~[elasticsearch-8.2.2.jar:8.2.2]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) ~[elasticsearch-8.2.2.jar:8.2.2]
	... 3 more

leandrojmp · July 16, 2022, 5:26pm

No, it is not possible. All the shards for a node need to reside in the same path.data that is configured in elasticsearch.yml.

Also, you should not make any changes to the underlying files in the data directory; you risk corrupting your data and losing it if you do not have a snapshot.

jalustig · July 16, 2022, 5:51pm

@leandrojmp I understand that it’s not recommended, but so I understand correctly, if all the data is there and the directory for the shard is just a symlink that points to another location, then it theoretically shouldn’t make a difference to Elasticsearch, right? ES would see data-location/index-name/indexes/0 and it would just contain all the same data, but in another location.

leandrojmp · July 16, 2022, 6:18pm

In theory it should work, since Elasticsearch supports symlinks for the path.data, but I'm not sure how this would behave if you changed just one of the shards.

I don't think this is even tested, because Elastic already strongly advises against manually changing the data files.

If you have already configured all the permissions and it still doesn't work, you will need to dig into the code to find the issue or wait to see if someone from Elastic can shed some light on this, because it is a very particular use case.

Also, which version are you on? There was a bug related to path.data and symlinks in version 8, according to this GitHub issue.

jalustig · July 16, 2022, 7:47pm

@leandrojmp I’m currently on 8.2.

DavidTurner · July 17, 2022, 8:21am

Replacing a directory in the data path with a symlink is definitely not supported - see e.g. these docs:

WARNING: Don’t modify anything within the data directory or run processes that might interfere with its contents. If something other than Elasticsearch modifies the contents of the data directory, then Elasticsearch may fail, reporting corruption or other data inconsistencies, or may appear to work correctly having silently lost some of your data.

If you want to allocate different shards to different locations, run multiple nodes each with its own path.data and use allocation filtering to control which shards can be allocated to each node.
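
As a rough illustration of that setup, here is a minimal Python sketch that only builds the settings involved; it does not talk to a cluster. The attribute name `disk`, its values `main` and `blob`, the node names, the data paths, and the index name `my-index` are all hypothetical and chosen for the example. The setting keys themselves (`path.data`, `node.attr.*`, `index.routing.allocation.require.*`) are standard Elasticsearch settings.

```python
import json

# Hypothetical custom node attribute used to tag each node by its disk.
NODE_ATTR = "disk"


def node_config(node_name: str, data_path: str, disk: str) -> dict:
    """Settings for one node's elasticsearch.yml; each node gets its own path.data."""
    return {
        "node.name": node_name,
        "path.data": data_path,
        f"node.attr.{NODE_ATTR}": disk,
    }


def allocation_filter(disk: str) -> dict:
    """Body for PUT /<index>/_settings that requires nodes tagged with `disk`."""
    return {f"index.routing.allocation.require.{NODE_ATTR}": disk}


if __name__ == "__main__":
    # Two nodes on the same server, one per disk (paths are assumptions).
    nodes = [
        node_config("node-main", "/var/lib/elasticsearch", "main"),
        node_config("node-blob", "/mnt/blob/elasticsearch", "blob"),
    ]
    for settings in nodes:
        print(json.dumps(settings, indent=2))

    # Request that would pin the hypothetical index to the main-disk node.
    print("PUT /my-index/_settings")
    print(json.dumps(allocation_filter("main"), indent=2))
```

Note that index-level allocation filters like this apply to all shards of an index rather than to one individual shard, so this sketch places whole indices on a given disk.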

