Best practices for using volumes across a cluster

Hi all. I wanted to ask: what is the best approach for using volumes within a cluster?

I have a cluster with 3 combined master/data nodes (not separate roles), and I want to grow this cluster without losing any data.

Currently my solution is to use a single EFS volume (I'm using AWS to manage the cluster) and attach it to all nodes. Whenever I add a new node, it mounts the same storage and all the data is available to it as well.
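For context, every node points its data path at the same shared mount, something like this (the mount point and names here are illustrative, not my exact config):

```yaml
# elasticsearch.yml — sketch of the current shared-EFS setup
cluster.name: my-cluster
node.name: node-1                    # unique per node
path.data: /mnt/efs/elasticsearch    # the same EFS mount on every node
```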

But the problem is that I got the error "obtaining shard lock for [starting shard] timed out after [5000ms], lock already held for [closing shard] with age [286549ms]", so I'm wondering whether it would be better to use a separate volume for each node.

Keep in mind that the number of nodes can increase and decrease, but the data should persist no matter what.

Thanks in advance.

Have a look at the recommendations around deploying Elasticsearch on AWS. As you can see there, EFS is not recommended for Elasticsearch storage.


Thanks, @Christian_Dahlqvist. I've checked it and understand that EFS is not the best solution, but I still don't see how to preserve all the data if I use the instance store. If one instance is terminated, where will Elasticsearch get the indices from?

In an Elasticsearch cluster shards are generally replicated, so even if you lose one node there is still a copy of each shard available. These can then be replicated again so that the cluster once more holds two copies. You can also use EBS volumes for storage, and these are more resilient than instance store.
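For example, assuming an index named my-index (the name is illustrative), you can make sure every primary shard has one replica copy and then verify that all shards are assigned:

```
# Ensure each primary shard has one replica (index name is illustrative)
PUT /my-index/_settings
{
  "index": {
    "number_of_replicas": 1
  }
}

# Cluster status should be "green" once all primaries and replicas are assigned
GET /_cluster/health
```

As long as a replica lives on a different node than its primary, losing a single node still leaves a complete copy of the data, and Elasticsearch rebuilds the missing copies on the remaining nodes.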

Thanks a lot for your help, @Christian_Dahlqvist.
