Hi all. I wanted to ask: what is the best approach for using volumes within a cluster?
I have a cluster with 3 combined master/data nodes (not separate roles), and I want to grow this cluster without losing any data.
Currently my solution is to use a single EFS volume (I'm using AWS to manage the cluster) and mount it on all nodes. Whenever I add a new node, it mounts the same volume, so all the data is available to it as well.
But the problem is that I got this error: `obtaining shard lock for [starting shard] timed out after [5000ms], lock already held for [closing shard] with age [286549ms]`. So I'm wondering whether it would be better to use a separate volume for each node.
Keep in mind that the number of nodes can increase and decrease, but the data must be persisted no matter what.
Thanks, @Christian_Dahlqvist. I've read it and understand that EFS is not the best solution, but I still can't see how to preserve all the data if I use instance store. If an instance is terminated, where will Elasticsearch get the indices from?
In an Elasticsearch cluster, shards are generally replicated, so even if you lose a node there is still a copy of each shard available on another node. The cluster can then re-replicate the affected shards so it again holds 2 copies of each. You can also use EBS volumes for storage, which are more resilient than instance store.
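To illustrate, the replica count is a per-index setting, so you can check or adjust it through the settings API. A minimal sketch (the index name `my-index` is just a placeholder) that asks for one replica of each primary shard, meaning the cluster keeps 2 copies of the data in total:

```
PUT /my-index/_settings
{
  "index": {
    "number_of_replicas": 1
  }
}
```

With this in place, losing a single node leaves at least one intact copy of every shard, and the cluster will rebuild the missing replicas on the remaining nodes (provided there is more than one data node to place them on).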