Avoiding duplicate data and work when using a shared filesystem

pweaver · May 7, 2015, 5:40am

If two Elasticsearch nodes share the same filesystem, is there any way to prevent them from pointlessly copying data from one node to the other (e.g. not actually copying anything when rebalancing), or from duplicating data unnecessarily (e.g. when using replicas)? Ideally all nodes would just have read access to all shards, and the number_of_replicas setting would be irrelevant. I'm interested in both practical and hypothetical solutions.

Although my interest is in running Elasticsearch on NFS, this question also applies when you run more than one instance of Elasticsearch on the same machine.

I am aware of the performance considerations when running on a network filesystem. However, for my particular use case, I believe we are CPU bound: we do a lot of aggregations and fully utilize all cores on all nodes just to run a single request. I believe this use case is fairly common, and likely to become more common.

dadoonet · May 7, 2015, 6:09am

I think you are looking for this: http://www.elastic.co/guide/en/elasticsearch/reference/current/indices-shadow-replicas.html

David
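
For anyone landing here later, a rough sketch of what the linked page describes: an index is created with `index.shadow_replicas` enabled and `index.data_path` pointing at the shared filesystem, so replicas read the primary's files instead of copying them. The index name, mount path, and helper function below are made up for illustration, and the feature was marked experimental at the time, so check the docs for your version before relying on it.

```python
import json


def shadow_replica_index_body(data_path, shards=1, replicas=4):
    """Build an index-creation body for a shadow-replica index.

    data_path is a hypothetical directory on the shared filesystem
    (e.g. an NFS mount) that every node can see at the same location.
    """
    return {
        "index": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
            # Shard data lives on the shared filesystem.
            "data_path": data_path,
            # Replicas read the primary's segments rather than
            # keeping their own copy.
            "shadow_replicas": True,
        }
    }


# Hypothetical path; this body would be sent as PUT /my_index.
body = shadow_replica_index_body("/mnt/nfs/es/my_index")
print(json.dumps(body, indent=2))
```

This only builds the request body; sending it (with curl or a client library) is left out on purpose.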

pweaver · May 7, 2015, 7:15am

Wow, endless Googling did not find that article, thanks!
