Avoiding duplicate data and work when using a shared filesystem

(Philip Coit Weaver) #1

If two Elasticsearch nodes share the same filesystem, is there any way to stop them from pointlessly copying data from one node to the other (e.g. not actually copying anything when rebalancing) or duplicating data unnecessarily (e.g. when using replicas)? Ideally, all nodes would simply have read access to all shards, and the number_of_replicas setting would be irrelevant. I'm interested in both practical and hypothetical solutions.

Although my interest is in running Elasticsearch on NFS, this question also applies when you run more than one instance of Elasticsearch on the same machine.

I am aware of the performance considerations of running on a network filesystem. However, for my particular use case, I believe we are CPU-bound: we run a lot of aggregations and fully utilize every core on every node just to serve a single request. I believe this use case is fairly common, and likely to become more common.

(David Pilato) #2

I think you are looking for shadow replicas: http://www.elastic.co/guide/en/elasticsearch/reference/current/indices-shadow-replicas.html
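As a rough sketch of what the linked page describes: a shadow-replica index is created with an `index.data_path` on the shared filesystem and `index.shadow_replicas` set to true. Replica shards then do not index documents themselves; they open the files the primary writes to that shared path, so number_of_replicas controls how many nodes can serve searches rather than how many copies sit on disk. The Python below only builds the request body; the index name and NFS path are hypothetical, and the node-level prerequisites (such as `path.shared_data` in 2.x) should be checked on the linked page. Note that shadow replicas were an experimental feature of the 1.x/2.x era, deprecated in the 5.x series and removed in 6.0.

```python
import json

# Hypothetical values for illustration only: adjust to your cluster.
INDEX_NAME = "my_index"
SHARED_PATH = "/mnt/nfs/es/my_index"  # a directory every node can see


def shadow_replica_index_body(number_of_shards: int,
                              number_of_replicas: int,
                              data_path: str) -> dict:
    """Build the body for creating an index that uses shadow replicas.

    The primary writes segment files under data_path; shadow replicas
    read those same files instead of keeping their own copy.
    """
    return {
        "index": {
            "number_of_shards": number_of_shards,
            "number_of_replicas": number_of_replicas,
            "data_path": data_path,
            "shadow_replicas": True,
        }
    }


body = shadow_replica_index_body(1, 4, SHARED_PATH)
# Send this JSON as the body of an index-creation request: PUT /my_index
print(f"PUT /{INDEX_NAME}")
print(json.dumps(body, indent=2))
```

Here four replicas let five nodes search the shard, while the data exists only once on the shared filesystem.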


(Philip Coit Weaver) #3

Wow, endless Googling did not turn up that article. Thanks!
