Share data directory between exclusively running instances

noelzubinvictor · February 18, 2020, 10:43am

Problem: The ES instance requires a lot of memory for quick bulk updates. Otherwise it's underutilized.

Can I create 2 ES instances with the same shared data directory? Say one with 4 GB (searching) and the other with 32 GB (indexing), and only run one of them at a time.

I could conditionally turn on the larger one during indexing and replace it with the 4 GB one once it's done.

Would this work?

Ayush_Mathur · February 18, 2020, 1:12pm

A better option would be to configure a hot-warm-cold architecture, IMHO.

DavidTurner · February 18, 2020, 4:18pm

What you describe is effectively running a single node with two different configs, and you are effectively asking whether you can change the config of a node. Yes, you can do that.

noelzubinvictor · February 18, 2020, 4:20pm

Well, it's more like running two nodes with the same data directory, isn't it?

DavidTurner · February 18, 2020, 4:24pm

Not really, no. If you only have one data directory then you only have one node, since a node is defined by the contents of its data directory. Everything else is config that is under your control.
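
To illustrate the "one node, two configs" idea, here is a minimal sketch that builds two `docker run` commands differing only in container memory and JVM heap while mounting the same data directory. The volume name `esdata`, the container name `es`, the image version, and the heap sizes (half of the container memory) are assumptions for illustration, not something stated in this thread.

```python
import shlex

# Assumed image version; use whichever version you actually run.
IMAGE = "docker.elastic.co/elasticsearch/elasticsearch:7.6.0"
# Hypothetical named volume that holds the node's data directory.
DATA_VOLUME = "esdata"


def docker_run_command(container_memory, heap):
    """Build a `docker run` command for one memory profile of the same node."""
    return [
        "docker", "run", "--rm",
        "--name", "es",  # same name for both profiles, so Docker refuses to run them together
        "--memory", container_memory,
        "-e", "discovery.type=single-node",
        "-e", f"ES_JAVA_OPTS=-Xms{heap} -Xmx{heap}",
        "-v", f"{DATA_VOLUME}:/usr/share/elasticsearch/data",
        "-p", "9200:9200",
        IMAGE,
    ]


# Two configs for one node: only the memory limit and heap size differ.
PROFILES = {
    "search": docker_run_command("4g", "2g"),
    "index": docker_run_command("32g", "16g"),
}

for name, cmd in PROFILES.items():
    print(name, "->", shlex.join(cmd))
```

Stopping the container started with one profile and starting the other is then just a restart of the same node with a different config. Note that `shlex.join` needs Python 3.8 or later.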

noelzubinvictor · February 18, 2020, 4:26pm

Is it possible for 2 Elasticsearch processes (running in different containers) to share the same data directory, assuming only one of them runs at a time?

noelzubinvictor · February 18, 2020, 4:31pm

Would having multiple nodes increase or decrease my indexing speed?

DavidTurner · February 18, 2020, 7:49pm

Yes, that's possible, again because you only have one data directory, hence one node, and you're just changing where it runs.

As for multiple nodes: it depends. Scaling out to multiple nodes is a good way to increase throughput, but there's overhead too.

noelzubinvictor · February 19, 2020, 5:34am

I will experiment again with other techniques and approaches to try to increase my indexing speed. But if nothing else works, do you think the approach I mentioned above (shared data dir) would be a good solution for my quick indexing problem? Do you see any immediate issues that could happen with this approach?
I haven't seen this approach mentioned in any of the blogs or tutorials.

DavidTurner · February 19, 2020, 8:14am

IMO the main drawback is just the downtime: it's not normally acceptable to restart things in between the indexing and searching phases, because people mostly want to search the data as it arrives. Furthermore, if you want to do some more indexing then you must restart the node again, breaking any ongoing searches.

It's more usual to have a hot/warm setup, often with ILM to move indices from the hot tier to the warm tier and do various other optimisations like force-merging them at the same time. That way you can scale the hot (indexing) and warm (search) tiers independently and can scale the hot tier down (all the way to zero if needed) when there's no indexing taking place, without affecting the warm tier.
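
As a rough sketch of what such an ILM policy could look like, the snippet below builds a policy body with a hot phase that rolls over and a warm phase that relocates and force-merges the index. The policy name `hot-warm-example`, the rollover thresholds, and the custom node attribute `data` (set via `node.attr.data: hot` or `node.attr.data: warm` in each node's config) are assumptions chosen for illustration.

```python
import json

# Hypothetical policy name; it is not mentioned anywhere in this thread.
POLICY_NAME = "hot-warm-example"

ilm_policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    # Assumed thresholds: roll over to a new index at 50 GB or after one day.
                    "rollover": {"max_size": "50gb", "max_age": "1d"}
                }
            },
            "warm": {
                # Enter the warm phase as soon as the index has rolled over.
                "min_age": "0ms",
                "actions": {
                    # Move shards to nodes tagged with the assumed attribute data=warm.
                    "allocate": {"require": {"data": "warm"}},
                    # Merge each shard down to a single segment for cheaper searches.
                    "forcemerge": {"max_num_segments": 1},
                },
            },
        }
    }
}

# Print the request you would send to the cluster to register the policy.
print(f"PUT _ilm/policy/{POLICY_NAME}")
print(json.dumps(ilm_policy, indent=2))
```

New indices would also need to start on the hot nodes, for example through an index template that sets `index.routing.allocation.require.data: hot` (again using the assumed attribute name).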

noelzubinvictor · February 19, 2020, 8:46am

Thanks, man. Will check them out.

system · March 18, 2020, 8:46am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
