Share data directory between exclusively running instances

Problem: The ES instance requires a lot of memory for quick bulk updates. Otherwise it's underutilized.

Can I create 2 ES instances with the same shared data directory? Say one with 4 GB (searching) and the other with 32 GB (indexing), and only run one of them at a time.

I could conditionally turn on the larger one during indexing and replace it with the 4 GB one once it's done.

Would this work?

The better option would be to configure a hot-warm-cold architecture, IMHO.

What you describe is effectively running a single node with two different configs, and you are effectively asking whether you can change the config of a node. Yes, you can do that.

Well, more like running two nodes with the same data directory, isn't it?

Not really, no. If you only have one data directory then you only have one node, since a node is defined by the contents of its data directory. Everything else is config that is under your control.
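To make the "one node, two configs" idea concrete, here is a minimal sketch that builds two launch specifications for the same node. It assumes the 4 GB / 32 GB figures in the question are total memory and sets the heap to half of that, in line with the usual guidance of giving the heap at most about 50% of RAM. The profile names, the `launch_spec` helper, the data path, and the binary path are all hypothetical; `ES_JAVA_OPTS` and the `-E` command-line override are the standard ways to pass JVM flags and settings. Only one of the two specs should ever be running at a time.

```python
import os

# Hypothetical profiles: heap is half of the 4 GB / 32 GB memory figures (an assumption).
PROFILES = {"search": "2g", "index": "16g"}


def launch_spec(profile, data_dir="/var/lib/elasticsearch", es_bin="bin/elasticsearch"):
    """Return (argv, env) for starting the single node with the chosen heap profile."""
    heap = PROFILES[profile]
    env = dict(os.environ)
    # ES_JAVA_OPTS carries extra JVM flags; min and max heap are set to the same value.
    env["ES_JAVA_OPTS"] = f"-Xms{heap} -Xmx{heap}"
    # path.data is identical for both profiles, so both specs describe the same node.
    argv = [es_bin, f"-Epath.data={data_dir}"]
    return argv, env


search_argv, search_env = launch_spec("search")
index_argv, index_env = launch_spec("index")

# Same command line and data directory; only the JVM heap differs.
print(search_argv, search_env["ES_JAVA_OPTS"])
print(index_argv, index_env["ES_JAVA_OPTS"])
```

The specs could be handed to a process manager or container runtime of your choice; the sketch deliberately stops short of starting anything.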

Is it possible for 2 Elasticsearch processes (running in different containers) to share the same data directory, assuming only one of them runs at a time?

Would having multiple nodes increase or decrease my indexing speed?

Yes that's possible, again because you only have one data directory, hence one node, and you're just changing where it runs.

It depends. Scaling out to multiple nodes is a good way to increase throughput, but there's overhead too.

I will experiment again with other techniques and approaches to try to increase my indexing speed. But if nothing else works, do you think the approach I mentioned above (shared data dir) would be a good solution for my quick indexing problem? Do you see any immediate issues that could happen with this approach?
I haven't seen this approach mentioned in any of the blogs or tutorials.

IMO the main drawback is just the downtime: it's not normally acceptable to restart things in between the indexing and searching phases, because people mostly want to search the data as it arrives. Furthermore if you want to do some more indexing then you must restart the node again, breaking any ongoing searches.

It's more usual to have a hot/warm setup, often with ILM to move indices from the hot tier to the warm tier and do various other optimisations like force-merging them at the same time. That way you can scale the hot (indexing) and warm (search) tiers independently and can scale the hot tier down (all the way to zero if needed) when there's no indexing taking place, without affecting the warm tier.
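As a rough illustration of the ILM part, the sketch below builds the JSON body for a policy that rolls an index over in the hot phase and force-merges it once it enters the warm phase. The helper name `hot_warm_policy` and the thresholds are placeholders, and the field names reflect the ILM policy format as I understand it for recent versions, so check them against the documentation for your release. It assumes the cluster uses data tiers, in which case ILM moves the index to warm nodes on entering the warm phase without an explicit allocation action.

```python
import json


def hot_warm_policy(rollover_age="1d", rollover_size="50gb", warm_after="0ms"):
    """Build an ILM policy body: rollover while hot, force-merge once warm."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        # Roll over to a new index when either threshold is reached.
                        "rollover": {
                            "max_age": rollover_age,
                            "max_primary_shard_size": rollover_size,
                        }
                    }
                },
                "warm": {
                    # min_age is measured from rollover time for rolled-over indices.
                    "min_age": warm_after,
                    "actions": {
                        # Merge each shard down to one segment for cheaper searching.
                        "forcemerge": {"max_num_segments": 1}
                    },
                },
            }
        }
    }


policy = hot_warm_policy()
print(json.dumps(policy, indent=2))
```

The printed body is what you would send with `PUT _ilm/policy/<policy-name>`; the policy then needs to be referenced from an index template so that new indices pick it up.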

Thanks, man. Will check them out.
