We are importing data into an Elasticsearch cluster, into a few indices, around
At the same time, we care about search on the existing indices: a few of them are small (~100 MB), and a few of them are big-
In order to optimize indexing, we:
- use the bulk API with an optimized bulk size;
- set refresh interval to
- set replication factor to
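The two index settings in the list above could be applied with a request body like the one built below. This is only a sketch: it assumes "replication factor" refers to `index.number_of_replicas`, the helper name `bulk_load_settings` is hypothetical, and the values passed in are illustrative placeholders, not the values used by the poster.

```python
import json


def bulk_load_settings(refresh_interval: str, replicas: int) -> dict:
    """Build a body for PUT /<index>/_settings to apply before a bulk import.

    Hypothetical helper; the setting names are real Elasticsearch index settings.
    """
    return {
        "index": {
            "refresh_interval": refresh_interval,
            "number_of_replicas": replicas,
        }
    }


# Illustrative values only (assumption): "-1" disables periodic refresh,
# 0 drops replicas for the duration of the import.
body = bulk_load_settings(refresh_interval="-1", replicas=0)
print(json.dumps(body, indent=2))
```

After the import finishes, the same helper can be called again with the normal production values to restore refresh and replication.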
Now we are trying to understand how merge throttling can help. How are search and segment merging related if we search only against existing indices?
According to this article, we can disable merge throttling.
- Does that mean merges will "eat" disk I/O?
- Does that mean merges won't happen at all and we have to call _forcemerge manually after indexing is done? Should we be worried about the maximum number of open file descriptors in that case?
According to this article and pull request, we shouldn't touch the merge settings at all.
Very confused here, any help is highly appreciated.
Don't worry about it, let ES handle the merging automatically
Your initial 3 steps are all you need to do!
@warkolm, I would be grateful if you could add more details and answer the 2 questions above. I want to understand how it works and what actually happens.
The best answer to those is: don't disable merging, as I mentioned.
Otherwise, yes, merges use I/O; if you disable them, they won't happen and a force merge is required.
@warkolm, sorry for bothering you; the documentation is really poor regarding this internal logic.
As far as I understand, importing data at a certain rate might cause the merge processes to "eat" all available disk I/O.
To keep some room for search queries, there is the setting
indices.store.throttle.max_bytes_per_sec, which throttles indexing threads if the merge rate is higher than this value.
Using the setting
indices.store.throttle.type, we can enable or disable index throttling.
merge throttling actually means
See the PR here and the Qbox article here.
I thought that if merges don't happen, the number of segment files might exceed the OS limit on open file descriptors if the index is huge.
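For reference, the two store throttle settings named in this post could be sent to the cluster settings API with a body like the one below. This is a sketch under assumptions: it applies only to pre-2.0 clusters (a later reply notes that throttling is automatic since ES 2.x), the helper name `store_throttle_settings` is hypothetical, and the example values are illustrative.

```python
import json
from typing import Optional

# Values accepted by indices.store.throttle.type in pre-2.0 releases.
VALID_TYPES = {"merge", "none", "all"}


def store_throttle_settings(throttle_type: str,
                            max_bytes_per_sec: Optional[str] = None) -> dict:
    """Build a body for PUT /_cluster/settings (hypothetical helper)."""
    if throttle_type not in VALID_TYPES:
        raise ValueError(f"unknown throttle type: {throttle_type!r}")
    settings = {"indices.store.throttle.type": throttle_type}
    if max_bytes_per_sec is not None:
        settings["indices.store.throttle.max_bytes_per_sec"] = max_bytes_per_sec
    # "transient" settings are dropped on a full cluster restart.
    return {"transient": settings}


# Illustrative: keep merge throttling on, with an example limit.
print(json.dumps(store_throttle_settings("merge", "20mb"), indent=2))
# Illustrative: turn store throttling off entirely.
print(json.dumps(store_throttle_settings("none"), indent=2))
```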
Are you indexing into these indices continuously or doing bulk inserts/updates periodically?
Periodically, on a daily basis, with new indices every day.
Since ES 2.x, the IO throttling is handled automatically by Lucene, meaning it starts at 20 MB/sec throttle on writing bytes to the merged segment. It then increases that rate when merges fall behind, and decreases it otherwise. This means the merges, over time, only soak up as much IO bandwidth as is needed to keep up with your rate of indexing.
You don't need to
forceMerge yourself: the merges will happen naturally as you are indexing.
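The feedback loop described above can be pictured with a toy simulation. This is only a sketch of the idea, not Lucene's actual code: the 20 MB/sec starting point comes from this reply, while the step factors, the floor, and the ceiling are assumptions chosen for illustration, and `next_rate` is a hypothetical name.

```python
def next_rate(rate_mb: float, merges_behind: bool,
              up: float = 1.2, down: float = 1.1,
              floor: float = 5.0, ceiling: float = 10240.0) -> float:
    """Return the next merge write-rate limit in MB/sec.

    Raise the limit when merges are falling behind, lower it otherwise.
    The factors and bounds are illustrative assumptions.
    """
    if merges_behind:
        return min(ceiling, rate_mb * up)
    return max(floor, rate_mb / down)


rate = 20.0  # starting throttle, as stated in the reply above
history = []
# Three checks where merges lag behind indexing, then two where they keep up.
for behind in [True, True, True, False, False]:
    rate = next_rate(rate, behind)
    history.append(round(rate, 2))
print(history)
```

The limit climbs while merges lag and drifts back down once they catch up, which is why merges end up using only as much I/O as the indexing rate requires.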