Elasticsearch indexing performance: throttle merging

antonbormotov · January 13, 2017, 5:14pm

We are importing data into an Elasticsearch cluster, into a few indices of around 10 GB each.
At the same time, we care about search on existing indices; a few of them are small (~100 MB) and a few are big (~10 GB).

In order to optimize indexing, we:

use the bulk API with an optimized bulk size;
set the refresh interval to -1;
set the number of replicas to 0;
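
For readers who want to see those three steps concretely, here is a minimal sketch of the request bodies involved. It only builds the JSON for `PUT /<index>/_settings` and the NDJSON payload for `POST /_bulk`; it does not talk to a cluster. The function names and the index name are hypothetical, and the bulk action line assumes a version that does not require a mapping type (versions from the era of this thread expect a `_type` field as well).

```python
import json


def bulk_load_settings():
    """Body for PUT /<index>/_settings before a large import:
    disable periodic refresh and drop replicas."""
    return {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}


def restore_settings(refresh_interval="1s", replicas=1):
    """Body for PUT /<index>/_settings once the import is done.
    The defaults here are assumptions; use your cluster's normal values."""
    return {"index": {"refresh_interval": refresh_interval,
                      "number_of_replicas": replicas}}


def bulk_body(index, docs):
    """Build the NDJSON payload for POST /_bulk.

    docs is an iterable of (doc_id, source_dict) pairs. Each document
    becomes an action line followed by a source line, and the payload
    must end with a newline.
    """
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"
```

The "optimized bulk size" part is not shown: it is usually found by experiment, by varying how many documents go into one `bulk_body` call and measuring throughput.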

Now we are trying to understand how merge throttling can help. How are search and segment merging related if we only search against existing indices?

According to this article, we can disable merge throttling.

Does that mean merges will "eat" all the disk I/O?
Does that mean merges won't happen at all and we have to _forcemerge manually after indexing is done? Should we be worried about the max open file descriptors limit in that case?

According to this article and this pull request, we shouldn't touch the merge settings at all.

Very confused here, any help is highly appreciated.

warkolm · January 16, 2017, 12:39am

Don't worry about it; let ES handle the merging automatically.

Your initial 3 steps are all you need to do!

antonbormotov · January 16, 2017, 2:29am

@warkolm, I would be grateful if you could add more details and answer the 2 questions above. I want to understand how it works and what actually happens.

warkolm · January 16, 2017, 2:47am

Which two questions?

antonbormotov · January 16, 2017, 2:50am

These two, Mark.

warkolm · January 16, 2017, 2:54am

The best answer to those is: don't disable merging, as I mentioned.

Otherwise, yes, merges use I/O; if you disable them, they won't happen and a force merge is required.
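
As an illustration of what a manual force merge request looks like, here is a small sketch that builds the URL for `POST /<index>/_forcemerge`. The helper name, base URL and index name are hypothetical; `max_num_segments` is the request parameter that caps the number of segments per shard.

```python
from urllib.parse import quote, urlencode


def forcemerge_url(base_url, index, max_num_segments=1):
    """Return the URL to POST to in order to force-merge one index."""
    query = urlencode({"max_num_segments": max_num_segments})
    return f"{base_url.rstrip('/')}/{quote(index, safe='')}/_forcemerge?{query}"
```

A force merge is itself I/O heavy, so it is normally run only after indexing into that index has finished.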

antonbormotov · January 16, 2017, 6:47am

@warkolm, sorry to bother you, but the documentation is really poor regarding this internal logic.
As far as I understand, importing data at a certain rate might cause the merging processes to 'eat' all available disk I/O.
In order to keep some room for search queries, there is a setting, indices.store.throttle.max_bytes_per_sec, that throttles indexing threads if the merging rate is higher than this number.

Using the indices.store.throttle.type setting, we can enable or disable index throttling.
It looks like merge throttling actually means index throttling.
See the PR here and the qbox article here.
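
To make the two settings named above concrete, here is a hedged sketch of the body that old (1.x-era) clusters accepted on `PUT /_cluster/settings`. It is an assumption that this applies only to versions before 2.x, where these settings still existed; the helper name is hypothetical, and the `20mb` default is the value commonly cited for that era.

```python
def legacy_store_throttle_settings(throttle_type="merge", max_bytes_per_sec="20mb"):
    """Build a transient cluster-settings body for the legacy store throttle.

    throttle_type: "merge" (throttle merges), "all", or "none" (no throttling).
    Only meaningful on old versions that still have these settings.
    """
    if throttle_type not in ("merge", "all", "none"):
        raise ValueError(f"unknown throttle type: {throttle_type!r}")
    settings = {"indices.store.throttle.type": throttle_type}
    if throttle_type != "none":
        # The rate limit only matters when some throttling is enabled.
        settings["indices.store.throttle.max_bytes_per_sec"] = max_bytes_per_sec
    return {"transient": settings}
```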

I thought that if merges don't happen and the index is huge, the number of segment files might exceed the OS limit on open file descriptors.
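
One way to check that worry is to compare open and maximum file descriptors per node, which the nodes stats API reports under `process`. The sketch below parses an already-fetched `GET /_nodes/stats/process` response; the function name and the sample node id are hypothetical, and fetching the response is left out.

```python
def fd_usage(nodes_stats):
    """Map node id -> fraction of the file descriptor limit in use.

    nodes_stats is the parsed JSON of GET /_nodes/stats/process.
    Nodes that do not report both values are skipped.
    """
    usage = {}
    for node_id, node in nodes_stats.get("nodes", {}).items():
        proc = node.get("process", {})
        open_fds = proc.get("open_file_descriptors")
        max_fds = proc.get("max_file_descriptors")
        if open_fds is None or not max_fds or max_fds <= 0:
            continue
        usage[node_id] = open_fds / max_fds
    return usage
```

A value that creeps toward 1.0 during an import would suggest segments are piling up faster than they are merged.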

Christian_Dahlqvist · January 16, 2017, 8:15am

Are you indexing into these indices continuously or doing bulk inserts/updates periodically?

antonbormotov · January 16, 2017, 11:01am

Periodically, on a daily basis, with new indices every day.

mikemccand · January 16, 2017, 12:23pm

Since ES 2.x, the IO throttling is handled automatically by Lucene, meaning it starts at 20 MB/sec throttle on writing bytes to the merged segment. It then increases that rate when merges fall behind, and decreases it otherwise. This means the merges, over time, only soak up as much IO bandwidth as is needed to keep up with your rate of indexing.
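
The feedback loop described above can be sketched as a toy simulation. This is not Lucene's code: the 20 MB/sec starting point comes from the description above, while the step factors and the lower and upper bounds are illustrative assumptions chosen for the example.

```python
START_MB_PER_SEC = 20.0     # starting throttle, as stated above
MIN_MB_PER_SEC = 5.0        # assumed floor, for illustration only
MAX_MB_PER_SEC = 10240.0    # assumed ceiling, for illustration only


def next_rate(current, merges_behind, up=1.2, down=1.1):
    """Raise the throttle when merges fall behind, lower it otherwise."""
    if merges_behind:
        return min(MAX_MB_PER_SEC, current * up)
    return max(MIN_MB_PER_SEC, current / down)


def simulate(backlog_flags, start=START_MB_PER_SEC):
    """Return the throttle after each step; backlog_flags says, per step,
    whether merges were falling behind."""
    rates = [start]
    for behind in backlog_flags:
        rates.append(next_rate(rates[-1], behind))
    return rates
```

For example, `simulate([True, True, False])` rises for two steps and then eases off, which is the "only as much IO as needed" behaviour in miniature.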

You don't need to forceMerge yourself: the merges will happen naturally as you are indexing.

Mike McCandless

system · February 13, 2017, 12:23pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
