Merge Thread busy even when refresh_interval disabled for indexing

lisak · August 19, 2015, 9:37am

Hi,

I'm performing bulk indexing in a scheduled manner and right before each one I disable refresh_interval to avoid throttling.

It used to work perfectly but then I upgraded to 1.6.0 (not 100% sure it caused it) and performance decreased from 4000 op/s to 250 op/s until Merge Thread finishes and then it returns back to 4000 op/s.

So my question is why I see Segment Merge Thread in _nodes/hot_threads keeping CPU busy even if I disabled refresh_interval for the index? It starts merging right after indexing and stops after 30 minutes if the bulk indexing doesn't stop sooner. Thanks !

I have 2 nodes (index has 5 shards) with one replica. Bulk sizes are optimal, index is not full, enough heap/off-heap RAM

24.6% (122.8ms out of 500ms) cpu usage by thread 'elasticsearch[Blacklash][[mi_2015_08_w4][0]: Lucene Merge Thread #86911]'
2/10 snapshots sharing following 19 elements
java.lang.Thread.sleep(Native Method)
java.lang.Thread.sleep(Thread.java:340)
org.apache.lucene.store.RateLimiter$SimpleRateLimiter.pause(RateLimiter.java:151)
org.apache.lucene.store.RateLimitedFSDirectory$RateLimiterWrapper.pause(RateLimitedFSDirectory.java:96)
org.apache.lucene.store.RateLimitedIndexOutput.checkRate(RateLimitedIndexOutput.java:76)
org.apache.lucene.store.RateLimitedIndexOutput.writeBytes(RateLimitedIndexOutput.java:70)
org.apache.lucene.codecs.lucene410.Lucene410DocValuesConsumer.addBinaryField(Lucene410DocValuesConsumer.java:284)
org.apache.lucene.codecs.lucene410.Lucene410DocValuesConsumer.addTermsDict(Lucene410DocValuesConsumer.java:336)
org.apache.lucene.codecs.lucene410.Lucene410DocValuesConsumer.addSortedField(Lucene410DocValuesConsumer.java:471)
org.apache.lucene.codecs.lucene410.Lucene410DocValuesConsumer.addSortedSetField(Lucene410DocValuesConsumer.java:500)
org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.addSortedSetField(PerFieldDocValuesFormat.java:127)
org.apache.lucene.codecs.DocValuesConsumer.mergeSortedSetField(DocValuesConsumer.java:626)
org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:213)
org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:122)
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4223)
org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3811)
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:409)
org.apache.lucene.index.TrackingConcurrentMergeScheduler.doMerge(TrackingConcurrentMergeScheduler.java:107)
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:486)
2/10 snapshots sharing following 11 elements
org.apache.lucene.store.OutputStreamIndexOutput.writeBytes(OutputStreamIndexOutput.java:51)
org.apache.lucene.store.RateLimitedIndexOutput.writeBytes(RateLimitedIndexOutput.java:71)
org.apache.lucene.store.CompoundFileWriter$DirectCFSIndexOutput.writeBytes(CompoundFileWriter.java:356)
org.apache.lucene.store.DataOutput.copyBytes(DataOutput.java:281)
org.apache.lucene.store.Directory.copy(Directory.java:194)
org.apache.lucene.index.IndexWriter.createCompoundFile(IndexWriter.java:4785)
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4266)
org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3811)
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:409)
org.apache.lucene.index.TrackingConcurrentMergeScheduler.doMerge(TrackingConcurrentMergeScheduler.java:107)
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:486)

lisak · August 19, 2015, 11:04am

Now I'm even disabling throttling entirely before indexing :

"indices.store.throttle.type" : "none"

But I still see this in _nodes/hot_threads for the node. So neither disabling refresh_interval nor disabling throttling prevents ES from merging segments :-/

49.6% (248ms out of 500ms) cpu usage by thread 'elasticsearch[Blacklash][[mi_2015_08_w4][4]: Lucene Merge Thread #88922]'
10/10 snapshots sharing following 4 elements
org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3811)
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:409)
org.apache.lucene.index.TrackingConcurrentMergeScheduler.doMerge(TrackingConcurrentMergeScheduler.java:107)
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:486)

jpountz · August 21, 2015, 3:35pm

Indeed Elasticsearch needs to merge segments regardless of whether you refresh or not. When you add data to a Lucene index, it needs to invert the data (to build the inverted index) in memory. This in-memory buffer is merged to a segment not only when there is a refresh happening but also when this buffer hits a maximum size. And when there are many segments, elasticsearch/lucene need to merge them.

lisak · August 21, 2015, 3:57pm

Hi jpountz,

everything in my document type is not_analyzed with doc_values format. No full-text related things should happen.

The weird thing is that it has always the same running as you can see on the screen shot.

I'll have to give up on this, I've spent whole days to figure this out. Maybe it'll stop with ES 2.0

jpountz · August 21, 2015, 4:03pm

Actually, Elasticsearch also needs the indexing buffer for doc values, just for different reasons. It's actually "normal" that a node spends CPU time merging if there is steady indexing.

Earlier versions of elasticsearch had completely asynchronous indexing and refresh, which used to cause issues because the number of segments could grow unbounded so this was fixed in order to make sure that indexing keeps up with merges, I'm wondering that this might be the cause of the issue you are seeing. Which version did you upgrade from and how many fields do you have?

lisak · August 21, 2015, 4:29pm

We used 1.5.0 before. The document contains ~ 20 fields. 25 characters on average, one of them has 800 chars. I was playing with bulk sizes from 90 to 500 but it doesn't seem to have effect on this issue.

lisak · October 21, 2015, 5:47pm

Btw this documenting nicely what happens, merging is falling behind exactly for some time, then it all performs well :

15:05:25,915][INFO ][indices.store ] [es-884a70a1805147db95b8a28e776cd59a] updating indices.store.throttle.type from [merge] to [none]
15:05:25,966][INFO ][index.shard ] [es-884a70a1805147db95b8a28e776cd59a] [mi_2015_10_w4][4] updating refresh_interval from [1s] to [-1]
15:05:25,971][INFO ][index.shard ] [es-884a70a1805147db95b8a28e776cd59a] [mi_2015_10_w4][0] updating refresh_interval from [1s] to [-1]
15:05:25,974][INFO ][index.shard ] [es-884a70a1805147db95b8a28e776cd59a] [mi_2015_10_w4][2] updating refresh_interval from [1s] to [-1]
15:05:25,974][INFO ][index.shard ] [es-884a70a1805147db95b8a28e776cd59a] [mi_2015_10_w4][3] updating refresh_interval from [1s] to [-1]
15:05:25,974][INFO ][index.shard ] [es-884a70a1805147db95b8a28e776cd59a] [mi_2015_10_w4][1] updating refresh_interval from [1s] to [-1]
15:05:26,731][INFO ][index.engine ] [es-884a70a1805147db95b8a28e776cd59a] [mi_2015_10_w4][4] now throttling indexing: numMergesInFlight=4, maxNumMerges=3
15:05:26,807][INFO ][index.engine ] [es-884a70a1805147db95b8a28e776cd59a] [mi_2015_10_w4][0] now throttling indexing: numMergesInFlight=4, maxNumMerges=3
15:05:26,865][INFO ][index.engine ] [es-884a70a1805147db95b8a28e776cd59a] [mi_2015_10_w4][2] now throttling indexing: numMergesInFlight=4, maxNumMerges=3
15:05:26,922][INFO ][index.engine ] [es-884a70a1805147db95b8a28e776cd59a] [mi_2015_10_w4][4] stop throttling indexing: numMergesInFlight=2, maxNumMerges=3
15:05:26,979][INFO ][index.engine ] [es-884a70a1805147db95b8a28e776cd59a] [mi_2015_10_w4][3] now throttling indexing: numMergesInFlight=4, maxNumMerges=3
15:05:27,018][INFO ][index.engine ] [es-884a70a1805147db95b8a28e776cd59a] [mi_2015_10_w4][1] now throttling indexing: numMergesInFlight=4, maxNumMerges=3
15:05:27,060][INFO ][index.engine ] [es-884a70a1805147db95b8a28e776cd59a] [mi_2015_10_w4][2] stop throttling indexing: numMergesInFlight=2, maxNumMerges=3
15:05:27,067][INFO ][index.engine ] [es-884a70a1805147db95b8a28e776cd59a] [mi_2015_10_w4][0] stop throttling indexing: numMergesInFlight=2, maxNumMerges=3
15:05:27,139][INFO ][index.engine ] [es-884a70a1805147db95b8a28e776cd59a] [mi_2015_10_w4][3] stop throttling indexing: numMergesInFlight=2, maxNumMerges=3
15:05:27,330][INFO ][index.engine ] [es-884a70a1805147db95b8a28e776cd59a] [mi_2015_10_w4][1] stop throttling indexing: numMergesInFlight=2, maxNumMerges=3
15:05:29,024][INFO ][index.engine ] [es-884a70a1805147db95b8a28e776cd59a] [mi_2015_10_w4][3] now throttling indexing: numMergesInFlight=4, maxNumMerges=3

Lephix_Long · November 30, 2015, 5:36am

Now I am using ES 2.1. Still have the same problem. Indexing speed will drop down to hundreds per seconds and then back. But the strange thing is that it happens from one of the day, before it , I didn't observe anything like this.

boosty · December 29, 2015, 1:42pm

We are using 1.7 and noticed a similar problem: The indexing getting throttled even when the IO was not saturated (Fusion IO in our case).
After increasing "index.merge.scheduler.max_thread_count" the problem seems to be gone -- we did not see any more throttling being logged so far.

Topic		Replies	Views
High CPU usage because lucene merge thread Elasticsearch	6	1817	July 15, 2021
Throttle merges Elasticsearch	5	3998	July 5, 2017
Merge throttling is preventing heavy bulk indexing (ES 1.7.5) Elasticsearch	5	2001	July 5, 2017
Indexes are keep increasing after traffic towards elastic has stoped Elasticsearch	20	629	September 20, 2021
Merge Bottleneck Elasticsearch	2	485	July 6, 2017

Merge Thread busy even when refresh_interval disabled for indexing

Related topics