Lucene Merge Thread causing high CPU usage

Hi all,

I have been facing an issue with Elasticsearch consuming very high CPU (80-100%) when a heavy data load is ingested.

Below are a few details:

Elasticsearch version: 7.10.2
We have a data tier architecture in place with 12 hot nodes, and data is ingested into Elasticsearch via data streams. The data is logs only.

Elasticsearch indexing rate is close to 231k docs/sec
CPU usage is > 90%
Memory usage is below 50%
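For scale, the cluster-wide indexing rate can be converted into a rough per-node load. The sketch below assumes writes are spread evenly across the 12 hot nodes, which may not hold if primary shards of the active backing indices are unevenly allocated.

```python
# Rough per-node indexing load, ASSUMING an even spread across hot nodes.
TOTAL_DOCS_PER_SEC = 231_000  # approximate cluster-wide rate reported above
HOT_NODES = 12

per_node = TOTAL_DOCS_PER_SEC / HOT_NODES
print(f"{per_node:,.0f} docs/sec per hot node")
```

That works out to roughly 19k docs/sec per hot node, all of which eventually has to be merged into larger segments.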

Output from the hot_threads API:

 101.7% (508.5ms out of 500ms) cpu usage by thread 'elasticsearch[elasticstackstage-hot-v004-v4tp][[.ds-carbonelastic-000155][2]: Lucene Merge Thread #130]'
     10/10 snapshots sharing following 12 elements
       app//org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.write(BlockTreeTermsWriter.java:907)
       app//org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.write(BlockTreeTermsWriter.java:318)
       app//org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:105)
       app//org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.merge(PerFieldPostingsFormat.java:197)
       app//org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:244)
       app//org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:139)
       app//org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4760)
       app//org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4364)
       app//org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5923)
       app//org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:624)
       app//org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:100)
       app//org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:682)

Several other nodes show similar usage.

The Lucene merge threads seem to be causing the CPU spikes.
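To check whether merge threads dominate across all nodes, rather than on one sampled node, the hot_threads text output can be summarized per node. This is a minimal sketch that assumes the header-line format shown in the output above. The thread categories are hypothetical heuristics based on substrings in the thread name, not an official classification.

```python
import re
from collections import defaultdict

# Matches hot_threads header lines such as:
#  101.7% (508.5ms out of 500ms) cpu usage by thread 'elasticsearch[node][...]'
HEADER = re.compile(
    r"^\s*(?P<pct>\d+(?:\.\d+)?)% \(.*?\) cpu usage by thread '(?P<thread>.+)'\s*$"
)
# The node name is the first bracketed token after "elasticsearch".
NODE = re.compile(r"^elasticsearch\[(?P<node>[^\]]+)\]")


def classify(thread: str) -> str:
    """Heuristic bucketing of a thread name (hypothetical categories)."""
    if "Lucene Merge Thread" in thread:
        return "merge"
    if "[write]" in thread:
        return "write"
    return "other"


def summarize(hot_threads_text: str) -> dict:
    """Sum the reported CPU percentage per (node, category)."""
    totals = defaultdict(float)
    for line in hot_threads_text.splitlines():
        m = HEADER.match(line)
        if not m:
            continue  # skip stack frames and snapshot summary lines
        thread = m.group("thread")
        node_match = NODE.match(thread)
        node = node_match.group("node") if node_match else "unknown"
        totals[(node, classify(thread))] += float(m.group("pct"))
    return dict(totals)


sample = (
    " 101.7% (508.5ms out of 500ms) cpu usage by thread "
    "'elasticsearch[elasticstackstage-hot-v004-v4tp]"
    "[[.ds-carbonelastic-000155][2]: Lucene Merge Thread #130]'\n"
    "     10/10 snapshots sharing following 12 elements\n"
)
print(summarize(sample))
```

Feeding it the full hot_threads output from every node gives a quick view of how much of the sampled CPU time goes to merging versus indexing.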

Looking for some help with this.

What type of storage are you using on these nodes? SSDs?

What do disk I/O and iowait look like?

@Christian_Dahlqvist Yes, I am using SSDs.

I/O operations looked normal during the spike. I have attached a screenshot for your reference.
