Growing refresh queue size, causing the write thread pool and queue to increase on multiple data nodes

I have multiple data nodes serving a large logging application on Elasticsearch 7.15.0. We are seeing a massive increase in the refresh thread pool queue, and I still do not know why it is happening. As a result, the write thread pool and its queue count are also increasing. Attaching the hot threads output:

::: {ip-10-215-67-164}{iOdjKmTKRjaRubUq5zFl6w}{dDce_9hSRZ6D4u5ZvIGtOQ}{10.215.67.164}{10.215.67.164:9300}{cdfhlstw}{ml.machine_memory=66926301184, ml.max_open_jobs=512, xpack.installed=true, ml.max_jvm_size=34359738368, data=small, transform.node=true}
   Hot threads at 2022-12-07T15:46:34.241Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

::: {ip-10-215-67-92}{-D6IgDBgQvCbt5k9h7zOXg}{9rcmKXJJQ6KjqDqVoJwItQ}{10.215.67.92}{10.215.67.92:9300}{cdfhlstw}{ml.machine_memory=66909523968, ml.max_open_jobs=512, xpack.installed=true, ml.max_jvm_size=33269219328, data=small, transform.node=true}
   Hot threads at 2022-12-07T15:46:34.241Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
   
   50.6% (252.9ms out of 500ms) cpu usage by thread 'elasticsearch[ip-10-215-67-92][write][T#3]'
     2/10 snapshots sharing following 19 elements
       app//org.elasticsearch.index.mapper.DocumentParser.parseObjectOrNested(DocumentParser.java:465)
       app//org.elasticsearch.index.mapper.DocumentParser.internalParseDocument(DocumentParser.java:140)
       app//org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:89)
       app//org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:83)
       app//org.elasticsearch.index.shard.IndexShard.prepareIndex(IndexShard.java:887)
       app//org.elasticsearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:855)
       app//org.elasticsearch.index.shard.IndexShard.applyIndexOperationOnPrimary(IndexShard.java:827)
       app//org.elasticsearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:268)
       app//org.elasticsearch.action.bulk.TransportShardBulkAction$2.doRun(TransportShardBulkAction.java:158)
       app//org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
       app//org.elasticsearch.action.bulk.TransportShardBulkAction.performOnPrimary(TransportShardBulkAction.java:203)
       app//org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary(TransportShardBulkAction.java:109)
       app//org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary(TransportShardBulkAction.java:74)
       app//org.elasticsearch.action.support.replication.TransportWriteAction$1.doRun(TransportWriteAction.java:172)
       app//org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:737)
       app//org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
       java.base@16.0.2/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
       java.base@16.0.2/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
       java.base@16.0.2/java.lang.Thread.run(Thread.java:831)
     3/10 snapshots sharing following 24 elements
       app//org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:488)
       app//org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:208)
       app//org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:415)
       app//org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1471)
       app//org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1757)
       app//org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1400)
       app//org.elasticsearch.index.engine.InternalEngine.addDocs(InternalEngine.java:1196)
       app//org.elasticsearch.index.engine.InternalEngine.indexIntoLucene(InternalEngine.java:1133)
       app//org.elasticsearch.index.engine.InternalEngine.index(InternalEngine.java:960)
       app//org.elasticsearch.index.shard.IndexShard.index(IndexShard.java:909)
       app//org.elasticsearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:870)
       app//org.elasticsearch.index.shard.IndexShard.applyIndexOperationOnPrimary(IndexShard.java:827)
       app//org.elasticsearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:268)
       app//org.elasticsearch.action.bulk.TransportShardBulkAction$2.doRun(TransportShardBulkAction.java:158)
       app//org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
       app//org.elasticsearch.action.bulk.TransportShardBulkAction.performOnPrimary(TransportShardBulkAction.java:203)
       app//org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary(TransportShardBulkAction.java:109)
       app//org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary(TransportShardBulkAction.java:74)
       app//org.elasticsearch.action.support.replication.TransportWriteAction$1.doRun(TransportWriteAction.java:172)
       app//org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:737)
       app//org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
       java.base@16.0.2/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
       java.base@16.0.2/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
       java.base@16.0.2/java.lang.Thread.run(Thread.java:831)
     4/10 snapshots sharing following 15 elements
       app//org.elasticsearch.index.shard.IndexShard.index(IndexShard.java:909)
       app//org.elasticsearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:870)
       app//org.elasticsearch.index.shard.IndexShard.applyIndexOperationOnPrimary(IndexShard.java:827)
       app//org.elasticsearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:268)
       app//org.elasticsearch.action.bulk.TransportShardBulkAction$2.doRun(TransportShardBulkAction.java:158)
       app//org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
       app//org.elasticsearch.action.bulk.TransportShardBulkAction.performOnPrimary(TransportShardBulkAction.java:203)
       app//org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary(TransportShardBulkAction.java:109)
       app//org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary(TransportShardBulkAction.java:74)
       app//org.elasticsearch.action.support.replication.TransportWriteAction$1.doRun(TransportWriteAction.java:172)
       app//org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:737)
       app//org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
       java.base@16.0.2/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
       java.base@16.0.2/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
       java.base@16.0.2/java.lang.Thread.run(Thread.java:831)
     unique snapshot
       java.base@16.0.2/sun.nio.ch.FileDispatcherImpl.force0(Native Method)
       java.base@16.0.2/sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:82)
       java.base@16.0.2/sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:465)
       app//org.elasticsearch.index.translog.TranslogWriter.syncUpTo(TranslogWriter.java:417)
       app//org.elasticsearch.index.translog.Translog.ensureSynced(Translog.java:776)
       app//org.elasticsearch.index.translog.Translog.ensureSynced(Translog.java:797)
       app//org.elasticsearch.index.engine.InternalEngine.ensureTranslogSynced(InternalEngine.java:539)
       app//org.elasticsearch.index.shard.IndexShard$7.write(IndexShard.java:3329)
       app//org.elasticsearch.common.util.concurrent.AsyncIOProcessor.processList(AsyncIOProcessor.java:97)
       app//org.elasticsearch.common.util.concurrent.AsyncIOProcessor.drainAndProcessAndRelease(AsyncIOProcessor.java:85)
       app//org.elasticsearch.common.util.concurrent.AsyncIOProcessor.put(AsyncIOProcessor.java:73)
       app//org.elasticsearch.index.shard.IndexShard.sync(IndexShard.java:3352)
       app//org.elasticsearch.action.support.replication.TransportWriteAction$AsyncAfterWriteAction.run(TransportWriteAction.java:410)
       app//org.elasticsearch.action.support.replication.TransportWriteAction$WritePrimaryResult.runPostReplicationActions(TransportWriteAction.java:253)
       app//org.elasticsearch.action.support.replication.ReplicationOperation.handlePrimaryResult(ReplicationOperation.java:139)
       app//org.elasticsearch.action.support.replication.ReplicationOperation$$Lambda$5798/0x00000008019d0d20.accept(Unknown Source)
       app//org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:134)
       app//org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:445)
       app//org.elasticsearch.action.bulk.TransportShardBulkAction$2.finishRequest(TransportShardBulkAction.java:198)
       app//org.elasticsearch.action.bulk.TransportShardBulkAction$2.doRun(TransportShardBulkAction.java:167)
       app//org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
       app//org.elasticsearch.action.bulk.TransportShardBulkAction.performOnPrimary(TransportShardBulkAction.java:203)
       app//org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary(TransportShardBulkAction.java:109)
       app//org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary(TransportShardBulkAction.java:74)
       app//org.elasticsearch.action.support.replication.TransportWriteAction$1.doRun(TransportWriteAction.java:172)
       app//org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:737)
       app//org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
       java.base@16.0.2/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
       java.base@16.0.2/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
       java.base@16.0.2/java.lang.Thread.run(Thread.java:831)
   
   18.7% (93.5ms out of 500ms) cpu usage by thread 'elasticsearch[ip-10-215-67-92][[pg-merchant-contracts-2022.12.07][3]: Lucene Merge Thread #4460]'
     10/10 snapshots sharing following 19 elements
       java.base@16.0.2/jdk.internal.misc.Unsafe.park(Native Method)
       java.base@16.0.2/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:252)
       java.base@16.0.2/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1661)
       app//org.apache.lucene.index.MergePolicy$OneMergeProgress.pauseNanos(MergePolicy.java:164)
       app//org.apache.lucene.index.MergeRateLimiter.maybePause(MergeRateLimiter.java:148)
       app//org.apache.lucene.index.MergeRateLimiter.pause(MergeRateLimiter.java:93)
       app//org.apache.lucene.store.RateLimitedIndexOutput.checkRate(RateLimitedIndexOutput.java:78)
       app//org.apache.lucene.store.RateLimitedIndexOutput.writeBytes(RateLimitedIndexOutput.java:72)
       app//org.apache.lucene.store.DataOutput.copyBytes(DataOutput.java:278)
       app//org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.copyChunks(CompressingStoredFieldsWriter.java:583)
       app//org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:633)
       app//org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:228)
       app//org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:105)
       app//org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4759)
       app//org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4363)
       app//org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5922)
       app//org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:626)
       app//org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:89)
       app//org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684)

::: {ip-10-215-66-110}{lq1lx1edSC6uAk87w2gBAQ}{uLBaoAbETxmIGa7gpM3siw}{10.215.66.110}{10.215.66.110:9300}{cdfhlstw}{ml.machine_memory=66271203328, ml.max_open_jobs=512, xpack.installed=true, ml.max_jvm_size=33470545920, data=large, transform.node=true}
   Hot threads at 2022-12-07T15:46:34.241Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
   
   69.1% (345.5ms out of 500ms) cpu usage by thread 'elasticsearch[ip-10-215-66-110][[payment-adapter-2022.12.07][18]: Lucene Merge Thread #9962]'
     3/10 snapshots sharing following 19 elements
       app//org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$21.ordValue(Lucene80DocValuesProducer.java:988)
       app//org.apache.lucene.index.SingletonSortedSetDocValues.nextDoc(SingletonSortedSetDocValues.java:66)
       app//org.apache.lucene.codecs.DocValuesConsumer$SortedSetDocValuesSub.nextDoc(DocValuesConsumer.java:742)
       app//org.apache.lucene.index.DocIDMerger$SequentialDocIDMerger.next(DocIDMerger.java:99)
       app//org.apache.lucene.codecs.DocValuesConsumer$5$1.nextDoc(DocValuesConsumer.java:848)
       app//org.apache.lucene.search.SortedSetSelector$MinValue.nextDoc(SortedSetSelector.java:108)
       app//org.apache.lucene.codecs.lucene80.Lucene80DocValuesConsumer.doAddSortedField(Lucene80DocValuesConsumer.java:639)
       app//org.apache.lucene.codecs.lucene80.Lucene80DocValuesConsumer.addSortedSetField(Lucene80DocValuesConsumer.java:891)
       app//org.apache.lucene.codecs.DocValuesConsumer.mergeSortedSetField(DocValuesConsumer.java:804)
       app//org.apache.lucene.codecs.DocValuesConsumer.merge(DocValuesConsumer.java:145)
       app//org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.merge(PerFieldDocValuesFormat.java:155)
       app//org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:195)
       app//org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:150)
       app//org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4759)
       app//org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4363)
       app//org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5922)
       app//org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:626)
       app//org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:89)
       app//org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684)
     4/10 snapshots sharing following 18 elements
       app//org.apache.lucene.index.SingletonSortedSetDocValues.nextDoc(SingletonSortedSetDocValues.java:64)
       app//org.apache.lucene.codecs.DocValuesConsumer$SortedSetDocValuesSub.nextDoc(DocValuesConsumer.java:742)
       app//org.apache.lucene.index.DocIDMerger$SequentialDocIDMerger.next(DocIDMerger.java:99)
       app//org.apache.lucene.codecs.DocValuesConsumer$5$1.nextDoc(DocValuesConsumer.java:848)
       app//org.apache.lucene.search.SortedSetSelector$MinValue.nextDoc(SortedSetSelector.java:108)
       app//org.apache.lucene.codecs.lucene80.Lucene80DocValuesConsumer.doAddSortedField(Lucene80DocValuesConsumer.java:639)
       app//org.apache.lucene.codecs.lucene80.Lucene80DocValuesConsumer.addSortedSetField(Lucene80DocValuesConsumer.java:891)
       app//org.apache.lucene.codecs.DocValuesConsumer.mergeSortedSetField(DocValuesConsumer.java:804)
       app//org.apache.lucene.codecs.DocValuesConsumer.merge(DocValuesConsumer.java:145)
       app//org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.merge(PerFieldDocValuesFormat.java:155)
       app//org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:195)
       app//org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:150)
       app//org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4759)
       app//org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4363)
       app//org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5922)
       app//org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:626)
       app//org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:89)
       app//org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684)
     3/10 snapshots sharing following 13 elements
       app//org.apache.lucene.codecs.lucene80.Lucene80DocValuesConsumer.doAddSortedField(Lucene80DocValuesConsumer.java:639)
       app//org.apache.lucene.codecs.lucene80.Lucene80DocValuesConsumer.addSortedSetField(Lucene80DocValuesConsumer.java:891)
       app//org.apache.lucene.codecs.DocValuesConsumer.mergeSortedSetField(DocValuesConsumer.java:804)
       app//org.apache.lucene.codecs.DocValuesConsumer.merge(DocValuesConsumer.java:145)
       app//org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.merge(PerFieldDocValuesFormat.java:155)
       app//org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:195)
       app//org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:150)
       app//org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4759)
       app//org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4363)
       app//org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5922)
       app//org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:626)
       app//org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:89)
       app//org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684)
   
   45.9% (229.5ms out of 500ms) cpu usage by thread 'elasticsearch[ip-10-215-66-110][[pg-limitcentercore-2022.12.07][8]: Lucene Merge Thread #9637]'
     6/10 snapshots sharing following 24 elements
       java.base@16.0.2/jdk.internal.misc.Unsafe.park(Native Method)
       java.base@16.0.2/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:252)
       java.base@16.0.2/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1661)
       app//org.apache.lucene.index.MergePolicy$OneMergeProgress.pauseNanos(MergePolicy.java:164)
       app//org.apache.lucene.index.MergeRateLimiter.maybePause(MergeRateLimiter.java:148)
       app//org.apache.lucene.index.MergeRateLimiter.pause(MergeRateLimiter.java:93)
       app//org.apache.lucene.store.RateLimitedIndexOutput.checkRate(RateLimitedIndexOutput.java:78)
       app//org.apache.lucene.store.RateLimitedIndexOutput.writeBytes(RateLimitedIndexOutput.java:72)
       app//org.apache.lucene.store.DataOutput.writeBytes(DataOutput.java:52)
       app//org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.writeBlock(BlockTreeTermsWriter.java:846)
       app//org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.writeBlocks(BlockTreeTermsWriter.java:606)
       app//org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.pushTerm(BlockTreeTermsWriter.java:947)
       app//org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.write(BlockTreeTermsWriter.java:912)
       app//org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.write(BlockTreeTermsWriter.java:318)
       app//org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:105)
       app//org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.merge(PerFieldPostingsFormat.java:197)
       app//org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:244)
       app//org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:139)
       app//org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4759)
       app//org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4363)
       app//org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5922)
       app//org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:626)
       app//org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:89)
       app//org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684)
     4/10 snapshots sharing following 11 elements
       app//org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.write(BlockTreeTermsWriter.java:318)
       app//org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:105)
       app//org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.merge(PerFieldPostingsFormat.java:197)
       app//org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:244)
       app//org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:139)
       app//org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4759)
       app//org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4363)
       app//org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5922)
       app//org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:626)
       app//org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:89)
       app//org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684)

::: {ip-10-215-67-240}{2RWe275qTzqTaw-LdfNy4Q}{caTQiNmhREGTRhdc9NbvXg}{10.215.67.240}{10.215.67.240:9300}{cdfhlstw}{ml.machine_memory=66271203328, ml.max_open_jobs=512, xpack.installed=true, ml.max_jvm_size=34359738368, data=large, transform.node=true}
   Hot threads at 2022-12-07T15:46:34.242Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
   
   91.8% (459.1ms out of 500ms) cpu usage by thread 'elasticsearch[ip-10-215-67-240][[pg-merchant-center-2022.12.07][9]: Lucene Merge Thread #10484]'
     5/10 snapshots sharing following 17 elements
       app//org.apache.lucene.codecs.lucene80.IndexedDISI$Method$1.advanceExactWithinBlock(IndexedDISI.java:507)
       app//org.apache.lucene.codecs.lucene80.IndexedDISI.advanceExact(IndexedDISI.java:399)
       app//org.apache.lucene.codecs.lucene80.Lucene80NormsProducer$SparseNormsIterator.advanceExact(Lucene80NormsProducer.java:186)
       app//org.apache.lucene.codecs.lucene84.Lucene84PostingsWriter.startDoc(Lucene84PostingsWriter.java:259)
       app//org.apache.lucene.codecs.PushPostingsWriterBase.writeTerm(PushPostingsWriterBase.java:146)
       app//org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.write(BlockTreeTermsWriter.java:907)
       app//org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.write(BlockTreeTermsWriter.java:318)
       app//org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:105)
       app//org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.merge(PerFieldPostingsFormat.java:197)
       app//org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:244)
       app//org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:139)
       app//org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4759)
       app//org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4363)
       app//org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5922)
       app//org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:626)
       app//org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:89)
       app//org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684)
     2/10 snapshots sharing following 12 elements
       app//org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.write(BlockTreeTermsWriter.java:907)
       app//org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.write(BlockTreeTermsWriter.java:318)
       app//org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:105)
       app//org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.merge(PerFieldPostingsFormat.java:197)
       app//org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:244)
       app//org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:139)
       app//org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4759)
       app//org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4363)
       app//org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5922)
       app//org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:626)
       app//org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:89)
       app//org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684)
     2/10 snapshots sharing following 14 elements
       app//org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.writeBlocks(BlockTreeTermsWriter.java:628)
       app//org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.pushTerm(BlockTreeTermsWriter.java:947)
       app//org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.write(BlockTreeTermsWriter.java:912)
       app//org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.write(BlockTreeTermsWriter.java:318)
       app//org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:105)
       app//org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.merge(PerFieldPostingsFormat.java:197)
       app//org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:244)
       app//org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:139)
       app//org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4759)
       app//org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4363)
       app//org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5922)
       app//org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:626)
       app//org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:89)
       app//org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684)
     unique snapshot
       app//org.apache.lucene.util.PriorityQueue.updateTop(PriorityQueue.java:202)
       app//org.apache.lucene.index.MultiTermsEnum.pushTop(MultiTermsEnum.java:279)
       app//org.apache.lucene.index.MultiTermsEnum.next(MultiTermsEnum.java:301)
       app//org.apache.lucene.index.FilterLeafReader$FilterTermsEnum.next(FilterLeafReader.java:194)
       app//org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.write(BlockTreeTermsWriter.java:310)
       app//org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:105)
       app//org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.merge(PerFieldPostingsFormat.java:197)
       app//org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:244)
       app//org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:139)
       app//org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4759)
       app//org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4363)
       app//org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5922)
       app//org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:626)
       app//org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:89)
       app//org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684)
   
   66.7% (333.4ms out of 500ms) cpu usage by thread 'elasticsearch[ip-10-215-67-240][[pg-checkout-2022.12.07][6]: Lucene Merge Thread #566]'
     3/10 snapshots sharing following 12 elements
       app//org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.write(BlockTreeTermsWriter.java:907)
       app//org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.write(BlockTreeTermsWriter.java:318)
       app//org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:105)
       app//org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.merge(PerFieldPostingsFormat.java:197)
       app//org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:244)
       app//org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:139)
       app//org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4759)
       app//org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4363)
       app//org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5922)
       app//org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:626)
       app//org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:89)
       app//org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684)
     4/10 snapshots sharing following 14 elements
       app//org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.writeBlocks(BlockTreeTermsWriter.java:628)
       app//org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.pushTerm(BlockTreeTermsWriter.java:947)
       app//org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.write(BlockTreeTermsWriter.java:912)
       app//org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.write(BlockTreeTermsWriter.java:318)
       app//org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:105)
       app//org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.merge(PerFieldPostingsFormat.java:197)
       app//org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:244)
       app//org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:139)
       app//org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4759)
       app//org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4363)
       app//org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5922)
       app//org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:626)
       app//org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:89)
       app//org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684)
     2/10 snapshots sharing following 13 elements
       app//org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.pushTerm(BlockTreeTermsWriter.java:947)
       app//org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.write(BlockTreeTermsWriter.java:912)
       app//org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.write(BlockTreeTermsWriter.java:318)
       app//org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:105)
       app//org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.merge(PerFieldPostingsFormat.java:197)
       app//org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:244)
       app//org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:139)
       app//org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4759)
       app//org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4363)
       app//org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5922)
       app//org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:626)
       app//org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:89)
       app//org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684)
     unique snapshot
       app//org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.write(BlockTreeTermsWriter.java:318)
       app//org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:105)
       app//org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.merge(PerFieldPostingsFormat.java:197)
       app//org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:244)
       app//org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:139)
       app//org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4759)
       app//org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4363)
       app//org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5922)
       app//org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:626)
       app//org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:89)
       app//org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684)
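
A dump like the one above can be easier to compare across nodes when condensed to CPU percentage per kind of work. The following is a minimal Python sketch, assuming the plain-text hot threads format shown above (node headers starting with ":::" and "cpu usage by thread" lines). The names summarize_hot_threads and classify, and the grouping into "merge", "write" and "other", are illustrative and not part of Elasticsearch.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   50.6% (252.9ms out of 500ms) cpu usage by thread 'elasticsearch[node][write][T#3]'
THREAD_RE = re.compile(r"^\s*([\d.]+)% \(.*?\) cpu usage by thread '(.+)'\s*$")
# Matches node header lines such as: ::: {ip-10-215-67-92}{...}
NODE_RE = re.compile(r"^::: \{([^}]+)\}")


def classify(thread_name):
    """Rough, illustrative grouping of a thread name into a kind of work."""
    if "Lucene Merge Thread" in thread_name:
        return "merge"
    # Pool threads end with [<pool>][T#<n>], e.g. [write][T#3]
    m = re.search(r"\]\[(\w+)\]\[T#\d+\]$", thread_name)
    return m.group(1) if m else "other"


def summarize_hot_threads(text):
    """Return {node: {kind: summed_cpu_percent}} from hot threads text output."""
    summary = defaultdict(lambda: defaultdict(float))
    node = None
    for line in text.splitlines():
        node_match = NODE_RE.match(line)
        if node_match:
            node = node_match.group(1)
            continue
        thread_match = THREAD_RE.match(line)
        if thread_match and node is not None:
            pct = float(thread_match.group(1))
            summary[node][classify(thread_match.group(2))] += pct
    # Nodes that reported no busy threads do not appear in the result.
    return {n: dict(kinds) for n, kinds in summary.items()}
```

Applied to the output above, this would list write and merge work on ip-10-215-67-92 and only merge threads on the two data=large nodes.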

My guess is that you have slow storage and are overloading it. What kind of storage do you have? What does iostat show?

@Christian_Dahlqvist I am using gp3 volumes on AWS, each provisioned with 3000 IOPS.

What does iostat -x show when these messages occur? (I am assuming you are running on Linux.)

@Christian_Dahlqvist Sorry for the delay, I had to capture the output during peak TPS.

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          13.83    0.00    0.52    0.20    0.00   85.45

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
nvme0n1           0.00     0.08    0.36    1.14    11.67     7.38    25.29     0.00    0.41    0.56    0.36   0.03   0.00
nvme1n1           0.00     4.37   88.97  172.06  6449.05 22837.38   224.40     0.40    0.10    0.98    2.41   0.18   4.66


This is from one of the data nodes.
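
To compare numbers like these against the provisioned limit, the r/s and w/s columns can be summed per device. Below is a minimal Python sketch, assuming the iostat -x column layout shown above. The function names parse_iostat_x and iops_headroom are hypothetical, and the 3000 IOPS figure is the per-volume value mentioned earlier in this thread.

```python
def parse_iostat_x(text):
    """Parse the device table of `iostat -x` output into {device: {column: float}}."""
    devices, columns = {}, None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # Header row: "Device:" followed by the column names
        if fields[0].rstrip(":") == "Device":
            columns = fields[1:]
            continue
        # Data row: device name followed by one value per column
        if columns and len(fields) == len(columns) + 1:
            devices[fields[0]] = dict(zip(columns, map(float, fields[1:])))
    return devices


def iops_headroom(stats, provisioned_iops):
    """Return (observed_iops, fraction_of_provisioned) for one device."""
    observed = stats["r/s"] + stats["w/s"]
    return observed, observed / provisioned_iops
```

For the nvme1n1 row above this gives roughly 261 IOPS, under 10% of the provisioned 3000. If the output was taken from the first iostat report without an interval argument, it is an average since boot, so short bursts at peak can be far higher than this figure suggests.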

This issue was resolved once the provisioned IOPS for the cluster's volumes was increased.
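
One way to check whether a change like this has relieved the pressure is to watch the write and refresh queues through the cat thread pool API, for example GET _cat/thread_pool/write,refresh?v=true&h=node_name,name,active,queue,rejected. The following is a minimal Python sketch that parses that text output. The function names and the threshold are hypothetical, and any sample values fed to it are illustrative rather than taken from this cluster.

```python
def parse_cat_thread_pool(text):
    """Parse `_cat/thread_pool` text output (with a v=true header row) into dicts."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    parsed = []
    for row in rows:
        entry = dict(zip(header, row))
        # Convert the numeric columns; the other columns stay as strings
        for key in ("active", "queue", "rejected"):
            if key in entry:
                entry[key] = int(entry[key])
        parsed.append(entry)
    return parsed


def queued_pools(rows, max_queue=0):
    """Return the rows whose current queue length exceeds max_queue."""
    return [r for r in rows if r.get("queue", 0) > max_queue]
```

The rejected column is a cumulative count since the node started, so this sketch filters on the current queue length only.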
