High CPU Usage across entire cluster

We have a version 5.6 cluster with more than 100 nodes that has been suffering from high CPU usage for some time now. The hot threads output shows several different threads that are usually taking 100% CPU.
3 masters 6i.xlarge
38 nodes i3.2xlarge zone 1
38 nodes r5.2xlarge zone 2
38 nodes i3.2xlarge zone 3

The threads showing up in the hot threads output are Lucene Merge, warmer, management and bulk threads.
Can someone help shed some light on this issue?

   100.2% (501ms out of 500ms) cpu usage by thread 'elasticsearch[es-zone2-ebs-23][[account_1094_202112031094][353]: Lucene Merge Thread #15]'
     2/10 snapshots sharing following 19 elements
       org.apache.lucene.codecs.DocValuesConsumer$4$1.hasNext(DocValuesConsumer.java:545)
       org.apache.lucene.codecs.DocValuesConsumer$4$1.next(DocValuesConsumer.java:555)
       org.apache.lucene.codecs.DocValuesConsumer$4$1.next(DocValuesConsumer.java:536)
       org.apache.lucene.codecs.DocValuesConsumer$10$1.next(DocValuesConsumer.java:1028)
       org.apache.lucene.codecs.DocValuesConsumer$10$1.next(DocValuesConsumer.java:1015)
       org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer.writeSparseMissingBitset(Lucene54DocValuesConsumer.java:332)
       org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer.addNumericField(Lucene54DocValuesConsumer.java:207)
       org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer.addNumericField(Lucene54DocValuesConsumer.java:89)
       org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer.addSortedNumericField(Lucene54DocValuesConsumer.java:589)
       org.apache.lucene.codecs.DocValuesConsumer.mergeSortedNumericField(DocValuesConsumer.java:470)
       org.apache.lucene.codecs.DocValuesConsumer.merge(DocValuesConsumer.java:243)
       org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.merge(PerFieldDocValuesFormat.java:153)
       org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:167)
       org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:111)
       org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4356)
       org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3931)
       org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:624)
       org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:99)
       org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:661)
     8/10 snapshots sharing following 12 elements
       org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer.addNumericField(Lucene54DocValuesConsumer.java:89)
       org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer.addSortedNumericField(Lucene54DocValuesConsumer.java:589)
       org.apache.lucene.codecs.DocValuesConsumer.mergeSortedNumericField(DocValuesConsumer.java:470)
       org.apache.lucene.codecs.DocValuesConsumer.merge(DocValuesConsumer.java:243)
       org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.merge(PerFieldDocValuesFormat.java:153)
       org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:167)
       org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:111)
       org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4356)
       org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3931)
       org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:624)
       org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:99)
       org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:661)

::: {es-zone3-9}{2Z94hNQfTR-14lzWMzK8sA}{h3jNlnniS8SsyBEanRIqFw}{10.0.204.76}{10.0.204.76:9300}{ml.max_open_jobs=10, rack_id=zone3, ml.enabled=true}
   Hot threads at 2022-03-12T00:56:17.669Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

   100.2% (501.1ms out of 500ms) cpu usage by thread 'elasticsearch[es-zone3-9][[account_1094_202112031094][184]: Lucene Merge Thread #3]'
     2/10 snapshots sharing following 17 elements
       org.apache.lucene.index.SingletonSortedNumericDocValues.setDocument(SingletonSortedNumericDocValues.java:52)
       org.apache.lucene.codecs.DocValuesConsumer$SortedNumericDocValuesSub.nextDoc(DocValuesConsumer.java:449)
       org.apache.lucene.index.DocIDMerger$SequentialDocIDMerger.next(DocIDMerger.java:100)
       org.apache.lucene.codecs.DocValuesConsumer$3$1.setNext(DocValuesConsumer.java:511)
       org.apache.lucene.codecs.DocValuesConsumer$3$1.hasNext(DocValuesConsumer.java:491)
       org.apache.lucene.codecs.DocValuesConsumer.isSingleValued(DocValuesConsumer.java:998)
       org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer.addSortedNumericField(Lucene54DocValuesConsumer.java:586)
       org.apache.lucene.codecs.DocValuesConsumer.mergeSortedNumericField(DocValuesConsumer.java:470)
       org.apache.lucene.codecs.DocValuesConsumer.merge(DocValuesConsumer.java:243)
       org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.merge(PerFieldDocValuesFormat.java:153)
       org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:167)
       org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:111)
       org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4356)
       org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3931)
       org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:624)
       org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:99)
       org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:661)
     2/10 snapshots sharing following 14 elements
       org.apache.lucene.codecs.DocValuesConsumer$3$1.setNext(DocValuesConsumer.java:511)
       org.apache.lucene.codecs.DocValuesConsumer$3$1.hasNext(DocValuesConsumer.java:491)
       org.apache.lucene.codecs.DocValuesConsumer.isSingleValued(DocValuesConsumer.java:998)
       org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer.addSortedNumericField(Lucene54DocValuesConsumer.java:586)
       org.apache.lucene.codecs.DocValuesConsumer.mergeSortedNumericField(DocValuesConsumer.java:470)
       org.apache.lucene.codecs.DocValuesConsumer.merge(DocValuesConsumer.java:243)
       org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.merge(PerFieldDocValuesFormat.java:153)
       org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:167)
       org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:111)
       org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4356)
       org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3931)
       org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:624)
       org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:99)
       org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:661)
     2/10 snapshots sharing following 14 elements
       org.apache.lucene.codecs.DocValuesConsumer$10$1.next(DocValuesConsumer.java:1015)
       org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer.addNumericField(Lucene54DocValuesConsumer.java:105)
       org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer.addNumericField(Lucene54DocValuesConsumer.java:89)
       org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer.addSortedNumericField(Lucene54DocValuesConsumer.java:589)
       org.apache.lucene.codecs.DocValuesConsumer.mergeSortedNumericField(DocValuesConsumer.java:470)
       org.apache.lucene.codecs.DocValuesConsumer.merge(DocValuesConsumer.java:243)
       org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.merge(PerFieldDocValuesFormat.java:153)
       org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:167)
       org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:111)
       org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4356)
       org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3931)
       org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:624)
       org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:99)
       org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:661)
     2/10 snapshots sharing following 19 elements
       org.apache.lucene.index.SingletonSortedNumericDocValues.setDocument(SingletonSortedNumericDocValues.java:52)
       org.apache.lucene.codecs.DocValuesConsumer$SortedNumericDocValuesSub.nextDoc(DocValuesConsumer.java:449)
       org.apache.lucene.index.DocIDMerger$SequentialDocIDMerger.next(DocIDMerger.java:100)
       org.apache.lucene.codecs.DocValuesConsumer$3$1.setNext(DocValuesConsumer.java:511)
       org.apache.lucene.codecs.DocValuesConsumer$3$1.hasNext(DocValuesConsumer.java:491)
       org.apache.lucene.codecs.DocValuesConsumer$10$1.hasNext(DocValuesConsumer.java:1019)
       org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer.addNumericField(Lucene54DocValuesConsumer.java:105)
       org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer.addNumericField(Lucene54DocValuesConsumer.java:89)
       org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer.addSortedNumericField(Lucene54DocValuesConsumer.java:589)
       org.apache.lucene.codecs.DocValuesConsumer.mergeSortedNumericField(DocValuesConsumer.java:470)
       org.apache.lucene.codecs.DocValuesConsumer.merge(DocValuesConsumer.java:243)
       org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.merge(PerFieldDocValuesFormat.java:153)
       org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:167)
       org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:111)
       org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4356)
       org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3931)
       org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:624)
       org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:99)
       org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:661)
     2/10 snapshots sharing following 12 elements
       org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer.addNumericField(Lucene54DocValuesConsumer.java:89)
       org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer.addSortedNumericField(Lucene54DocValuesConsumer.java:589)
       org.apache.lucene.codecs.DocValuesConsumer.mergeSortedNumericField(DocValuesConsumer.java:470)
       org.apache.lucene.codecs.DocValuesConsumer.merge(DocValuesConsumer.java:243)
       org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.merge(PerFieldDocValuesFormat.java:153)
       org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:167)
       org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:111)
       org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4356)
       org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3931)
       org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:624)
       org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:99)
       org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:661)
::: {es-zone1-10}{8TC2WC8XRjqgXGB7O23scA}{w6-91TEJR7CKhrjg2jPdmA}{10.0.203.151}{10.0.203.151:9300}{ml.max_open_jobs=10, rack_id=zone1, ml.enabled=true}
   Hot threads at 2022-03-12T00:56:17.668Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

   100.4% (501.9ms out of 500ms) cpu usage by thread 'elasticsearch[es-zone1-10][warmer][T#8]'
     5/10 snapshots sharing following 22 elements
       org.apache.lucene.util.PriorityQueue.downHeap(PriorityQueue.java:284)
       org.apache.lucene.util.PriorityQueue.updateTop(PriorityQueue.java:211)
       org.apache.lucene.index.MultiTermsEnum.pushTop(MultiTermsEnum.java:279)
       org.apache.lucene.index.MultiTermsEnum.next(MultiTermsEnum.java:301)
       org.apache.lucene.index.MultiDocValues$OrdinalMap.<init>(MultiDocValues.java:554)
       org.apache.lucene.index.MultiDocValues$OrdinalMap.build(MultiDocValues.java:511)
       org.apache.lucene.index.MultiDocValues$OrdinalMap.build(MultiDocValues.java:490)
       org.elasticsearch.index.fielddata.ordinals.GlobalOrdinalsBuilder.build(GlobalOrdinalsBuilder.java:65)
       org.elasticsearch.index.fielddata.plain.SortedSetDVOrdinalsIndexFieldData.localGlobalDirect(SortedSetDVOrdinalsIndexFieldData.java:130)
       org.elasticsearch.index.fielddata.plain.SortedSetDVOrdinalsIndexFieldData.localGlobalDirect(SortedSetDVOrdinalsIndexFieldData.java:47)
       org.elasticsearch.indices.fielddata.cache.IndicesFieldDataCache$IndexFieldCache.lambda$load$1(IndicesFieldDataCache.java:157)
       org.elasticsearch.indices.fielddata.cache.IndicesFieldDataCache$IndexFieldCache$$Lambda$2203/456155471.load(Unknown Source)
       org.elasticsearch.common.cache.Cache.computeIfAbsent(Cache.java:399)
       org.elasticsearch.indices.fielddata.cache.IndicesFieldDataCache$IndexFieldCache.load(IndicesFieldDataCache.java:154)
       org.elasticsearch.index.fielddata.plain.SortedSetDVOrdinalsIndexFieldData.loadGlobal(SortedSetDVOrdinalsIndexFieldData.java:118)
       org.elasticsearch.index.fielddata.plain.SortedSetDVOrdinalsIndexFieldData.loadGlobal(SortedSetDVOrdinalsIndexFieldData.java:47)
       org.elasticsearch.index.IndexWarmer$FieldDataWarmer.lambda$warmReader$1(IndexWarmer.java:141)
       org.elasticsearch.index.IndexWarmer$FieldDataWarmer$$Lambda$2196/390567367.run(Unknown Source)
       org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:576)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       java.lang.Thread.run(Thread.java:748)
     2/10 snapshots sharing following 21 elements
       org.apache.lucene.index.MultiTermsEnum$TermMergeQueue.fillTop(MultiTermsEnum.java:429)
       org.apache.lucene.index.MultiTermsEnum.pullTop(MultiTermsEnum.java:267)
       org.apache.lucene.index.MultiTermsEnum.next(MultiTermsEnum.java:305)
       org.apache.lucene.index.MultiDocValues$OrdinalMap.<init>(MultiDocValues.java:554)
       org.apache.lucene.index.MultiDocValues$OrdinalMap.build(MultiDocValues.java:511)
       org.apache.lucene.index.MultiDocValues$OrdinalMap.build(MultiDocValues.java:490)
       org.elasticsearch.index.fielddata.ordinals.GlobalOrdinalsBuilder.build(GlobalOrdinalsBuilder.java:65)
       org.elasticsearch.index.fielddata.plain.SortedSetDVOrdinalsIndexFieldData.localGlobalDirect(SortedSetDVOrdinalsIndexFieldData.java:130)
       org.elasticsearch.index.fielddata.plain.SortedSetDVOrdinalsIndexFieldData.localGlobalDirect(SortedSetDVOrdinalsIndexFieldData.java:47)
       org.elasticsearch.indices.fielddata.cache.IndicesFieldDataCache$IndexFieldCache.lambda$load$1(IndicesFieldDataCache.java:157)
       org.elasticsearch.indices.fielddata.cache.IndicesFieldDataCache$IndexFieldCache$$Lambda$2203/456155471.load(Unknown Source)
       org.elasticsearch.common.cache.Cache.computeIfAbsent(Cache.java:399)
       org.elasticsearch.indices.fielddata.cache.IndicesFieldDataCache$IndexFieldCache.load(IndicesFieldDataCache.java:154)
       org.elasticsearch.index.fielddata.plain.SortedSetDVOrdinalsIndexFieldData.loadGlobal(SortedSetDVOrdinalsIndexFieldData.java:118)
       org.elasticsearch.index.fielddata.plain.SortedSetDVOrdinalsIndexFieldData.loadGlobal(SortedSetDVOrdinalsIndexFieldData.java:47)
       org.elasticsearch.index.IndexWarmer$FieldDataWarmer.lambda$warmReader$1(IndexWarmer.java:141)
       org.elasticsearch.index.IndexWarmer$FieldDataWarmer$$Lambda$2196/390567367.run(Unknown Source)
       org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:576)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       java.lang.Thread.run(Thread.java:748)
     3/10 snapshots sharing following 17 elements
       org.apache.lucene.index.MultiDocValues$OrdinalMap.build(MultiDocValues.java:511)
       org.apache.lucene.index.MultiDocValues$OrdinalMap.build(MultiDocValues.java:490)
       org.elasticsearch.index.fielddata.ordinals.GlobalOrdinalsBuilder.build(GlobalOrdinalsBuilder.java:65)
       org.elasticsearch.index.fielddata.plain.SortedSetDVOrdinalsIndexFieldData.localGlobalDirect(SortedSetDVOrdinalsIndexFieldData.java:130)
       org.elasticsearch.index.fielddata.plain.SortedSetDVOrdinalsIndexFieldData.localGlobalDirect(SortedSetDVOrdinalsIndexFieldData.java:47)
       org.elasticsearch.indices.fielddata.cache.IndicesFieldDataCache$IndexFieldCache.lambda$load$1(IndicesFieldDataCache.java:157)
       org.elasticsearch.indices.fielddata.cache.IndicesFieldDataCache$IndexFieldCache$$Lambda$2203/456155471.load(Unknown Source)
       org.elasticsearch.common.cache.Cache.computeIfAbsent(Cache.java:399)
       org.elasticsearch.indices.fielddata.cache.IndicesFieldDataCache$IndexFieldCache.load(IndicesFieldDataCache.java:154)
       org.elasticsearch.index.fielddata.plain.SortedSetDVOrdinalsIndexFieldData.loadGlobal(SortedSetDVOrdinalsIndexFieldData.java:118)
       org.elasticsearch.index.fielddata.plain.SortedSetDVOrdinalsIndexFieldData.loadGlobal(SortedSetDVOrdinalsIndexFieldData.java:47)
       org.elasticsearch.index.IndexWarmer$FieldDataWarmer.lambda$warmReader$1(IndexWarmer.java:141)
       org.elasticsearch.index.IndexWarmer$FieldDataWarmer$$Lambda$2196/390567367.run(Unknown Source)
       org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:576)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       java.lang.Thread.run(Thread.java:748)

   100.4% (501.9ms out of 500ms) cpu usage by thread 'elasticsearch[es-zone1-10][[account_1094_202112031094][271]: Lucene Merge Thread #14]'
     2/10 snapshots sharing following 14 elements
       org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer.writeSparseMissingBitset(Lucene54DocValuesConsumer.java:332)
       org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer.addNumericField(Lucene54DocValuesConsumer.java:207)
       org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer.addNumericField(Lucene54DocValuesConsumer.java:89)
       org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer.addSortedNumericField(Lucene54DocValuesConsumer.java:589)
       org.apache.lucene.codecs.DocValuesConsumer.mergeSortedNumericField(DocValuesConsumer.java:470)
       org.apache.lucene.codecs.DocValuesConsumer.merge(DocValuesConsumer.java:243)
       org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.merge(PerFieldDocValuesFormat.java:153)
       org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:167)
       org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:111)
       org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4356)
       org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3931)
       org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:624)
       org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:99)
       org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:661)
     8/10 snapshots sharing following 16 elements
       java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:301)
       java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
       org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer.addNumericField(Lucene54DocValuesConsumer.java:105)
       org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer.addNumericField(Lucene54DocValuesConsumer.java:296)
       org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer.addNumericField(Lucene54DocValuesConsumer.java:89)
       org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer.addSortedNumericField(Lucene54DocValuesConsumer.java:589)
       org.apache.lucene.codecs.DocValuesConsumer.mergeSortedNumericField(DocValuesConsumer.java:470)
       org.apache.lucene.codecs.DocValuesConsumer.merge(DocValuesConsumer.java:243)
       org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.merge(PerFieldDocValuesFormat.java:153)
       org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:167)
       org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:111)
       org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4356)
       org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3931)
       org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:624)
       org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:99)
       org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:661)

   100.4% (501.7ms out of 500ms) cpu usage by thread 'elasticsearch[es-zone1-10][[account_1094_202112031094][167]: Lucene Merge Thread #3]'
     5/10 snapshots sharing following 18 elements
       java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:207)
       java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:162)
       java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:301)
       java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
       org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer.addNumericField(Lucene54DocValuesConsumer.java:105)
       org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer.addNumericField(Lucene54DocValuesConsumer.java:296)
       org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer.addNumericField(Lucene54DocValuesConsumer.java:89)
       org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer.addSortedNumericField(Lucene54DocValuesConsumer.java:589)
       org.apache.lucene.codecs.DocValuesConsumer.mergeSortedNumericField(DocValuesConsumer.java:470)
       org.apache.lucene.codecs.DocValuesConsumer.merge(DocValuesConsumer.java:243)
       org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.merge(PerFieldDocValuesFormat.java:153)
       org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:167)
       org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:111)
       org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4356)
       org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3931)
       org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:624)
       org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:99)
       org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:661)
     5/10 snapshots sharing following 13 elements
       org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer.addNumericField(Lucene54DocValuesConsumer.java:296)
       org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer.addNumericField(Lucene54DocValuesConsumer.java:89)
       org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer.addSortedNumericField(Lucene54DocValuesConsumer.java:589)
       org.apache.lucene.codecs.DocValuesConsumer.mergeSortedNumericField(DocValuesConsumer.java:470)
       org.apache.lucene.codecs.DocValuesConsumer.merge(DocValuesConsumer.java:243)
       org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.merge(PerFieldDocValuesFormat.java:153)
       org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:167)
       org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:111)
       org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4356)
       org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3931)
       org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:624)
       org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:99)
       org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:661)
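
With dumps this long it can help to tally which kinds of threads are busy before reading the individual stacks. Below is a minimal Python sketch, assuming the text format shown above. The function names and the `classify` grouping are illustrative only and are not part of Elasticsearch.

```python
import re
from collections import Counter

# Matches header lines such as:
#   100.2% (501ms out of 500ms) cpu usage by thread 'elasticsearch[node][...]'
HEADER = re.compile(r"^\s*([\d.]+)% \(.*?\) cpu usage by thread '(.+)'\s*$")


def classify(thread_name):
    """Map a thread name to a coarse category (illustrative grouping)."""
    if "Lucene Merge Thread" in thread_name:
        return "merge"
    # Pool threads look like elasticsearch[node][pool][T#n]
    m = re.search(r"\]\[(\w+)\]\[T#\d+\]$", thread_name)
    return m.group(1) if m else "other"


def summarize(hot_threads_text):
    """Count busy threads per category in a hot threads dump."""
    counts = Counter()
    for line in hot_threads_text.splitlines():
        m = HEADER.match(line)
        if m:
            counts[classify(m.group(2))] += 1
    return counts
```

Feeding it the saved output of the hot threads API for all nodes gives a quick per-category count, which makes it easier to see whether merges or warmers dominate across the cluster.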


Version 5.6 is already EOL, and you should try to upgrade your clusters to the latest version.

Regarding the high CPU utilisation, it looks like your cluster has an index-heavy use case.

Knowing the throughput (writes per second) will help in answering further.

A few other points:
How many batches do you run in parallel?
How many docs are indexed per batch?
Are there any aggregation/sort queries being used in your cluster?
Is there any disk I/O wait happening?
What kind of storage do you have?

I'll answer what I can for now and will have to get the rest on Monday. I'm still new to these clusters and not fully sure of the details of how data is indexed into them. We did try to upgrade to v7, but this proved problematic because queries returned data much more slowly in v7 than in v5, partly due to changes in later versions. We use parent-child relationships in v5, which were changed in later versions, and we think this change is part of the issue we have yet to resolve for v7.

The disk iowait does show some latency at times but doesn't seem to be extreme. I will need to get a more accurate look when the cluster is under load on Monday.

We have 1.6 TB of SSD storage per node
2/3 of the disks are NVMe (76 nodes)
1/3 of the disks are EBS (38 nodes)
Our nodes have 8 vCPUs each

I will answer the others once I can get that info on Monday

Got a faster response than expected on part of it.
As a note, our use case is search, but we do index customer data continually.
Our index names do not change, so we are not doing any rollover or time-based indexes, but we do reindex periodically to cull data from the index. We retain 18 months of data for our customers, and each customer has their own index.
We have 80 bulk index threads with up to 10k docs per batch.

The number of queries run on the cluster ranges from a few thousand to more than 9 million in total in any 20-minute period. At times one node will run a lot of queries, 60k in a 10-minute period, which is much more than any other node.
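
Since those two figures use different time windows, converting them to queries per second makes them easier to compare. A small Python sketch using only the numbers quoted above (the helper name is illustrative):

```python
def per_second(total_queries, window_minutes):
    """Average queries per second over a window given in minutes."""
    return total_queries / (window_minutes * 60)


# Cluster-wide peak: 9 million queries in 20 minutes
print(per_second(9_000_000, 20))  # 7500.0

# Busiest single node: 60k queries in 10 minutes
print(per_second(60_000, 10))  # 100.0
```

These are averages over the whole window, so short bursts inside the window can be considerably higher.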

Is this behavior noticed all the time or only during periodic writes? I would suggest you throttle your writes by decreasing the batch size from 10K to 5K and then gradually increasing it to figure out what your cluster can process without CPU crossing 85%.
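
One way to picture that ramp is a simple step rule: grow the batch while CPU stays under the limit, and back off once it is crossed. This is only a rough Python sketch. The step size, the floor, and the function name are assumptions for illustration, and the CPU readings in the usage loop are made up, not measurements from this cluster.

```python
def next_batch_size(current, cpu_percent, *, cpu_limit=85.0, step=1000,
                    floor=1000, ceiling=10000):
    """Return the bulk batch size to try next, given the observed peak CPU.

    step and floor are assumed values; ceiling is the 10K batch size
    mentioned in the thread.
    """
    if cpu_percent > cpu_limit:
        # CPU crossed the limit: back off one step, never below the floor.
        return max(floor, current - step)
    # Headroom left: grow one step, never above the ceiling.
    return min(ceiling, current + step)


size = 5000  # starting point suggested above
for cpu in (60.0, 72.0, 88.0):  # hypothetical readings, one per trial run
    size = next_batch_size(size, cpu)
print(size)  # 6000
```

Each trial should run long enough for merges to catch up, since merge load tends to lag behind the writes that caused it.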

Thank you, I was just thinking of something along these lines. We also need to determine whether the 80 threads are for a single index or for all indexes.

Yes, good to check on the threads too!

A bit of additional information: when this started to happen, it was in relation to an index that we started reindexing at 400 shards. Replicas and refresh are disabled during the reindexing. Once we finish the reindexing we reset the refresh rate to 1s and the replicas to two, but even before we do this we see the degradation in the server.

Basically it looks like merges are slowing down the cluster. We have tried setting the refresh rate for that one index to 30s or higher to see if this will help with recovery of the cluster.
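
For anyone following along, the settings changes described here map onto two request bodies for the index settings update API (`PUT <index>/_settings`). The Python sketch below only builds those bodies and does not send anything. The helper names are illustrative, and the values are the ones mentioned in this thread.

```python
import json


def reindex_settings():
    """Body for the duration of the reindex: refresh disabled, no replicas."""
    return {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}


def serving_settings(refresh_interval="1s", replicas=2):
    """Body to apply once the reindex has finished."""
    return {
        "index": {
            "refresh_interval": refresh_interval,
            "number_of_replicas": replicas,
        }
    }


print(json.dumps(reindex_settings()))
print(json.dumps(serving_settings()))       # 1s refresh, two replicas
print(json.dumps(serving_settings("30s")))  # the longer refresh being tried
```

A longer refresh interval produces fewer, larger segments per flush, which is why it is a common lever when merges look like the bottleneck.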
