High CPU usage / High load

We are seeing high CPU usage of up to 93% and load of up to 11.

Hot threads at 2020-05-13T09:33:20.955Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

87.5% (437.5ms out of 500ms) cpu usage by thread 'elasticsearch[.xxxxxxxxxxxx][[xxxx-20124063405-events-prod][0]: Lucene Merge Thread #680]'
2/10 snapshots sharing following 13 elements
app//org.apache.lucene.codecs.lucene84.Lucene84PostingsWriter.startDoc(Lucene84PostingsWriter.java:267)
app//org.apache.lucene.codecs.PushPostingsWriterBase.writeTerm(PushPostingsWriterBase.java:148)
app//org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.write(BlockTreeTermsWriter.java:865)
app//org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.write(BlockTreeTermsWriter.java:344)
app//org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:105)
app//org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.merge(PerFieldPostingsFormat.java:197)
app//org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:245)
app//org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:140)
app//org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4463)
app//org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4057)
app//org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:625)
app//org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:101)
app//org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:662)
3/10 snapshots sharing following 16 elements
app//org.apache.lucene.store.DataOutput.writeLong(DataOutput.java:212)
app//org.apache.lucene.codecs.lucene84.ForUtil.encode(ForUtil.java:291)
app//org.apache.lucene.codecs.lucene84.ForDeltaUtil.encodeDeltas(ForDeltaUtil.java:67)
app//org.apache.lucene.codecs.lucene84.Lucene84PostingsWriter.startDoc(Lucene84PostingsWriter.java:251)
app//org.apache.lucene.codecs.PushPostingsWriterBase.writeTerm(PushPostingsWriterBase.java:148)
app//org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.write(BlockTreeTermsWriter.java:865)
app//org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.write(BlockTreeTermsWriter.java:344)
app//org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:105)
app//org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.merge(PerFieldPostingsFormat.java:197)
app//org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:245)
app//org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:140)
app//org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4463)
app//org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4057)
app//org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:625)
app//org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:101)
app//org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:662)
2/10 snapshots sharing following 12 elements
app//org.apache.lucene.codecs.PushPostingsWriterBase.writeTerm(PushPostingsWriterBase.java:148)
app//org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.write(BlockTreeTermsWriter.java:865)
app//org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.write(BlockTreeTermsWriter.java:344)
app//org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:105)
app//org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.merge(PerFieldPostingsFormat.java:197)
app//org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:245)
app//org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:140)
app//org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4463)
app//org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4057)
app//org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:625)
app//org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:101)
app//org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:662)
3/10 snapshots sharing following 11 elements
app//org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.write(BlockTreeTermsWriter.java:865)
app//org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.write(BlockTreeTermsWriter.java:344)
app//org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:105)
app//org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.merge(PerFieldPostingsFormat.java:197)
app//org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:245)
app//org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:140)
app//org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4463)
app//org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4057)
app//org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:625)
app//org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:101)
app//org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:662)

72.4% (362.2ms out of 500ms) cpu usage by thread 'elasticsearch[.xxxxxxxxxxxx][write][T#5]'
2/10 snapshots sharing following 183 elements
app//org.elasticsearch.index.translog.Translog.add(Translog.java:554)
app//org.elasticsearch.index.engine.InternalEngine.index(InternalEngine.java:943)
app//org.elasticsearch.index.shard.IndexShard.index(IndexShard.java:813)
app//org.elasticsearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:785)
app//org.elasticsearch.index.shard.IndexShard.applyIndexOperationOnPrimary(IndexShard.java:742)
app//org.elasticsearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:267)
app//org.elasticsearch.action.bulk.TransportShardBulkAction$2.doRun(TransportShardBulkAction.java:157)
app//org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
app//org.elasticsearch.action.bulk.TransportShardBulkAction.performOnPrimary(TransportShardBulkAction.java:202)
app//org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:114)
app//org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:81)
app//org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.perform(TransportReplicationAction.java:895)
app//org.elasticsearch.action.support.replication.ReplicationOperation.execute(ReplicationOperation.java:109)
app//org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.runWithPrimaryShardReference(TransportReplicationAction.java:374)
app//org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.lambda$doRun$0(TransportReplicationAction.java:297)
app//org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction$$Lambda$5429/0x0000000801a73040.accept(Unknown Source)
app//org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63)
app//org.elasticsearch.index.shard.IndexShard.lambda$wrapPrimaryOperationPermitListener$24(IndexShard.java:2791)
app//org.elasticsearch.index.shard.IndexShard$$Lambda$5392/0x0000000801a67040.accept(Unknown Source)
app//org.elasticsearch.action.ActionListener$3.onResponse(ActionListener.java:113)
app//org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:285)
app//org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:237)
app//org.elasticsearch.index.shard.IndexShard.acquirePrimaryOperationPermit(IndexShard.java:2765)
app//org.elasticsearch.action.support.replication.TransportReplicationAction.acquirePrimaryOperationPermit(TransportReplicationAction.java:836)
app//org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.doRun(TransportReplicationAction.java:293)
app//org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
app//org.elasticsearch.action.support.replication.TransportReplicationAction.handlePrimaryRequest(TransportReplicationAction.java:256)
app//org.elasticsearch.action.support.replication.TransportReplicationAction$$Lambda$2983/0x0000000801502040.messageReceived(Unknown Source)
org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:257)
app//org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
app//org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:225)
org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHa
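
To compare dumps like the one above across nodes or over time, it can help to pull out just the per-thread CPU figures. Below is a minimal Python sketch of that idea. The function name `parse_hot_threads` and the regular expression are assumptions derived from the "cpu usage by thread" header lines in this dump, not part of Elasticsearch, and the pattern may need adjusting for other versions.

```python
import re

# Matches header lines such as:
#   87.5% (437.5ms out of 500ms) cpu usage by thread 'elasticsearch[node][write][T#5]'
# The pattern is an assumption based on the dump above, not an official format spec.
HEADER = re.compile(
    r"^\s*(?P<pct>\d+(?:\.\d+)?)% \((?P<busy>\S+) out of (?P<interval>\S+)\) "
    r"cpu usage by thread '(?P<thread>.*)'\s*$"
)


def parse_hot_threads(text):
    """Return (cpu_percent, thread_name) tuples, busiest thread first."""
    found = []
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            found.append((float(match.group("pct")), match.group("thread")))
    return sorted(found, reverse=True)


if __name__ == "__main__":
    # Header lines copied from the dump above; stack frames are skipped by the parser.
    sample = "\n".join([
        "87.5% (437.5ms out of 500ms) cpu usage by thread "
        "'elasticsearch[.xxxxxxxxxxxx][[xxxx-20124063405-events-prod][0]: Lucene Merge Thread #680]'",
        "2/10 snapshots sharing following 13 elements",
        "72.4% (362.2ms out of 500ms) cpu usage by thread "
        "'elasticsearch[.xxxxxxxxxxxx][write][T#5]'",
    ])
    for pct, thread in parse_hot_threads(sample):
        print(f"{pct:5.1f}%  {thread}")
```

Run against the dump above, this would list the Lucene merge thread first and the write thread second, which makes it easier to see at a glance whether merging or indexing dominates.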

Those seem I/O related. What type of storage do you have? What do disk utilisation and iowait look like during these periods?

We are using SSD storage (EMC XtremIO 0349).

This is the current status:
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdb 0.00 194.55 54.52 97.20 5562.15 11638.17 226.73 0.21 1.37 1.39 1.36 0.56 8.46
sdc 0.00 1.56 32.06 20.09 9835.81 1633.30 439.85 0.08 1.51 2.00 0.73 0.42 2.21
sda 0.00 0.86 4.10 1.50 96.37 17.95 40.84 0.00 0.58 0.64 0.44 0.32 0.18
dm-0 0.00 0.00 1.47 0.98 42.29 6.58 39.88 0.00 0.75 0.80 0.68 0.24 0.06
dm-1 0.00 0.00 0.01 0.00 0.20 0.00 47.83 0.00 5.46 0.20 122.50 2.31 0.00
dm-2 0.00 0.00 0.02 0.01 0.42 0.03 36.31 0.00 0.52 0.49 0.57 0.30 0.00
dm-3 0.00 0.00 2.03 0.43 32.90 6.46 31.96 0.00 0.51 0.50 0.54 0.34 0.08
dm-4 0.00 0.00 0.50 0.51 16.65 2.38 37.59 0.00 0.54 0.73 0.36 0.31 0.03
dm-5 0.00 0.00 0.05 0.43 2.75 2.48 21.88 0.00 0.30 0.68 0.25 0.20 0.01
dm-6 0.00 0.00 0.02 0.01 0.39 0.03 32.88 0.00 1.11 0.65 2.48 0.34 0.00
dm-7 0.00 0.00 2.35 1.60 65.98 7.85 37.35 0.00 0.80 0.75 0.88 0.28 0.11
dm-8 0.00 0.00 84.22 311.80 15331.43 13263.59 144.41 0.48 1.20 1.64 1.08 0.26 10.25

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdb 0.00 476.00 0.40 245.20 1.60 34659.20 282.25 0.33 1.36 0.00 1.37 0.80 19.76
sdc 0.00 0.00 0.00 11.80 0.00 238.40 40.41 0.01 0.49 0.00 0.49 0.44 0.52
sda 0.00 2.00 6.80 9.40 28.00 61.60 11.06 0.01 0.38 0.15 0.55 0.04 0.06
dm-0 0.00 0.00 6.80 1.00 28.00 4.00 8.21 0.00 0.15 0.18 0.00 0.05 0.04
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-3 0.00 0.00 0.00 9.60 0.00 52.00 10.83 0.01 0.56 0.00 0.56 0.02 0.02
dm-4 0.00 0.00 0.00 0.60 0.00 2.40 8.00 0.00 0.33 0.00 0.33 0.33 0.02
dm-5 0.00 0.00 0.00 0.20 0.00 3.20 32.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-7 0.00 0.00 0.00 0.80 0.00 3.20 8.00 0.00 1.00 0.00 1.00 0.50 0.04
dm-8 0.00 0.00 0.40 732.20 1.60 34894.40 95.27 0.90 1.23 0.00 1.23 0.27 19.90

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdb 0.00 608.58 0.60 402.99 2.40 97394.01 482.65 0.75 1.91 0.00 1.92 0.54 21.74
sdc 0.00 0.00 0.00 11.78 0.00 47.11 8.00 0.01 0.46 0.00 0.46 0.46 0.54
sda 0.00 0.40 0.00 5.19 0.00 23.15 8.92 0.00 0.04 0.00 0.04 0.04 0.02
dm-0 0.00 0.00 0.00 1.80 0.00 7.19 8.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-3 0.00 0.00 0.00 3.19 0.00 13.57 8.50 0.00 0.06 0.00 0.06 0.06 0.02
dm-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-5 0.00 0.00 0.00 0.60 0.00 2.40 8.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-7 0.00 0.00 0.00 0.80 0.00 3.19 8.00 0.00 0.75 0.00 0.75 0.50 0.04
dm-8 0.00 0.00 0.60 1022.95 2.40 97452.30 190.42 1.35 1.32 0.00 1.32 0.21 21.96

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdb 0.00 563.27 0.20 240.72 0.80 19136.93 158.87 0.19 0.78 1.00 0.78 0.63 15.13
sdc 0.00 0.00 0.00 10.38 0.00 41.52 8.00 0.00 0.38 0.00 0.38 0.38 0.40
sda 0.00 0.20 0.00 1.40 0.00 24.75 35.43 0.00 0.29 0.00 0.29 0.29 0.04
dm-0 0.00 0.00 0.00 1.00 0.00 22.36 44.80 0.00 0.60 0.00 0.60 0.40 0.04
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.00 0.00 0.20 0.00 0.80 8.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-3 0.00 0.00 0.00 0.40 0.00 1.60 8.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-7 0.00 0.00 0.00 3.19 0.00 31.94 20.00 0.00 0.12 0.00 0.12 0.12 0.04
dm-8 0.00 0.00 0.20 810.78 0.80 19132.14 47.18 0.59 0.72 1.00 0.72 0.19 15.27
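
Reading several iostat samples by eye is error-prone, so here is a small Python sketch that parses extended iostat text like the output above and reports the peak value of a column per device. The helper names `parse_iostat` and `peak` are hypothetical, and the parser assumes each sample starts with a `Device:` header line and ends at a blank line, as in the samples shown.

```python
def parse_iostat(text):
    """Parse extended iostat text into a list of samples.

    Each sample is a dict mapping device name -> {column name: float}.
    Assumes every sample begins with a 'Device:' header row.
    """
    samples, columns = [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            columns = None  # a blank line ends the current sample
            continue
        if fields[0].rstrip(":") == "Device":
            columns = fields[1:]
            samples.append({})
            continue
        # Only accept rows whose field count matches the header.
        if columns and len(fields) == len(columns) + 1:
            samples[-1][fields[0]] = dict(zip(columns, map(float, fields[1:])))
    return samples


def peak(samples, device, column):
    """Highest value of `column` seen for `device` across all samples."""
    return max(s[device][column] for s in samples if device in s)


if __name__ == "__main__":
    header = ("Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz "
              "await r_await w_await svctm %util")
    # Rows copied from two of the samples above.
    text = "\n".join([
        header,
        "sdb 0.00 476.00 0.40 245.20 1.60 34659.20 282.25 0.33 1.36 0.00 1.37 0.80 19.76",
        "dm-8 0.00 0.00 0.40 732.20 1.60 34894.40 95.27 0.90 1.23 0.00 1.23 0.27 19.90",
        "",
        header,
        "sdb 0.00 608.58 0.60 402.99 2.40 97394.01 482.65 0.75 1.91 0.00 1.92 0.54 21.74",
        "dm-8 0.00 0.00 0.60 1022.95 2.40 97452.30 190.42 1.35 1.32 0.00 1.32 0.21 21.96",
    ])
    samples = parse_iostat(text)
    for dev in ("sdb", "dm-8"):
        print(dev, "peak %util:", peak(samples, dev, "%util"),
              "peak w_await (ms):", peak(samples, dev, "w_await"))
```

For the four samples posted above, the busiest devices (sdb and dm-8) peak at roughly 22% utilisation with write waits under 2 ms.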
