Hello,
I am seeing a lot of worrying output from the hot threads API and would like help interpreting it.
Overview
Version: 5.5.1
Uptime: 13 days
Nodes: 60
Disk Available: 247TB / 413TB (59.74%)
JVM Heap: 68.38% (1TB / 2TB)
Indices: 2,857
Documents: 24,414,223,409
Disk Usage: 154TB
Primary Shards: 24,039
Replica Shards: 25,139
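As a back-of-the-envelope check on shard density (just arithmetic on the overview figures above, not output from any API):

```python
# Rough shard-density check using the cluster overview figures above.
primaries = 24_039
replicas = 25_139
nodes = 60

total_shards = primaries + replicas      # 49,178 shards overall
shards_per_node = total_shards / nodes   # ~820 shards per node on average
print(total_shards, round(shards_per_node))  # → 49178 820
```

That is a fairly high shard count per node, which may be related to the load I am seeing.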
The master node's log also shows many org.elasticsearch.xpack.monitoring.exporter.ExportException errors. My cluster seems busy, but I don't understand why. Here is the hot threads output:
::: {opbdf1019_data_02}{050boVO5RsaA6oVA4psUzA}{fjEK3eFjQ_yJ-CK4fAOUmA}{10.79.18.163}{10.79.18.163:9302}{rack_id=BB_Prod05}
Hot threads at 2020-10-14T16:09:35.356Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
4.4% (21.9ms out of 500ms) cpu usage by thread 'elasticsearch[opbdf1019_data_02][bulk][T#27]'
4/10 snapshots sharing following 28 elements
org.elasticsearch.index.mapper.DocumentParser.parseObjectOrNested(DocumentParser.java:373)
org.elasticsearch.index.mapper.DocumentParser.internalParseDocument(DocumentParser.java:93)
org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:66)
org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:277)
org.elasticsearch.index.shard.IndexShard.prepareIndex(IndexShard.java:529)
org.elasticsearch.index.shard.IndexShard.prepareIndexOnReplica(IndexShard.java:518)
org.elasticsearch.index.shard.IndexShard.acquireReplicaOperationLock(IndexShard.java:1673)
org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncReplicaAction.doRun(TransportReplicationAction.java:566)
org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69)
org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1544)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
3/10 snapshots sharing following 33 elements
org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:447)
org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:403)
org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:232)
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:478)
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1571)
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1316)
org.elasticsearch.index.engine.InternalEngine.index(InternalEngine.java:663)
org.elasticsearch.index.engine.InternalEngine.indexIntoLucene(InternalEngine.java:607)
org.elasticsearch.index.engine.InternalEngine.index(InternalEngine.java:505)
org.elasticsearch.index.shard.IndexShard.index(IndexShard.java:556)
org.elasticsearch.index.shard.IndexShard.index(IndexShard.java:545)
org.elasticsearch.action.bulk.TransportShardBulkAction.executeIndexRequestOnReplica(Transpor
org.elasticsearch.index.shard.IndexShardOperationsLock.acquire(IndexShardOperationsLock.java:147)
org.elasticsearch.index.shard.IndexShard.acquireReplicaOperationLock(IndexShard.java:1673)
org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncReplicaAction.doRun(TransportReplicationAction.java:566)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
org.elasticsearch.action.support.replication.TransportReplicationAction$ReplicaOperationTransportHandler.messageReceived(TransportReplicationAction.java:451)
org.elasticsearch.action.support.replication.TransportReplicationAction$ReplicaOperationTransportHandler.messageReceived(TransportReplicationAction.java:441)
org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69)
org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1544)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
3/10 snapshots sharing following 21 elements
org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnReplica(TransportShardBulkAction.java:376)
org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnReplica(TransportShardBulkAction.java:69)
org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncReplicaAction.onResponse(TransportReplicationAction.java:494)
org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncReplicaAction.onResponse(TransportReplicationAction.java:467)
org.elasticsearch.index.shard.IndexShardOperationsLock.acquire(IndexShardOperationsLock.java:147)
org.elasticsearch.index.shard.IndexShard.acquireReplicaOperationLock(IndexShard.java:1673)
org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncReplicaAction.doRun(TransportReplicationAction.java:566)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
org.elasticsearch.action.support.replication.TransportReplicationAction$ReplicaOperationTransportHandler.messageReceived(TransportReplicationAction.java:451)
org.elasticsearch.action.support.replication.TransportReplicationAction$ReplicaOperationTransportHandler.messageReceived(TransportReplicationAction.java:441)
com.floragunn.searchguard.ssl.transport.SearchGuardSSLRequestHandler.messageReceivedDecorate(SearchGuardSSLRequestHandler.java:178)
com.floragunn.searchguard.transport.SearchGuardRequestHandler.messageReceivedDecorate(SearchGuardRequestHandler.java:192)
com.floragunn.searchguard.ssl.transport.SearchGuardSSLRequestHandler.messageReceived(SearchGuardSSLRequestHandler.java:140)
com.floragunn.searchguard.SearchGuardPlugin$3$1.messageReceived(SearchGuardPlugin.java:376)
org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69)
org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1544)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
2.9% (14.6ms out of 500ms) cpu usage by thread 'elasticsearch[opbdf1019_data_02][[z_app_2ip_es_socle_cdr-20201014][13]: Lucene Merge Thread #875]'
10/10 snapshots sharing following 17 elements
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
org.apache.lucene.index.MergePolicy$OneMergeProgress.pauseNanos(MergePolicy.java:150)
org.apache.lucene.index.MergeRateLimiter.maybePause(MergeRateLimiter.java:148)
org.apache.lucene.index.MergeRateLimiter.pause(MergeRateLimiter.java:93)
org.apache.lucene.store.RateLimitedIndexOutput.checkRate(RateLimitedIndexOutput.java:78)
org.apache.lucene.store.RateLimitedIndexOutput.writeBytes(RateLimitedIndexOutput.java:72)
org.apache.lucene.store.DataOutput.copyBytes(DataOutput.java:278)
org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:620)
org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:200)
org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:89)
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4356)
org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3931)
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:624)
org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:99)
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:661)
::: {opbdf1717_data_04}{Eh_JC8TfQgyfiXaxz0jQzg}{oBDFBUqeS3aAECc6Tr-7Ag}{10.79.20.17}{10.79.20.17:9304}{rack_id=BB_Prod09}
Hot threads at 2020-10-14T16:09:35.355Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
::: {opbdf1219_data_03}{cuwbNmH6TYu_Drc3ZwgoDg}{3cg6CfTHR3q7JFpBrqhFrQ}{10.79.18.125}{10.79.18.125:9303}{rack_id=BB_Prod06}
Hot threads at 2020-10-14T16:09:35.362Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
96.8% (483.9ms out of 500ms) cpu usage by thread 'elasticsearch[opbdf1219_data_03][management][T#5]'
10/10 snapshots sharing following 17 elements
::: {opbdf1118_master-adm_90}{VYCNOoLwQBK4r-I6OKftaw}{DxVf4OezRauzYFwGq-zl8g}{10.79.18.241}{10.79.18.241:9390}{rack_id=BB_Prod06}
Hot threads at 2020-10-14T16:09:35.354Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
::: {opbdf0416_data_04}{1XHD-l0FTsiRR94FMiWalQ}{5DLi5iXRTFySy0ZnWcUu_w}{10.79.18.106}{10.79.18.106:9304}{rack_id=BB_Prod02}
Hot threads at 2020-10-14T16:09:35.356Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
::: {opbdf1620_master_90}{-Ye9TCK6QFm7PZz-NiVhEQ}{Bhfu3rLERmqdJjyrqlBUpw}{10.79.18.238}{10.79.18.238:9390}{rack_id=BB_Prod08}
Hot threads at 2020-10-14T16:09:35.355Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
::: {opbdf1316_coord_00}{RgLcdr5KSZymIrs0WM_VCQ}{dnXCgAzRTNuKSJqQltNVWQ}{10.79.18.234}{10.79.18.234:9300}{rack_id=BB_Prod07}
Hot threads at 2020-10-14T16:09:35.358Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
::: {opbdf1820_coord_00}{LWVbOumgT4quBdBbLmvaTA}{J03be4cKQXCZaS0pJNavQQ}{10.79.20.38}{10.79.20.38:9300}{rack_id=BB_Prod09}
Hot threads at 2020-10-14T16:09:35.356Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
::: {opbdf1716_data_02}{UT1F0DLvQw68M2gwVYsXmw}{4Srl_Q25QLqM16wGss58Sg}{10.79.20.16}{10.79.20.16:9302}{rack_id=BB_Prod09}
Hot threads at 2020-10-14T16:09:35.356Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
1.1% (5.7ms out of 500ms) cpu usage by thread 'elasticsearch[opbdf1716_data_02][[z_app_2ip_es_socle_cdr-20201014][35]: Lucene Merge Thread #640]'
10/10 snapshots sharing following 7 elements
org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:200)
org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:89)
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4356)
org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3931)
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:624)
org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:99)
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:661)
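In case it helps, this is the small helper I used to rank the busiest threads from the pasted output above (a hypothetical script of my own, not part of Elasticsearch; it just regex-matches the "N% (...) cpu usage by thread '...'" header lines):

```python
import re

# Matches hot-threads header lines such as:
#   4.4% (21.9ms out of 500ms) cpu usage by thread 'elasticsearch[node][bulk][T#27]'
_HEADER = re.compile(r"(\d+(?:\.\d+)?)% \([^)]*\) cpu usage by thread '([^']+)'")

def busiest_threads(hot_threads_text):
    """Return (cpu_percent, thread_name) pairs, busiest first."""
    pairs = [(float(m.group(1)), m.group(2)) for m in _HEADER.finditer(hot_threads_text)]
    return sorted(pairs, reverse=True)

sample = """
4.4% (21.9ms out of 500ms) cpu usage by thread 'elasticsearch[opbdf1019_data_02][bulk][T#27]'
96.8% (483.9ms out of 500ms) cpu usage by thread 'elasticsearch[opbdf1219_data_03][management][T#5]'
"""
print(busiest_threads(sample)[0])  # → (96.8, "elasticsearch[opbdf1219_data_03][management][T#5]")
```

Ranked this way, the management thread on opbdf1219_data_03 at 96.8% stands out far above everything else.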