Hello all,
I have just restored from a 2.6TB backup and reached green state. Shard balancing has finished, but even before I put the cluster back under load, CPU usage and load averages are already significant.
Cluster background:
Running on Kubernetes: ES 5.5.2, 15 nodes, 251 indices, 1,506 shards, 979,652,048 docs, totalling 2.61TB.
Each data node has 28GB of RAM with a 15GB heap; there are 8 data nodes, 3 dedicated masters and 4 client nodes.
The data nodes have been moved to SSDs rated at (allegedly) 2,500 IOPS.
The first thing I did to find out what was going on was to call the /_nodes/hot_threads API, and it looks like the nodes are busy responding to management requests, specifically completion stats. Here is a partial dump of that:
100.3% (501.3ms out of 500ms) cpu usage by thread 'elasticsearch[es-live-elasticsearch-data-4][management][T#1]'
3/10 snapshots sharing following 23 elements
java.util.TreeMap.getEntry(TreeMap.java:359)
java.util.TreeMap.get(TreeMap.java:278)
org.apache.lucene.codecs.blocktree.BlockTreeTermsReader.terms(BlockTreeTermsReader.java:292)
org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.terms(PerFieldPostingsFormat.java:316)
org.elasticsearch.search.suggest.completion.CompletionFieldStats.completionStats(CompletionFieldStats.java:54)
org.elasticsearch.index.shard.IndexShard.completionStats(IndexShard.java:743)
org.elasticsearch.action.admin.indices.stats.CommonStats.<init>(CommonStats.java:207)
org.elasticsearch.indices.IndicesService.indexShardStats(IndicesService.java:343)
org.elasticsearch.indices.IndicesService.statsByShard(IndicesService.java:313)
org.elasticsearch.indices.IndicesService.stats(IndicesService.java:304)
org.elasticsearch.node.NodeService.stats(NodeService.java:105)
org.elasticsearch.action.admin.cluster.node.stats.TransportNodesStatsAction.nodeOperation(TransportNodesStatsAction.java:77)
org.elasticsearch.action.admin.cluster.node.stats.TransportNodesStatsAction.nodeOperation(TransportNodesStatsAction.java:42)
org.elasticsearch.action.support.nodes.TransportNodesAction.nodeOperation(TransportNodesAction.java:140)
org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:262)
org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:258)
org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69)
org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1544)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
7/10 snapshots sharing following 16 elements
org.elasticsearch.indices.IndicesService.indexShardStats(IndicesService.java:343)
org.elasticsearch.indices.IndicesService.statsByShard(IndicesService.java:313)
org.elasticsearch.indices.IndicesService.stats(IndicesService.java:304)
org.elasticsearch.node.NodeService.stats(NodeService.java:105)
org.elasticsearch.action.admin.cluster.node.stats.TransportNodesStatsAction.nodeOperation(TransportNodesStatsAction.java:77)
org.elasticsearch.action.admin.cluster.node.stats.TransportNodesStatsAction.nodeOperation(TransportNodesStatsAction.java:42)
org.elasticsearch.action.support.nodes.TransportNodesAction.nodeOperation(TransportNodesAction.java:140)
org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:262)
org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:258)
org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69)
org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1544)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
::: {es-live-elasticsearch-data-1}{bvST72s_RQqa1okxjnUJlQ}{gcEuMgzoQkGeus5PS16Ipg}{10.244.5.6}{10.244.5.6:9300}
Hot threads at 2018-04-11T06:31:20.276Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
100.5% (502.3ms out of 500ms) cpu usage by thread 'elasticsearch[es-live-elasticsearch-data-1][management][T#3]'
2/10 snapshots sharing following 23 elements
java.util.TreeMap.getEntry(TreeMap.java:359)
java.util.TreeMap.get(TreeMap.java:278)
org.apache.lucene.codecs.blocktree.BlockTreeTermsReader.terms(BlockTreeTermsReader.java:292)
org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.terms(PerFieldPostingsFormat.java:316)
org.elasticsearch.search.suggest.completion.CompletionFieldStats.completionStats(CompletionFieldStats.java:54)
org.elasticsearch.index.shard.IndexShard.completionStats(IndexShard.java:743)
org.elasticsearch.action.admin.indices.stats.CommonStats.<init>(CommonStats.java:207)
org.elasticsearch.indices.IndicesService.indexShardStats(IndicesService.java:343)
org.elasticsearch.indices.IndicesService.statsByShard(IndicesService.java:313)
org.elasticsearch.indices.IndicesService.stats(IndicesService.java:304)
org.elasticsearch.node.NodeService.stats(NodeService.java:105)
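Judging from those traces, the expensive part looks like the per-shard completion stats (CompletionFieldStats) that get computed as part of every node/indices stats call. To convince myself of that (this is just my own guess at how to reproduce it, not something the trace tells me directly), I was going to time the stats APIs filtered down to the completion metric:
curl -s -o /dev/null -w '%{time_total}s\n' 'http://localhost:9200/_stats/completion'
curl -s -o /dev/null -w '%{time_total}s\n' 'http://localhost:9200/_nodes/stats/indices/completion'
(localhost:9200 is just a placeholder for whatever endpoint the client pods expose.) If those calls are slow on their own, then anything polling node or index stats on a schedule would keep the management threads busy even with no search or indexing load.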
I am not running Marvel, but I read somewhere that X-Pack might be monitoring cluster health and calling some management APIs. Is there a way I can verify this?
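The best I have come up with on my own is to check whether any .monitoring-* indices exist (which would suggest X-Pack monitoring collection is enabled), and if so to temporarily pause collection and watch the CPU. I believe in 5.x the collection interval can be set to -1 to disable it, but please correct me if that's wrong:
curl -s 'http://localhost:9200/_cat/indices/.monitoring-*?v'
curl -s -XPUT 'http://localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{"transient": {"xpack.monitoring.collection.interval": -1}}'
Again, localhost:9200 stands in for the real endpoint.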
I was also using Cerebro, so I turned that off for a while, but the problem persisted.
Can anyone point me in the right direction for working out what is making all these stats calls?
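One idea I had is to catch the stats requests in flight with the tasks API and see which node they come from (the action name below is my guess based on the TransportNodesStatsAction in the trace):
curl -s 'http://localhost:9200/_tasks?actions=cluster:monitor/nodes/stats*&detailed=true'
If the parent tasks all originate from one of the client nodes, that would at least narrow down which component is doing the polling.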
Thanks all,
Matt