ES 5.5.2 - No queries, but CPU and load significantly high after restore

Hello All

I have just restored from a 2.6TB backup and the cluster has reached green. Shard rebalancing has finished, but even before I put the cluster back under load, CPU usage and load averages are significantly high.

Cluster background:
Running on Kubernetes - ES 5.5.2, 15 nodes, 251 indices, 1,506 shards, 979,652,048 docs, totalling 2.61TB.
Each data node has 28GB of RAM with a 15GB heap; there are 8 data nodes, 3 master nodes and 4 client nodes.

The data nodes have been moved to SSDs rated at (allegedly) 2,500 IOPS.

The first thing I did to find out what was going on was to call the /_nodes/hot_threads API, and it looks like the nodes are busy responding to management requests - specifically completion stats.
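
For reference, this is roughly how I called it (the hostname is a placeholder for one of the client nodes; the default 3 threads and 500ms sampling interval apply):

    # sample the busiest threads on every node
    curl -s 'http://localhost:9200/_nodes/hot_threads'

Here is a partial dump of the output: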

   100.3% (501.3ms out of 500ms) cpu usage by thread 'elasticsearch[es-live-elasticsearch-data-4][management][T#1]'
     3/10 snapshots sharing following 23 elements
       java.util.TreeMap.getEntry(TreeMap.java:359)
       java.util.TreeMap.get(TreeMap.java:278)
       org.apache.lucene.codecs.blocktree.BlockTreeTermsReader.terms(BlockTreeTermsReader.java:292)
       org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.terms(PerFieldPostingsFormat.java:316)
       org.elasticsearch.search.suggest.completion.CompletionFieldStats.completionStats(CompletionFieldStats.java:54)
       org.elasticsearch.index.shard.IndexShard.completionStats(IndexShard.java:743)
       org.elasticsearch.action.admin.indices.stats.CommonStats.<init>(CommonStats.java:207)
       org.elasticsearch.indices.IndicesService.indexShardStats(IndicesService.java:343)
       org.elasticsearch.indices.IndicesService.statsByShard(IndicesService.java:313)
       org.elasticsearch.indices.IndicesService.stats(IndicesService.java:304)
       org.elasticsearch.node.NodeService.stats(NodeService.java:105)
       org.elasticsearch.action.admin.cluster.node.stats.TransportNodesStatsAction.nodeOperation(TransportNodesStatsAction.java:77)
       org.elasticsearch.action.admin.cluster.node.stats.TransportNodesStatsAction.nodeOperation(TransportNodesStatsAction.java:42)
       org.elasticsearch.action.support.nodes.TransportNodesAction.nodeOperation(TransportNodesAction.java:140)
       org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:262)
       org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:258)
       org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69)
       org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1544)
       org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638)
       org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       java.lang.Thread.run(Thread.java:748)
     7/10 snapshots sharing following 16 elements
       org.elasticsearch.indices.IndicesService.indexShardStats(IndicesService.java:343)
       org.elasticsearch.indices.IndicesService.statsByShard(IndicesService.java:313)
       org.elasticsearch.indices.IndicesService.stats(IndicesService.java:304)
       org.elasticsearch.node.NodeService.stats(NodeService.java:105)
       org.elasticsearch.action.admin.cluster.node.stats.TransportNodesStatsAction.nodeOperation(TransportNodesStatsAction.java:77)
       org.elasticsearch.action.admin.cluster.node.stats.TransportNodesStatsAction.nodeOperation(TransportNodesStatsAction.java:42)
       org.elasticsearch.action.support.nodes.TransportNodesAction.nodeOperation(TransportNodesAction.java:140)
       org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:262)
       org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:258)
       org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69)
       org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1544)
       org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638)
       org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       java.lang.Thread.run(Thread.java:748)
   
  
::: {es-live-elasticsearch-data-1}{bvST72s_RQqa1okxjnUJlQ}{gcEuMgzoQkGeus5PS16Ipg}{10.244.5.6}{10.244.5.6:9300}
   Hot threads at 2018-04-11T06:31:20.276Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
   
   100.5% (502.3ms out of 500ms) cpu usage by thread 'elasticsearch[es-live-elasticsearch-data-1][management][T#3]'
     2/10 snapshots sharing following 23 elements
       java.util.TreeMap.getEntry(TreeMap.java:359)
       java.util.TreeMap.get(TreeMap.java:278)
       org.apache.lucene.codecs.blocktree.BlockTreeTermsReader.terms(BlockTreeTermsReader.java:292)
       org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.terms(PerFieldPostingsFormat.java:316)
       org.elasticsearch.search.suggest.completion.CompletionFieldStats.completionStats(CompletionFieldStats.java:54)
       org.elasticsearch.index.shard.IndexShard.completionStats(IndexShard.java:743)
       org.elasticsearch.action.admin.indices.stats.CommonStats.<init>(CommonStats.java:207)
       org.elasticsearch.indices.IndicesService.indexShardStats(IndicesService.java:343)
       org.elasticsearch.indices.IndicesService.statsByShard(IndicesService.java:313)
       org.elasticsearch.indices.IndicesService.stats(IndicesService.java:304)
       org.elasticsearch.node.NodeService.stats(NodeService.java:105)

I am not running Marvel, but I read somewhere that X-Pack might be monitoring cluster health and calling some management APIs - is there a way I can verify this?
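
The only check I could think of so far is to see whether the x-pack plugin is actually installed and whether monitoring indices are being written - something like the following, assuming monitoring in 5.x writes to .monitoring-* indices (hostname is again a placeholder):

    # list installed plugins on every node
    curl -s 'http://localhost:9200/_cat/plugins?v'

    # look for monitoring indices being written to
    curl -s 'http://localhost:9200/_cat/indices/.monitoring-*?v'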

I was also running Cerebro, so I turned it off for a while, but the problem persisted.

Can anyone point me in the right direction for working out what is making these management calls?

Thanks all,
Matt

Hey,

are you using the completion suggester? Do you have a lot of fields?
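
If you are not sure, one quick way to check is to grep all index mappings for completion-typed fields - a rough sketch, assuming the default compact JSON output:

    # count completion fields across all index mappings
    curl -s 'http://localhost:9200/_mapping' | grep -o '"type":"completion"' | wc -l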

If you have x-pack installed, you could try to disable monitoring by setting xpack.monitoring.enabled: false.
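
If I remember correctly this is a static setting in 5.x, so it needs to go into elasticsearch.yml on each node and takes effect after a restart:

    # elasticsearch.yml - disable x-pack monitoring on this node
    xpack.monitoring.enabled: false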

--Alex
