Unusual High CPU usage of cluster nodes


#1

Describe the feature : High CPU usage of nodes.

Elasticsearch version ( bin/elasticsearch --version ): 1.4.5

Plugins installed :

JVM version ( java -version ): openjdk version "1.8.0_191"

OS version ( uname -a if on a Unix-like system): Linux es06 3.10.0-693.2.2.el7.x86_64 #1 SMP Tue Sep 12 22:26:13 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior :

High CPU usage of threads [transport_client_worker].

Steps to reproduce :

  1. No pending tasks. The problem happens all the time.
  2. We have several clusters, only one cluster has this problem which causes system reach high load (48 core cpu, 512 memory, with 16 elasticsearch nodes per ECS, system load peak 80%).
  3. Cluster have 124 nodes, include 120 data nodes, 2 calculator nodes, 2 master nodes. 198 indexes, 128 shards per index.

Provide logs (if relevant) :

::: [es06_01-31][aHA8XzzEQWqga9kWup39XQ][es06_01.xxx.com][inet[/192.168.136.26:9330]]{max_local_storage_nodes=1, data=false, action.disable_shutdown=true, master=false}

394.6% (1.9s out of 500ms) cpu usage by thread 'elasticsearch[es06_01-31][transport_client_worker][T#62]{New I/O worker [#62](https://github.com/elastic/elasticsearch/issues/62)}'
10/10 snapshots sharing following 15 elements
sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
org.elasticsearch.common.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:68)
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:415)
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:212)
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)

357.4% (1.7s out of 500ms) cpu usage by thread 'elasticsearch[es06_01-31][transport_client_worker][T#24]{New I/O worker [#24](https://github.com/elastic/elasticsearch/issues/24)}'
10/10 snapshots sharing following 15 elements
sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
org.elasticsearch.common.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:68)
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:415)
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:212)
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)

347.3% (1.7s out of 500ms) cpu usage by thread 'elasticsearch[es06_01-31][transport_client_worker][T#6]{New I/O worker [#6](https://github.com/elastic/elasticsearch/issues/6)}'
10/10 snapshots sharing following 15 elements
sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
org.elasticsearch.common.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:68)
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:415)
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:212)
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)