I've had an issue with CPU usage for some time now on a single-node cluster running 2.1.0 and then 2.1.1. The Elasticsearch service sits at almost 100% CPU despite there being no activity on the cluster (i.e., the Kibana and Logstash services are stopped):
top - 10:45:56 up 3 days, 15:13,  2 users,  load average: 2.18, 1.73, 1.52
Tasks: 138 total,   2 running, 136 sleeping,   0 stopped,   0 zombie
%Cpu(s): 56.4 us,  1.9 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.5 si, 41.1 st
KiB Mem:   3854780 total,  3459524 used,   395256 free,   342280 buffers
KiB Swap:        0 total,        0 used,        0 free.  1041724 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM    TIME+   COMMAND
 1309 elastic+  20   0 5514232 1.474g  63304 S  96.8 40.1  2868:11   java
 1211 dd-agent  20   0  210528  23032   4272 S   3.0  0.6    62:58.72 python
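For reference, the hot threads dump below comes from the standard nodes hot threads API; the exact parameters are my reconstruction from the header it prints, but the call is roughly:

curl -s 'localhost:9200/_nodes/hot_threads?threads=3&interval=500ms&ignore_idle_threads=true'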
If I look at the hot threads output, only a small percentage of the CPU time is accounted for:
::: [asura-analytics00-test][fRwWxjs1To2qzIttd7EoLA][asura-analytics00-test][inet[/172.31.12.104:9300]]{master=true}
   Hot threads at 2016-04-11T10:48:43.151Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

   19.6% (97.9ms out of 500ms) cpu usage by thread 'elasticsearch[asura-analytics00-test][listener][T#1]'
     7/10 snapshots sharing following 16 elements
       org.elasticsearch.common.xcontent.support.AbstractXContentParser.readValue(AbstractXContentParser.java:308)
       org.elasticsearch.common.xcontent.support.AbstractXContentParser.readMap(AbstractXContentParser.java:275)
       org.elasticsearch.common.xcontent.support.AbstractXContentParser.readValue(AbstractXContentParser.java:308)
       org.elasticsearch.common.xcontent.support.AbstractXContentParser.readMap(AbstractXContentParser.java:275)
       org.elasticsearch.common.xcontent.support.AbstractXContentParser.readMap(AbstractXContentParser.java:254)
       org.elasticsearch.common.xcontent.support.AbstractXContentParser.map(AbstractXContentParser.java:208)
       org.elasticsearch.cluster.ClusterState.toXContent(ClusterState.java:431)
       org.elasticsearch.rest.action.admin.cluster.state.RestClusterStateAction$1.buildResponse(RestClusterStateAction.java:85)
       org.elasticsearch.rest.action.admin.cluster.state.RestClusterStateAction$1.buildResponse(RestClusterStateAction.java:80)
       org.elasticsearch.rest.action.support.RestBuilderListener.buildResponse(RestBuilderListener.java:37)
       org.elasticsearch.rest.action.support.RestResponseListener.processResponse(RestResponseListener.java:43)
       org.elasticsearch.rest.action.support.RestActionListener.onResponse(RestActionListener.java:49)
       org.elasticsearch.action.support.TransportAction$ThreadedActionListener$1.run(TransportAction.java:113)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
       java.lang.Thread.run(Thread.java:745)
     2/10 snapshots sharing following 13 elements
       org.elasticsearch.common.xcontent.XContentBuilder.writeValue(XContentBuilder.java:1197)
       org.elasticsearch.common.xcontent.XContentBuilder.writeMap(XContentBuilder.java:1163)
       org.elasticsearch.common.xcontent.XContentBuilder.map(XContentBuilder.java:1072)
       org.elasticsearch.cluster.ClusterState.toXContent(ClusterState.java:437)
       org.elasticsearch.rest.action.admin.cluster.state.RestClusterStateAction$1.buildResponse(RestClusterStateAction.java:85)
       org.elasticsearch.rest.action.admin.cluster.state.RestClusterStateAction$1.buildResponse(RestClusterStateAction.java:80)
       org.elasticsearch.rest.action.support.RestBuilderListener.buildResponse(RestBuilderListener.java:37)
       org.elasticsearch.rest.action.support.RestResponseListener.processResponse(RestResponseListener.java:43)
       org.elasticsearch.rest.action.support.RestActionListener.onResponse(RestActionListener.java:49)
       org.elasticsearch.action.support.TransportAction$ThreadedActionListener$1.run(TransportAction.java:113)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
       java.lang.Thread.run(Thread.java:745)
     unique snapshot (snipped)
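For what it's worth, those listener-thread snapshots are all inside RestClusterStateAction / ClusterState.toXContent, i.e. the node is serializing the cluster state for a REST response, which as far as I understand corresponds to requests along the lines of:

curl -s 'localhost:9200/_cluster/state'

That only accounts for the ~20% that hot threads reports, though, not the rest of the CPU time.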
If I check the log, it's currently full of entries like the one below, although I have already solved a few earlier issues that produced the same ~100% CPU usage with different log output:
[2016-04-11 10:52:49,058][INFO ][monitor.jvm ] [-test] [gc][old][300596][29556] duration [6.1s], collections [1]/[6.8s], total [6.1s]/[4.5h], memory [1.1gb]->[1.1gb]/[1.2gb], all_pools {[young] [24.4mb]->[22.8mb]/[66.5mb]}{[survivor] [0b]->[0b]/[8.3mb]}{[old] [1.1gb]->[1.1gb]/[1.1gb]}
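For context, the node is running with a roughly 1.2gb heap and the old generation looks permanently full, per the log line above. I check heap usage with the cat nodes API, roughly like this (the column selection is just my own habit):

curl -s 'localhost:9200/_cat/nodes?v&h=name,heap.percent,heap.max,ram.percent'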