Elasticsearch process using ~100% of CPU

I've been having a CPU usage issue for some time now on a single-node cluster, first on 2.1.0 and now on 2.1.1. The service takes up almost 100% CPU time despite there being no activity on the cluster (i.e., the Kibana and Logstash services are stopped):

top - 10:45:56 up 3 days, 15:13,  2 users,  load average: 2.18, 1.73, 1.52
Tasks: 138 total,   2 running, 136 sleeping,   0 stopped,   0 zombie
%Cpu(s): 56.4 us,  1.9 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.5 si, 41.1 st
KiB Mem:   3854780 total,  3459524 used,   395256 free,   342280 buffers
KiB Swap:        0 total,        0 used,        0 free.  1041724 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
 1309 elastic+  20   0 5514232 1.474g  63304 S 96.8 40.1   2868:11 java
 1211 dd-agent  20   0  210528  23032   4272 S  3.0  0.6  62:58.72 python

If I look at the hot threads output, only a small percentage of that CPU usage is accounted for:

::: [asura-analytics00-test][fRwWxjs1To2qzIttd7EoLA][asura-analytics00-test][inet[/172.31.12.104:9300]]{master=true}
   Hot threads at 2016-04-11T10:48:43.151Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
   
   19.6% (97.9ms out of 500ms) cpu usage by thread 'elasticsearch[asura-analytics00-test][listener][T#1]'
     7/10 snapshots sharing following 16 elements
       org.elasticsearch.common.xcontent.support.AbstractXContentParser.readValue(AbstractXContentParser.java:308)
       org.elasticsearch.common.xcontent.support.AbstractXContentParser.readMap(AbstractXContentParser.java:275)
       org.elasticsearch.common.xcontent.support.AbstractXContentParser.readValue(AbstractXContentParser.java:308)
       org.elasticsearch.common.xcontent.support.AbstractXContentParser.readMap(AbstractXContentParser.java:275)
       org.elasticsearch.common.xcontent.support.AbstractXContentParser.readMap(AbstractXContentParser.java:254)
       org.elasticsearch.common.xcontent.support.AbstractXContentParser.map(AbstractXContentParser.java:208)
       org.elasticsearch.cluster.ClusterState.toXContent(ClusterState.java:431)
       org.elasticsearch.rest.action.admin.cluster.state.RestClusterStateAction$1.buildResponse(RestClusterStateAction.java:85)
       org.elasticsearch.rest.action.admin.cluster.state.RestClusterStateAction$1.buildResponse(RestClusterStateAction.java:80)
       org.elasticsearch.rest.action.support.RestBuilderListener.buildResponse(RestBuilderListener.java:37)
       org.elasticsearch.rest.action.support.RestResponseListener.processResponse(RestResponseListener.java:43)
       org.elasticsearch.rest.action.support.RestActionListener.onResponse(RestActionListener.java:49)
       org.elasticsearch.action.support.TransportAction$ThreadedActionListener$1.run(TransportAction.java:113)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
       java.lang.Thread.run(Thread.java:745)
     2/10 snapshots sharing following 13 elements
       org.elasticsearch.common.xcontent.XContentBuilder.writeValue(XContentBuilder.java:1197)
       org.elasticsearch.common.xcontent.XContentBuilder.writeMap(XContentBuilder.java:1163)
       org.elasticsearch.common.xcontent.XContentBuilder.map(XContentBuilder.java:1072)
       org.elasticsearch.cluster.ClusterState.toXContent(ClusterState.java:437)
       org.elasticsearch.rest.action.admin.cluster.state.RestClusterStateAction$1.buildResponse(RestClusterStateAction.java:85)
       org.elasticsearch.rest.action.admin.cluster.state.RestClusterStateAction$1.buildResponse(RestClusterStateAction.java:80)
       org.elasticsearch.rest.action.support.RestBuilderListener.buildResponse(RestBuilderListener.java:37)
       org.elasticsearch.rest.action.support.RestResponseListener.processResponse(RestResponseListener.java:43)
       org.elasticsearch.rest.action.support.RestActionListener.onResponse(RestActionListener.java:49)
       org.elasticsearch.action.support.TransportAction$ThreadedActionListener$1.run(TransportAction.java:113)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
       java.lang.Thread.run(Thread.java:745)
     unique snapshot (snipped)
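
For reference, the hot threads snapshot above came from the standard nodes hot threads API; the parameters below mirror the interval and thread count shown in the header (they also happen to be the defaults):

curl -XGET 'localhost:9200/_nodes/hot_threads?threads=3&interval=500ms'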

The log is currently full of entries like the one below, although I have already solved a few earlier issues that produced the same ~100% CPU usage with different log output.

[2016-04-11 10:52:49,058][INFO ][monitor.jvm              ] [-test] [gc][old][300596][29556] duration [6.1s], collections [1]/[6.8s], total [6.1s]/[4.5h], memory [1.1gb]->[1.1gb]/[1.2gb], all_pools {[young] [24.4mb]->[22.8mb]/[66.5mb]}{[survivor] [0b]->[0b]/[8.3mb]}{[old] [1.1gb]->[1.1gb]/[1.1gb]}
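
Reading that GC line: the old generation is pinned at its 1.1gb ceiling within a 1.2gb total heap, and the node has accumulated 4.5 hours of old-gen collection time, so the JVM is likely burning CPU in back-to-back full GCs rather than doing useful work. As a sketch (the config path assumes the Debian/Ubuntu package layout), heap pressure can be confirmed via the node stats API, and if heap_used_percent sits near 100 the heap should be raised:

curl -s 'localhost:9200/_nodes/stats/jvm?pretty' | grep heap_used_percent

# e.g. set ES_HEAP_SIZE=2g in /etc/default/elasticsearch (roughly half of this ~4gb box), then:
sudo service elasticsearch restart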

What Java version are you on?

Hi Mark,

Our server is reporting the following:

java version "1.7.0_79"
OpenJDK Runtime Environment (IcedTea 2.5.5) (7u79-2.5.5-0ubuntu0.14.04.2)
OpenJDK 64-Bit Server VM (build 24.79-b02, mixed mode)
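
One note on that runtime: my understanding is that Elasticsearch 2.x supports Java 7u55 and later, but Java 8 was the recommended JVM at the time, so upgrading is a cheap experiment. A sketch for Ubuntu 14.04, which has no stock OpenJDK 8 package, using the community openjdk-r PPA:

sudo add-apt-repository ppa:openjdk-r/ppa
sudo apt-get update
sudo apt-get install openjdk-8-jre-headless
sudo update-alternatives --config java    # pick the Java 8 entry
java -version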

Recently I have also seen very high CPU usage from Elasticsearch, to the point of affecting Kibana's performance and making it unusable. It is constantly above 100%. The system isn't live yet and only has a small number of nodes reporting in. I'm finalising the project and will demo it to the team very soon, and I hope I can fix this in time, so all help is appreciated.

top - 10:29:59 up 16:26, 2 users, load average: 1.62, 2.32, 3.00
Tasks: 148 total, 1 running, 147 sleeping, 0 stopped, 0 zombie
Cpu(s): 26.6%us, 0.5%sy, 2.0%ni, 70.9%id, 0.0%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 7727384k total, 6207292k used, 1520092k free, 128664k buffers
Swap: 1048572k total, 0k used, 1048572k free, 3204784k cached

The whole stack runs on the same server, a VM, with only 5 machines sending their logs via Topbeat, Filebeat, Packetbeat and Winlogbeat. I have met the server requirements for memory and CPU, but I also read on an issue thread that a larger network switch MTU than the usual 1500 is recommended, with 9000 being better. However, such a change would affect a large number of machines, so it would be subject to the change board, etc.
Whilst I'm considering that change as a requirement for a production system receiving numerous logs from many nodes, I currently have only a small number and am surprised to see that "threshold" reached so soon! Hopefully this is not the cause of Elasticsearch's high demand for CPU.
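
If you want to rule the MTU theory in or out before going to the change board, it can be checked and tested per interface on the VM itself (eth0 below is an assumption about the interface name; 9000 only helps if every hop supports jumbo frames, and the setting reverts on reboot):

ip link show eth0                      # current MTU is printed on the first line
sudo ip link set dev eth0 mtu 9000     # temporary test; revert with mtu 1500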

I have tried the latest 2.3.2 build, which installed fine on my CentOS server, and CPU is still high!
I ran curl -XGET 'localhost:9200' to verify the version and it reported 2.2.1!

"cluster_name" : "elasticsearch",
"version" : {
"number" : "2.2.1",
"build_hash" : "d045fc29d1932bce18b2e65ab8b297fbf6cd41a1",
"build_timestamp" : "2016-03-09T09:38:54Z",
"build_snapshot" : false,
"lucene_version" : "5.4.1"

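A mismatch like that usually means the previously installed instance is still the one answering on port 9200, or an older install comes first on the PATH. A quick sketch for CentOS (the service name assumes the RPM package; use ss where netstat isn't installed):

sudo netstat -tlnp | grep 9200     # which PID/binary owns the port
sudo service elasticsearch restart
curl -XGET 'localhost:9200'        # re-check the reported version
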
Can anyone advise anything for me to try? Many thanks.

Hi! You should install Marvel or another monitoring solution to get a realistic view of what's happening with the cluster just before and during a CPU spike. Another option is to capture hot threads while CPU is high (maybe you'll get a clue from there). A sketch of both follows.
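
Assuming the 2.x plugin layout (paths depend on how Elasticsearch and Kibana were installed), installing Marvel and grabbing a hot threads snapshot would look roughly like:

bin/plugin install license          # Marvel 2.x requires the license plugin first
bin/plugin install marvel-agent     # run on every Elasticsearch node, then restart it
bin/kibana plugin --install elasticsearch/marvel/latest    # from the Kibana directory

curl -XGET 'localhost:9200/_nodes/hot_threads?threads=10&interval=1s'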

Thanks Rusty, will have a look.

These issues may sound similar, but it's better if you create your own thread for this.