One thread 100% CPU

xavier.tiendeo · March 23, 2017, 12:19pm

We have 3 nodes, and hot threads shows the following:

Hot threads at 2017-03-23T12:04:29.200Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

97.4% (487.1ms out of 500ms) cpu usage by thread 'elasticsearch[pro-analytics3][management][T#2]'
2/10 snapshots sharing following 24 elements
sun.nio.fs.UnixPath.<init>(UnixPath.java:71)
sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:281)
sun.nio.fs.AbstractPath.resolve(AbstractPath.java:53)
org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:209)
org.apache.lucene.store.FileSwitchDirectory.fileLength(FileSwitchDirectory.java:150)
org.apache.lucene.store.FilterDirectory.fileLength(FilterDirectory.java:67)
org.apache.lucene.store.FilterDirectory.fileLength(FilterDirectory.java:67)
org.elasticsearch.index.store.Store$StoreStatsCache.estimateSize(Store.java:1543)
org.elasticsearch.index.store.Store$StoreStatsCache.refresh(Store.java:1532)
org.elasticsearch.index.store.Store$StoreStatsCache.refresh(Store.java:1519)
org.elasticsearch.common.util.SingleObjectCache.getOrRefresh(SingleObjectCache.java:55)
org.elasticsearch.index.store.Store.stats(Store.java:293)
org.elasticsearch.index.shard.IndexShard.storeStats(IndexShard.java:665)
org.elasticsearch.action.admin.indices.stats.CommonStats.<init>(CommonStats.java:134)
org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:165)

In the logs we see:

[2017-03-23 12:58:29,756][WARN ][transport ] [pro-analytics3] Received response for a request that has timed out, sent [99355ms] ago, timed out [58057ms] ago, action [cluster:monitor/nodes/stats[n]], node [{pro-analytics3}{fkxzHRaSTUiHR99KbZKT8Q}

[2017-03-23 12:58:29,768][WARN ][monitor.jvm ] [pro-analytics3] [gc][old][386283][774] duration [57.2s], collections [1]/[58.2s], total [57.2s]/[2h], memory [14.1gb]->[14.1gb]/[14.3gb], all_pools {[young] [409.2mb]->[418.5mb]/[532.5mb]}{[survivor] [0b]->[0b]/[66.5mb]}{[old] [13.7gb]->[13.7gb]/[13.7gb]}

[2017-03-23 12:59:15,208][WARN ][monitor.jvm ] [pro-analytics3] [gc][old][386284][775] duration [44.6s], collections [1]/[45s], total [44.6s]/[2h], memory [14.1gb]->[14.1gb]/[14.3gb], all_pools {[young] [418.5mb]->[422.9mb]/[532.5mb]}{[survivor] [0b]->[0b]/[66.5mb]}{[old] [13.7gb]->[13.7gb]/[13.7gb]}

[2017-03-23 13:00:12,361][WARN ][discovery.zen.publish ] [pro-analytics3] timed out waiting for all nodes to process published state [18657] (timeout [30s], pending nodes: [{pro-analytics2}{pMCPbvEzSEqp7lwUD_rvKg}{box_type=hot}, {pro-analytics1}{jO9IIlQFT_qJI5gI7JTHHg}{box_type=hot}])

Christian_Dahlqvist · March 23, 2017, 12:31pm

That is some very, very long GC you have there. Which version of Elasticsearch are you using? If you have monitoring installed, what does your heap usage look like?

xavier.tiendeo · March 23, 2017, 1:14pm

"version": {
"number": "2.2.0",

We only have kopf, and yes, heap usage is at about 98% right now, but normally it ranges from 50% to 70%.

Christian_Dahlqvist · March 23, 2017, 1:17pm

What you see in the logs is very long GC causing problems. If this is a recurring problem I would recommend looking at what takes up your heap and try to address that, or maybe even scale out.
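To make the reading of those `monitor.jvm` lines concrete, here is a minimal Python sketch that pulls the duration and the old-generation pool sizes out of one GC warning. The regular expression is written only against the log format pasted in this thread and only handles durations given in seconds; `summarize_gc` and `to_bytes` are hypothetical helper names, not part of Elasticsearch.

```python
import re

# Size suffixes as they appear in the pasted log lines.
UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}

# Matches the [gc][<generation>] prefix, the duration in seconds,
# and the before/after/max sizes of the old pool.
GC_LINE = re.compile(
    r"\[gc\]\[(?P<gen>\w+)\]"
    r".*?duration \[(?P<duration>[\d.]+)s\]"
    r".*?\{\[old\] \[(?P<before>[^\]]+)\]->\[(?P<after>[^\]]+)\]/\[(?P<max>[^\]]+)\]\}"
)


def to_bytes(text):
    """Convert a size such as '13.7gb' into a number of bytes."""
    match = re.fullmatch(r"([\d.]+)(b|kb|mb|gb)", text)
    if match is None:
        raise ValueError("unrecognised size: " + text)
    return float(match.group(1)) * UNITS[match.group(2)]


def summarize_gc(line):
    """Return GC duration and old-pool figures, or None if the line does not match."""
    match = GC_LINE.search(line)
    if match is None:
        return None
    before = to_bytes(match.group("before"))
    after = to_bytes(match.group("after"))
    limit = to_bytes(match.group("max"))
    return {
        "generation": match.group("gen"),
        "duration_s": float(match.group("duration")),
        "old_used_after_pct": round(100 * after / limit, 1),
        "reclaimed_bytes": before - after,
    }
```

Applied to the first GC warning above, this reports an old-generation collection of 57.2 s that reclaimed nothing and left the old pool completely full, which is why the node keeps falling back into long collections.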

xavier.tiendeo · March 23, 2017, 1:18pm

OK, thanks. Any recommendation on how to investigate what takes up the heap?

Christian_Dahlqvist · March 23, 2017, 1:58pm

Look at the node stats API. Having a very large number of shards can also tie up resources. Also check that you have swap disabled and that memory is not over-committed if you are using VMs, as this can slow down GC.
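As a starting point for the node stats suggestion, the following Python sketch summarises heap usage and a few index-level memory consumers per node. It is only a sketch under assumptions: the field names are the ones I would expect in a 2.x `_nodes/stats` response (verify them against your own output), the listed consumers are not exhaustive, `http://localhost:9200` is a placeholder address, and `heap_report` and `fetch_node_stats` are hypothetical helper names.

```python
import json
from urllib.request import urlopen

# (section, field) pairs under each node's "indices" block.
# Assumed from the 2.x node stats layout; check your own response.
CONSUMERS = {
    "fielddata": ("fielddata", "memory_size_in_bytes"),
    "segments": ("segments", "memory_in_bytes"),
    "query_cache": ("query_cache", "memory_size_in_bytes"),
    "request_cache": ("request_cache", "memory_size_in_bytes"),
}


def heap_report(stats):
    """Build one summary row per node from a parsed node stats response."""
    rows = []
    for node in stats.get("nodes", {}).values():
        heap = node.get("jvm", {}).get("mem", {})
        indices = node.get("indices", {})
        usage = {
            label: indices.get(section, {}).get(field, 0)
            for label, (section, field) in CONSUMERS.items()
        }
        # Largest consumer first, so the likely culprit is easy to spot.
        ordered = dict(sorted(usage.items(), key=lambda kv: kv[1], reverse=True))
        rows.append({
            "node": node.get("name"),
            "heap_used_percent": heap.get("heap_used_percent"),
            "consumers": ordered,
        })
    return rows


def fetch_node_stats(base_url="http://localhost:9200"):
    """Fetch the jvm and indices sections of node stats (placeholder URL)."""
    with urlopen(base_url + "/_nodes/stats/jvm,indices") as response:
        return json.load(response)
```

Calling `heap_report(fetch_node_stats())` against one of the nodes would give a quick view of which node is close to its heap limit and whether fielddata or segment memory dominates there. It does not cover shard count or swap settings, which still need to be checked separately.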

xavier.tiendeo · March 23, 2017, 2:07pm

thanks

xavier.tiendeo · March 24, 2017, 10:21am

Do you know if there is a way to manually move the master role to another node?

Christian_Dahlqvist · March 24, 2017, 10:37am

No, there is no API for that. I am not sure how that would help either.

xavier.tiendeo · March 24, 2017, 10:40am

Not to solve the issue, but to pass the responsibility/work to another node so that the old master node has more resources.

system · April 21, 2017, 10:40am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
