We have a 3-node cluster, and the hot threads output shows this:
Hot threads at 2017-03-23T12:04:29.200Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
97.4% (487.1ms out of 500ms) cpu usage by thread 'elasticsearch[pro-analytics3][management][T#2]'
2/10 snapshots sharing following 24 elements
sun.nio.fs.UnixPath.<init>(UnixPath.java:71)
sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:281)
sun.nio.fs.AbstractPath.resolve(AbstractPath.java:53)
org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:209)
org.apache.lucene.store.FileSwitchDirectory.fileLength(FileSwitchDirectory.java:150)
org.apache.lucene.store.FilterDirectory.fileLength(FilterDirectory.java:67)
org.apache.lucene.store.FilterDirectory.fileLength(FilterDirectory.java:67)
org.elasticsearch.index.store.Store$StoreStatsCache.estimateSize(Store.java:1543)
org.elasticsearch.index.store.Store$StoreStatsCache.refresh(Store.java:1532)
org.elasticsearch.index.store.Store$StoreStatsCache.refresh(Store.java:1519)
org.elasticsearch.common.util.SingleObjectCache.getOrRefresh(SingleObjectCache.java:55)
org.elasticsearch.index.store.Store.stats(Store.java:293)
org.elasticsearch.index.shard.IndexShard.storeStats(IndexShard.java:665)
org.elasticsearch.action.admin.indices.stats.CommonStats.<init>(CommonStats.java:134)
org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:165)
In the logs we see:
[2017-03-23 12:58:29,756][WARN ][transport ] [pro-analytics3] Received response for a request that has timed out, sent [99355ms] ago, timed out [58057ms] ago, action [cluster:monitor/nodes/stats[n]], node [{pro-analytics3}{fkxzHRaSTUiHR99KbZKT8Q}
[2017-03-23 12:58:29,768][WARN ][monitor.jvm ] [pro-analytics3] [gc][old][386283][774] duration [57.2s], collections [1]/[58.2s], total [57.2s]/[2h], memory [14.1gb]->[14.1gb]/[14.3gb], all_pools {[young] [409.2mb]->[418.5mb]/[532.5mb]}{[survivor] [0b]->[0b]/[66.5mb]}{[old] [13.7gb]->[13.7gb]/[13.7gb]}
[2017-03-23 12:59:15,208][WARN ][monitor.jvm ] [pro-analytics3] [gc][old][386284][775] duration [44.6s], collections [1]/[45s], total [44.6s]/[2h], memory [14.1gb]->[14.1gb]/[14.3gb], all_pools {[young] [418.5mb]->[422.9mb]/[532.5mb]}{[survivor] [0b]->[0b]/[66.5mb]}{[old] [13.7gb]->[13.7gb]/[13.7gb]}
[2017-03-23 13:00:12,361][WARN ][discovery.zen.publish ] [pro-analytics3] timed out waiting for all nodes to process published state [18657] (timeout [30s], pending nodes: [{pro-analytics2}{pMCPbvEzSEqp7lwUD_rvKg}{box_type=hot}, {pro-analytics1}{jO9IIlQFT_qJI5gI7JTHHg}{box_type=hot}])
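
To make the GC warnings above easier to read, here is a rough sketch that pulls the pause duration and the old-generation numbers out of a `monitor.jvm` line. The function name `parse_old_gc` and the regexes are my own, written against the line format shown in this post only; other Elasticsearch versions may log this differently.

```python
import re

# Matches the "{[old] [before]->[after]/[max]}" pool entry of a monitor.jvm GC warning.
# Pattern is an assumption based on the log lines quoted in this post.
OLD_POOL = re.compile(
    r"\{\[old\] \[([\d.]+)(b|kb|mb|gb)\]->\[([\d.]+)(b|kb|mb|gb)\]/\[([\d.]+)(b|kb|mb|gb)\]\}"
)
DURATION = re.compile(r"duration \[([\d.]+)(ms|s|m)\]")

UNIT_BYTES = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3}
UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0}


def parse_old_gc(line):
    """Return pause duration, bytes freed, and old-gen fill ratio, or None if no match."""
    pool = OLD_POOL.search(line)
    dur = DURATION.search(line)
    if pool is None or dur is None:
        return None
    before = float(pool.group(1)) * UNIT_BYTES[pool.group(2)]
    after = float(pool.group(3)) * UNIT_BYTES[pool.group(4)]
    limit = float(pool.group(5)) * UNIT_BYTES[pool.group(6)]
    return {
        "duration_s": float(dur.group(1)) * UNIT_SECONDS[dur.group(2)],
        "freed_bytes": before - after,
        "old_used_ratio": after / limit,
    }


# The first GC warning from the logs above, copied verbatim.
SAMPLE = (
    "[2017-03-23 12:58:29,768][WARN ][monitor.jvm ] [pro-analytics3] "
    "[gc][old][386283][774] duration [57.2s], collections [1]/[58.2s], "
    "total [57.2s]/[2h], memory [14.1gb]->[14.1gb]/[14.3gb], all_pools "
    "{[young] [409.2mb]->[418.5mb]/[532.5mb]}{[survivor] [0b]->[0b]/[66.5mb]}"
    "{[old] [13.7gb]->[13.7gb]/[13.7gb]}"
)

if __name__ == "__main__":
    print(parse_old_gc(SAMPLE))
```

For both warnings the old pool reads 13.7gb before, 13.7gb after, and 13.7gb max, so each old-generation collection paused the node for 44 to 57 seconds and, at the precision logged, freed nothing.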