We have a 8 node ELK cluster in which 5 are data nodes and 3 are logstash nodes.
We have been experiencing extremely high CPU usage in which the Java process is taking 100 - 350% of CPU usage.
Data nodes are configured as followed
4vCPU
32GB RAM (16 allocated to ES_HEAP_SIZE)
1.4TB iSCSI Volume with 700GBs used
Swap is disabled.
CPU Load Avg from one of the nodes
load average: 12.13, 12.21, 10.83
Results from hot threads
curl http://localhost:9200/_nodes/hot_threads
31.9% (159.3ms out of 500ms) cpu usage by thread 'elasticsearch[apples02.atl.ucloud.int][search][T#7]'
4/10 snapshots sharing following 33 elements
org.apache.lucene.codecs.blocktree.SegmentTermsEnum.docs(SegmentTermsEnum.java:999)
org.apache.lucene.index.FilteredTermsEnum.docs(FilteredTermsEnum.java:189)
org.apache.lucene.index.FilteredTermsEnum.docs(FilteredTermsEnum.java:189)
org.elasticsearch.index.fielddata.ordinals.OrdinalsBuilder$3.next(OrdinalsBuilder.java:473)
org.elasticsearch.index.fielddata.plain.PackedArrayIndexFieldData.loadDirect(PackedArrayIndexFieldData.java:109)
org.elasticsearch.index.fielddata.plain.PackedArrayIndexFieldData.loadDirect(PackedArrayIndexFieldData.java:49)
org.elasticsearch.indices.fielddata.cache.IndicesFieldDataCache$IndexFieldCache$1.call(IndicesFieldDataCache.java:180)
org.elasticsearch.indices.fielddata.cache.IndicesFieldDataCache$IndexFieldCache$1.call(IndicesFieldDataCache.java:167)
org.elasticsearch.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4742)
org.elasticsearch.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527)
org.elasticsearch.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2319)
org.elasticsearch.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2282)
org.elasticsearch.common.cache.LocalCache$Segment.get(LocalCache.java:2197)
org.elasticsearch.common.cache.LocalCache.get(LocalCache.java:3937)
org.elasticsearch.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4739)
28.9% (144.7ms out of 500ms) cpu usage by thread 'elasticsearch[apples02.atl.ucloud.int][search][T#6]'
10/10 snapshots sharing following 29 elements
org.elasticsearch.index.fielddata.plain.PackedArrayIndexFieldData.loadDirect(PackedArrayIndexFieldData.java:109)
org.elasticsearch.index.fielddata.plain.PackedArrayIndexFieldData.loadDirect(PackedArrayIndexFieldData.java:49)
org.elasticsearch.indices.fielddata.cache.IndicesFieldDataCache$IndexFieldCache$1.call(IndicesFieldDataCache.java:180)
org.elasticsearch.indices.fielddata.cache.IndicesFieldDataCache$IndexFieldCache$1.call(IndicesFieldDataCache.java:167)
org.elasticsearch.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4742)
org.elasticsearch.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527)
org.elasticsearch.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2319)
org.elasticsearch.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2282)
org.elasticsearch.common.cache.LocalCache$Segment.get(LocalCache.java:2197)
org.elasticsearch.common.cache.LocalCache.get(LocalCache.java:3937)
org.elasticsearch.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4739)
org.elasticsearch.indices.fielddata.cache.IndicesFieldDataCache$IndexFieldCache.load(IndicesFieldDataCache.java:167)
org.elasticsearch.index.fielddata.plain.AbstractIndexFieldData.load(AbstractIndexFieldData.java:74)
org.elasticsearch.search.facet.datehistogram.CountDateHistogramFacetExecutor