Hi,
We are running a cluster of 8 master-eligible data nodes on ES 6.7.1. One of our indices has 18 shards with 0 replicas, spread across 3 nodes (EC2 m5.2xlarge servers with 8 cores and 32 GB of RAM). Shard size is approx GB. This is the only index on these 3 nodes. The servers handle 150 read requests per second at peak.
The issue is that we see intermittent spikes on these servers during which CPU usage reaches 100%. Following is the output of hot threads:
96.2% (481.1ms out of 500ms) cpu usage by thread 'elasticsearch[secondary1][search][T#7]'
10/10 snapshots sharing following 38 elements
org.apache.lucene.search.MatchAllDocsQuery$1$1.score(MatchAllDocsQuery.java:62)
org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39)
org.apache.lucene.search.LRUQueryCache.cacheIntoBitSet(LRUQueryCache.java:501)
org.apache.lucene.search.LRUQueryCache.cacheImpl(LRUQueryCache.java:492)
org.apache.lucene.search.LRUQueryCache$CachingWrapperWeight.cache(LRUQueryCache.java:701)
org.apache.lucene.search.LRUQueryCache$CachingWrapperWeight.scorerSupplier(LRUQueryCache.java:748)
org.elasticsearch.indices.IndicesQueryCache$CachingWeightWrapper.scorerSupplier(IndicesQueryCache.java:157)
org.apache.lucene.search.BooleanWeight.scorerSupplier(BooleanWeight.java:364)
org.apache.lucene.search.LRUQueryCache$CachingWrapperWeight.scorerSupplier(LRUQueryCache.java:751)
org.elasticsearch.indices.IndicesQueryCache$CachingWeightWrapper.scorerSupplier(IndicesQueryCache.java:157)
org.apache.lucene.search.BooleanWeight.scorerSupplier(BooleanWeight.java:364)
org.apache.lucene.search.LRUQueryCache$CachingWrapperWeight.scorerSupplier(LRUQueryCache.java:751)
org.elasticsearch.indices.IndicesQueryCache$CachingWeightWrapper.scorerSupplier(IndicesQueryCache.java:157)
org.apache.lucene.search.BooleanWeight.scorerSupplier(BooleanWeight.java:364)
org.apache.lucene.search.BooleanWeight.scorer(BooleanWeight.java:330)
org.apache.lucene.search.Weight.bulkScorer(Weight.java:177)
org.apache.lucene.search.BooleanWeight.bulkScorer(BooleanWeight.java:324)
org.apache.lucene.search.LRUQueryCache$CachingWrapperWeight.bulkScorer(LRUQueryCache.java:832)
org.elasticsearch.indices.IndicesQueryCache$CachingWeightWrapper.bulkScorer(IndicesQueryCache.java:163)
org.elasticsearch.search.internal.ContextIndexSearcher$1.bulkScorer(ContextIndexSearcher.java:180)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:667)
org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:191)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:471)
org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:276)
org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:114)
org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:349)
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:393)
org.elasticsearch.search.SearchService.access$100(SearchService.java:125)
org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:358)
org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:354)
org.elasticsearch.search.SearchService$4.doRun(SearchService.java:1085)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
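In case it helps anyone tally this kind of output across several samples, here is a minimal Python sketch that pulls the CPU percentage and thread pool name out of hot threads header lines like the one above. It assumes the header format and the `elasticsearch[node][pool][T#n]` thread naming seen in this single sample; `parse_header` and `busiest_pools` are hypothetical helper names, not part of any Elasticsearch client.

```python
import re
from collections import Counter

# Matches header lines such as:
#   96.2% (481.1ms out of 500ms) cpu usage by thread 'elasticsearch[secondary1][search][T#7]'
# The format is inferred from the sample above and may differ in other versions.
HEADER = re.compile(
    r"(?P<pct>\d+(?:\.\d+)?)% \((?P<used>[\d.]+\w+) out of (?P<interval>[\d.]+\w+)\) "
    r"cpu usage by thread '(?P<thread>[^']+)'"
)


def parse_header(line):
    """Return a dict describing one hot threads header line, or None if it is not one."""
    m = HEADER.search(line)
    if m is None:
        return None
    thread = m.group("thread")
    # Assumed layout: elasticsearch[node][pool][T#n]; collect the bracketed parts.
    parts = re.findall(r"\[([^\]]+)\]", thread)
    return {
        "cpu_percent": float(m.group("pct")),
        "thread": thread,
        "node": parts[0] if len(parts) > 0 else None,
        "pool": parts[1] if len(parts) > 1 else None,
    }


def busiest_pools(text):
    """Sum the reported CPU percentages per thread pool, highest first."""
    totals = Counter()
    for line in text.splitlines():
        info = parse_header(line)
        if info and info["pool"]:
            totals[info["pool"]] += info["cpu_percent"]
    return totals.most_common()
```

Feeding it the full dump from several spikes would show whether the `search` pool is consistently the one consuming the CPU, as it is in the sample above.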
I want to know if it's possible to deduce what the threads are doing from this stack trace.
Any info regarding this would be helpful.