Occasional maxed out CPU

seagullmouse · May 24, 2016, 9:32am

[We are on Elasticsearch 1.7.2, 30 node cluster]

Occasionally we see CPU usage max out across the whole cluster. Each time, the cluster recovers on its own after about 20 minutes, but during that period it is unresponsive.

Output from hot threads indicates that most nodes are running/stuck in the same place in the Lucene code. Here is a snippet:

100.6% (503.1ms out of 500ms) cpu usage by thread 'elasticsearch[node1][search][T#3]'
  10/10 snapshots sharing following 20 elements
    org.elasticsearch.common.lucene.docset.AndDocIdSet$AndBits.get(AndDocIdSet.java:116)
    org.elasticsearch.common.lucene.docset.BitsDocIdSetIterator.matchDoc(BitsDocIdSetIterator.java:45)
    org.elasticsearch.common.lucene.docset.MatchDocIdSetIterator.nextDoc(MatchDocIdSetIterator.java:50)
    org.apache.lucene.search.FilteredDocIdSetIterator.nextDoc(FilteredDocIdSetIterator.java:59)
    org.apache.lucene.search.ConstantScoreQuery$ConstantScorer.nextDoc(ConstantScoreQuery.java:257)
    org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:192)
    org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:163)
    org.apache.lucene.search.BulkScorer.score(BulkScorer.java:35)
    org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:621)
    org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:191)
    org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:309)
    org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:117)
    org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:370)
    org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:795)
    org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:786)
    org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:279)
    org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    java.lang.Thread.run(Thread.java:745)
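
To check whether the busy threads really do share the same innermost frame, the hot threads output can be tallied per thread. This is a minimal Python sketch, assuming the output has one stack frame per line as the hot threads API normally prints it. `top_frames` is a hypothetical helper name, not part of Elasticsearch, and the sample is just the first lines of the trace above.

```python
import re
from collections import Counter

# Header line of a hot-threads entry, e.g.
# "100.6% (503.1ms out of 500ms) cpu usage by thread 'elasticsearch[node1][search][T#3]'"
HEADER = re.compile(r"^\s*([\d.]+)% \(.*?\) cpu usage by thread '(.+)'\s*$")
# A stack frame, e.g. "org.apache.lucene.search.BulkScorer.score(BulkScorer.java:35)"
FRAME = re.compile(r"^\s*([\w.$]+)\((?:[\w.$]+:\d+|Native Method|Unknown Source)\)\s*$")


def top_frames(hot_threads_text):
    """Count the innermost (first listed) stack frame of every thread entry."""
    counts = Counter()
    awaiting_frame = False
    for line in hot_threads_text.splitlines():
        if HEADER.match(line):
            awaiting_frame = True  # the next frame line is this thread's top frame
            continue
        m = FRAME.match(line)
        if m and awaiting_frame:
            counts[m.group(1)] += 1
            awaiting_frame = False
    return counts


sample = """\
100.6% (503.1ms out of 500ms) cpu usage by thread 'elasticsearch[node1][search][T#3]'
  10/10 snapshots sharing following 20 elements
    org.elasticsearch.common.lucene.docset.AndDocIdSet$AndBits.get(AndDocIdSet.java:116)
    org.elasticsearch.common.lucene.docset.BitsDocIdSetIterator.matchDoc(BitsDocIdSetIterator.java:45)
"""
print(top_frames(sample).most_common(1))
```

Fed the concatenated output from all nodes, a single dominant entry would support the "same place everywhere" reading.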

I've had a quick look at the Lucene code and it looks like a pretty inconspicuous line of code. My guess is that Elasticsearch isn't stuck on the line in question, but rather is executing it over and over.

Anyone seen anything similar and/or thoughts as to what might be going on?

jprante · May 25, 2016, 9:29am

That's simple. Your application allows large result sets to be retrieved concurrently, and tries to score each and every document. Your filter has to iterate over the complete result set, which kills your CPU.
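
To make the cost concrete: the top frames of the trace (`MatchDocIdSetIterator.nextDoc` calling `AndBits.get`) suggest a filter that is checked one document at a time. The following is a simplified Python model of that pattern, not the actual Elasticsearch code. The class and function names and the toy predicates are assumptions made for illustration.

```python
class AndBits:
    """AND of several random-access filters; counts how often get() runs."""

    def __init__(self, filters):
        self.filters = filters
        self.calls = 0

    def get(self, doc_id):
        self.calls += 1
        # Each clause may have to be evaluated for this single document.
        return all(f(doc_id) for f in self.filters)


def count_matches(bits, max_doc):
    """Visit every doc id up to max_doc and test it, one get() per document."""
    return sum(1 for doc_id in range(max_doc) if bits.get(doc_id))


max_doc = 100_000  # toy segment size, assumed for the example
bits = AndBits([lambda d: d % 2 == 0, lambda d: d % 1000 == 0])
matches = count_matches(bits, max_doc)
print(matches, bits.calls)
```

In this model only 100 documents match, yet `get()` runs 100,000 times. The work grows with the number of documents visited rather than with the number of hits, which would be consistent with search threads sitting at 100% CPU in that one method when many such searches run at once.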

Improve your queries!

seagullmouse · May 25, 2016, 12:55pm

Thanks Jorg, you are right.
