ES suddenly begins to consume CPU


(Waclaw Shakura) #1

Hi!
I have a 6-node cluster. Sometimes CPU usage becomes very high and drops only after restarting Elasticsearch.
Any suggestions?

Here is my munin output:


And hot threads from one of the nodes:


Hot threads at 2015-09-16T14:30:20.501Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

18.7% (93.3ms out of 500ms) cpu usage by thread 'elasticsearch[br-es-03][search][T#3]'
4/10 snapshots sharing following 20 elements
org.apache.lucene.search.ConstantScoreQuery$ConstantScorer.nextDoc(ConstantScoreQuery.java:257)
org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:192)
org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:163)
org.apache.lucene.search.BulkScorer.score(BulkScorer.java:35)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:621)
org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:191)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:491)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:448)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:281)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:269)
org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:157)
org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:286)
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:297)
org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:776)
org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:767)
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:277)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:722)
6/10 snapshots sharing following 10 elements
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
java.util.concurrent.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:735)
java.util.concurrent.LinkedTransferQueue.xfer(LinkedTransferQueue.java:644)
java.util.concurrent.LinkedTransferQueue.take(LinkedTransferQueue.java:1137)
org.elasticsearch.common.util.concurrent.SizeBlockingQueue.take(SizeBlockingQueue.java:162)
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:722)


(Otis Gospodnetić) #2

Hi,

Can you share more than just your CPU metrics?
Is GC high?
How about query rates?
What about merges and disk IO?
Any jump in evictions or in the size of your thread pool queues?
...
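
Most of the numbers asked about above can be read from the node stats API (`GET /_nodes/stats`). The following is a minimal sketch, assuming the JSON from that endpoint has already been fetched and parsed into a dict. The function name `summarize_node_stats` is hypothetical, and the exact fields available may differ between Elasticsearch versions, so missing keys are treated as zero.

```python
def summarize_node_stats(stats):
    """Summarize GC time, query count, merges and thread pool queues per node.

    `stats` is assumed to be the parsed JSON body of GET /_nodes/stats.
    Missing sections are tolerated and reported as zero / empty.
    """
    summary = {}
    for node in stats.get("nodes", {}).values():
        collectors = node.get("jvm", {}).get("gc", {}).get("collectors", {})
        indices = node.get("indices", {})
        pools = node.get("thread_pool", {})
        summary[node.get("name", "unknown")] = {
            # Total time spent in all GC collectors since node start
            "gc_time_ms": sum(
                c.get("collection_time_in_millis", 0) for c in collectors.values()
            ),
            # Cumulative query count; sample twice to derive a rate
            "query_total": indices.get("search", {}).get("query_total", 0),
            # Merges running right now
            "merges_current": indices.get("merges", {}).get("current", 0),
            # Only thread pools that currently have queued or rejected tasks
            "queued": {
                name: p["queue"] for name, p in pools.items() if p.get("queue", 0) > 0
            },
            "rejected": {
                name: p["rejected"]
                for name, p in pools.items()
                if p.get("rejected", 0) > 0
            },
        }
    return summary
```

Calling it twice a few seconds apart and comparing the results gives a rough picture of whether GC time, queries, or a growing `bulk` or `search` queue line up with the CPU spike.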

Otis

Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Elasticsearch Consulting & Support * http://sematext.com/


(Waclaw Shakura) #3

Hi, the problem was an error in the client application. It tried to update large bulks of non-existent documents.
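
A client can spot this kind of problem by inspecting the per-item results in the bulk response, where an update against a missing document is typically reported with a 404 status. The sketch below assumes the bulk response has already been parsed into a dict with the usual `items` list; `summarize_bulk_failures` is a hypothetical helper name, not part of any Elasticsearch client.

```python
from collections import Counter


def summarize_bulk_failures(response):
    """Count failed bulk items, grouped by (action, HTTP status).

    `response` is assumed to be the parsed JSON body returned by the
    bulk API: {"items": [{"update": {"status": 404, ...}}, ...]}.
    """
    failures = Counter()
    for item in response.get("items", []):
        # Each item is a single-key dict: {"index" | "update" | "delete": {...}}
        for action, result in item.items():
            status = result.get("status", 200)
            if status >= 400:
                failures[(action, status)] += 1
    return failures
```

If a large share of the items come back as `("update", 404)`, the client is most likely sending updates for documents that do not exist, which is wasted work on the cluster.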

