fgourichon
(Francois G.)
September 12, 2020, 12:25pm
1
Hi there,
I'm trying to understand why we sometimes get high CPU utilisation on only one data node of our cluster. We mainly use 2 indexes (80GB for one, 500MB for the second), each with 5 shards and one replica, which seem to be evenly distributed across our 3 nodes.
We sometimes see one node's CPU getting stuck at 100% while the others sit at around 25%, and it always seems to be the same node...
Any idea?
I have a hot threads snapshot that I can share if it helps.
Our ES version is 6.8.8.
Yes, please share the hot threads output.
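For anyone following along, here is a minimal sketch of how the hot threads output could be requested. It only builds the request URL for the nodes hot threads API. The base URL `http://localhost:9200` and the helper name `hot_threads_url` are assumptions for illustration; the parameter values mirror the ones shown in the header of the output posted in this thread (interval=500ms, 3 busiest threads, idle threads ignored).

```python
from urllib.parse import urlencode


def hot_threads_url(base_url, node=None, threads=3, interval="500ms",
                    ignore_idle_threads=True):
    """Build the URL for the nodes hot threads API (hypothetical helper).

    If `node` is given, only that node is sampled; otherwise all nodes are.
    """
    path = "/_nodes/hot_threads" if node is None else f"/_nodes/{node}/hot_threads"
    query = urlencode({
        "threads": threads,
        "interval": interval,
        "ignore_idle_threads": str(ignore_idle_threads).lower(),
    })
    return f"{base_url.rstrip('/')}{path}?{query}"


# Assumed local endpoint; replace with the real cluster address.
url = hot_threads_url("http://localhost:9200", node="instance-0000000002")
print(url)

# To actually fetch the plain-text report (not executed here):
#   from urllib.request import urlopen
#   print(urlopen(url).read().decode("utf-8"))
```

Capturing the output while the CPU is pinned at 100% matters, since the report only covers the sampling interval.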
fgourichon
(Francois G.)
September 12, 2020, 11:08pm
3
(Splitting in several messages, the text is too long)
::: {instance-0000000002}{L6uqrHubT66TurycvCyEqg}{jZaNbG7yQlGwd5KrW2_UVw}{10.0.42.65}{10.0.42.65:19936}{logical_availability_zone=zone-2, server_name=instance-0000000002.7010a5b641be40388118884e5d60a284, availability_zone=ap-southeast-2c, xpack.installed=true, region=ap-southeast-2, instance_configuration=aws.data.highio.i3}
Hot threads at 2020-09-04T13:18:53.275Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
83.2% (416.2ms out of 500ms) cpu usage by thread 'elasticsearch[instance-0000000002][search][T#3]'
7/10 snapshots sharing following 32 elements
java.nio.Bits.copyToArray(Bits.java:836)
java.nio.DirectByteBuffer.get(DirectByteBuffer.java:279)
org.apache.lucene.store.ByteBufferGuard.getBytes(ByteBufferGuard.java:93)
org.apache.lucene.store.ByteBufferIndexInput.readBytes(ByteBufferIndexInput.java:89)
org.apache.lucene.codecs.blocktree.IntersectTermsEnumFrame.load(IntersectTermsEnumFrame.java:194)
org.apache.lucene.codecs.blocktree.IntersectTermsEnum.pushFrame(IntersectTermsEnum.java:208)
org.apache.lucene.codecs.blocktree.IntersectTermsEnum._next(IntersectTermsEnum.java:662)
org.apache.lucene.codecs.blocktree.IntersectTermsEnum.next(IntersectTermsEnum.java:497)
org.apache.lucene.search.FuzzyTermsEnum.next(FuzzyTermsEnum.java:211)
org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:67)
org.apache.lucene.search.TopTermsRewrite.rewrite(TopTermsRewrite.java:67)
org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:310)
org.apache.lucene.search.DisjunctionMaxQuery.rewrite(DisjunctionMaxQuery.java:219)
org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:246)
org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:685)
org.elasticsearch.search.internal.ContextIndexSearcher.rewrite(ContextIndexSearcher.java:106)
org.elasticsearch.search.DefaultSearchContext.preProcess(DefaultSearchContext.java:263)
org.elasticsearch.search.query.QueryPhase.preProcess(QueryPhase.java:91)
org.elasticsearch.search.SearchService.createContext(SearchService.java:660)
org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:599)
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:387)
org.elasticsearch.search.SearchService.access$100(SearchService.java:126)
org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:359)
org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:355)
org.elasticsearch.search.SearchService$4.doRun(SearchService.java:1117)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
3/10 snapshots sharing following 25 elements
org.apache.lucene.codecs.blocktree.IntersectTermsEnum.next(IntersectTermsEnum.java:497)
org.apache.lucene.search.FuzzyTermsEnum.next(FuzzyTermsEnum.java:211)
org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:67)
org.apache.lucene.search.TopTermsRewrite.rewrite(TopTermsRewrite.java:67)
org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:310)
org.apache.lucene.search.DisjunctionMaxQuery.rewrite(DisjunctionMaxQuery.java:219)
org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:246)
org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:685)
org.elasticsearch.search.internal.ContextIndexSearcher.rewrite(ContextIndexSearcher.java:106)
org.elasticsearch.search.DefaultSearchContext.preProcess(DefaultSearchContext.java:263)
org.elasticsearch.search.query.QueryPhase.preProcess(QueryPhase.java:91)
org.elasticsearch.search.SearchService.createContext(SearchService.java:660)
org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:599)
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:387)
org.elasticsearch.search.SearchService.access$100(SearchService.java:126)
org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:359)
org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:355)
org.elasticsearch.search.SearchService$4.doRun(SearchService.java:1117)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
64.5% (322.6ms out of 500ms) cpu usage by thread 'elasticsearch[instance-0000000002][search][T#4]'
9/10 snapshots sharing following 25 elements
org.apache.lucene.codecs.blocktree.IntersectTermsEnum.next(IntersectTermsEnum.java:497)
org.apache.lucene.search.FuzzyTermsEnum.next(FuzzyTermsEnum.java:211)
org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:67)
org.apache.lucene.search.TopTermsRewrite.rewrite(TopTermsRewrite.java:67)
org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:310)
org.apache.lucene.search.DisjunctionMaxQuery.rewrite(DisjunctionMaxQuery.java:219)
org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:246)
org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:685)
org.elasticsearch.search.internal.ContextIndexSearcher.rewrite(ContextIndexSearcher.java:106)
org.elasticsearch.search.DefaultSearchContext.preProcess(DefaultSearchContext.java:263)
org.elasticsearch.search.query.QueryPhase.preProcess(QueryPhase.java:91)
org.elasticsearch.search.SearchService.createContext(SearchService.java:660)
org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:599)
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:387)
org.elasticsearch.search.SearchService.access$100(SearchService.java:126)
org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:359)
org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:355)
org.elasticsearch.search.SearchService$4.doRun(SearchService.java:1117)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
unique snapshot
java.nio.Bits.copyToArray(Bits.java:836)
java.nio.DirectByteBuffer.get(DirectByteBuffer.java:279)
org.apache.lucene.store.ByteBufferGuard.getBytes(ByteBufferGuard.java:93)
org.apache.lucene.store.ByteBufferIndexInput.readBytes(ByteBufferIndexInput.java:89)
org.apache.lucene.codecs.blocktree.IntersectTermsEnumFrame.load(IntersectTermsEnumFrame.java:194)
org.apache.lucene.codecs.blocktree.IntersectTermsEnum.<init>(IntersectTermsEnum.java:127)
org.apache.lucene.codecs.blocktree.FieldReader.intersect(FieldReader.java:188)
org.apache.lucene.search.FuzzyTermsEnum.getAutomatonEnum(FuzzyTermsEnum.java:169)
org.apache.lucene.search.FuzzyTermsEnum.bottomChanged(FuzzyTermsEnum.java:196)
org.apache.lucene.search.FuzzyTermsEnum.<init>(FuzzyTermsEnum.java:151)
org.apache.lucene.search.FuzzyQuery.getTermsEnum(FuzzyQuery.java:154)
org.apache.lucene.search.MultiTermQuery$RewriteMethod.getTermsEnum(MultiTermQuery.java:78)
org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:58)
org.apache.lucene.search.TopTermsRewrite.rewrite(TopTermsRewrite.java:67)
org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:310)
org.apache.lucene.search.DisjunctionMaxQuery.rewrite(DisjunctionMaxQuery.java:219)
org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:246)
org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:685)
org.elasticsearch.search.internal.ContextIndexSearcher.rewrite(ContextIndexSearcher.java:106)
org.elasticsearch.search.DefaultSearchContext.preProcess(DefaultSearchContext.java:263)
org.elasticsearch.search.query.QueryPhase.preProcess(QueryPhase.java:91)
org.elasticsearch.search.SearchService.createContext(SearchService.java:660)
org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:599)
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:387)
org.elasticsearch.search.SearchService.access$100(SearchService.java:126)
org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:359)
org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:355)
org.elasticsearch.search.SearchService$4.doRun(SearchService.java:1117)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
fgourichon
(Francois G.)
September 12, 2020, 11:10pm
4
instance-0000000002 continued:
57.0% (285.1ms out of 500ms) cpu usage by thread 'elasticsearch[instance-0000000002][search][T#2]'
9/10 snapshots sharing following 25 elements
org.apache.lucene.codecs.blocktree.IntersectTermsEnum.next(IntersectTermsEnum.java:497)
org.apache.lucene.search.FuzzyTermsEnum.next(FuzzyTermsEnum.java:211)
org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:67)
org.apache.lucene.search.TopTermsRewrite.rewrite(TopTermsRewrite.java:67)
org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:310)
org.apache.lucene.search.DisjunctionMaxQuery.rewrite(DisjunctionMaxQuery.java:219)
org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:246)
org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:685)
org.elasticsearch.search.internal.ContextIndexSearcher.rewrite(ContextIndexSearcher.java:106)
org.elasticsearch.search.DefaultSearchContext.preProcess(DefaultSearchContext.java:263)
org.elasticsearch.search.query.QueryPhase.preProcess(QueryPhase.java:91)
org.elasticsearch.search.SearchService.createContext(SearchService.java:660)
org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:599)
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:387)
org.elasticsearch.search.SearchService.access$100(SearchService.java:126)
org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:359)
org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:355)
org.elasticsearch.search.SearchService$4.doRun(SearchService.java:1117)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
unique snapshot
java.nio.Bits.copyToArray(Bits.java:836)
java.nio.DirectByteBuffer.get(DirectByteBuffer.java:279)
org.apache.lucene.store.ByteBufferGuard.getBytes(ByteBufferGuard.java:93)
org.apache.lucene.store.ByteBufferIndexInput.readBytes(ByteBufferIndexInput.java:89)
org.apache.lucene.codecs.blocktree.IntersectTermsEnumFrame.load(IntersectTermsEnumFrame.java:194)
org.apache.lucene.codecs.blocktree.IntersectTermsEnum.<init>(IntersectTermsEnum.java:127)
org.apache.lucene.codecs.blocktree.FieldReader.intersect(FieldReader.java:188)
org.apache.lucene.search.FuzzyTermsEnum.getAutomatonEnum(FuzzyTermsEnum.java:169)
org.apache.lucene.search.FuzzyTermsEnum.bottomChanged(FuzzyTermsEnum.java:196)
org.apache.lucene.search.FuzzyTermsEnum.<init>(FuzzyTermsEnum.java:151)
org.apache.lucene.search.FuzzyQuery.getTermsEnum(FuzzyQuery.java:154)
org.apache.lucene.search.MultiTermQuery$RewriteMethod.getTermsEnum(MultiTermQuery.java:78)
org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:58)
org.apache.lucene.search.TopTermsRewrite.rewrite(TopTermsRewrite.java:67)
org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:310)
org.apache.lucene.search.DisjunctionMaxQuery.rewrite(DisjunctionMaxQuery.java:219)
org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:246)
org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:685)
org.elasticsearch.search.internal.ContextIndexSearcher.rewrite(ContextIndexSearcher.java:106)
org.elasticsearch.search.DefaultSearchContext.preProcess(DefaultSearchContext.java:263)
org.elasticsearch.search.query.QueryPhase.preProcess(QueryPhase.java:91)
org.elasticsearch.search.SearchService.createContext(SearchService.java:660)
org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:599)
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:387)
org.elasticsearch.search.SearchService.access$100(SearchService.java:126)
org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:359)
org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:355)
org.elasticsearch.search.SearchService$4.doRun(SearchService.java:1117)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
And the 2 other instances, which report no busy threads:
::: {instance-0000000001}{VACHH-0hTn2DkDifEWdw2A}{tPKYBXW0QduX6FQ-jVO1fw}{10.0.30.247}{10.0.30.247:19594}{logical_availability_zone=zone-1, server_name=instance-0000000001.7010a5b641be40388118884e5d60a284, availability_zone=ap-southeast-2b, xpack.installed=true, instance_configuration=aws.data.highio.i3, region=ap-southeast-2}
Hot threads at 2020-09-04T13:18:53.274Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
::: {instance-0000000000}{28gdCCBDTV2HMa1RcbU2_Q}{7lBc1d7pQiqYAnxjRI1Mmg}{10.0.10.217}{10.0.10.217:19331}{logical_availability_zone=zone-0, server_name=instance-0000000000.7010a5b641be40388118884e5d60a284, availability_zone=ap-southeast-2a, xpack.installed=true, region=ap-southeast-2, instance_configuration=aws.data.highio.i3}
Hot threads at 2020-09-04T13:18:53.275Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
It looks like that node is busy serving an expensive fuzzy search request. Do you have any expensive searches running against indices with shards primarily on this node? Are all your shards and replicas assigned and evenly distributed across the cluster? Do you use preference when you query the cluster? Do all nodes have the same hardware specification?
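To make that reading of the output easier to repeat on later snapshots, here is a rough sketch that summarises a hot threads report per thread: CPU percentage, top stack frame, and whether any frame mentions `FuzzyTermsEnum`. The function name, the regular expressions, and the shortened sample are assumptions based only on the text format pasted above, not an official parser.

```python
import re

# Patterns derived from the report format pasted in this thread (assumed stable).
NODE_RE = re.compile(r"^::: \{([^}]+)\}")
THREAD_RE = re.compile(r"^\s*([\d.]+)% \(.*?\) cpu usage by thread '(.+)'\s*$")
FRAME_RE = re.compile(r"^[\w.$<>]+\(.*\)$")


def summarize_hot_threads(text, marker="FuzzyTermsEnum"):
    """Return one dict per busy thread found in a hot threads report."""
    rows = []
    node = None
    current = None
    for line in text.splitlines():
        match = NODE_RE.match(line)
        if match:
            node, current = match.group(1), None
            continue
        match = THREAD_RE.match(line)
        if match:
            current = {"node": node, "thread": match.group(2),
                       "cpu": float(match.group(1)),
                       "top_frame": None, "has_marker": False}
            rows.append(current)
            continue
        frame = line.strip()
        if current is not None and FRAME_RE.match(frame):
            if current["top_frame"] is None:
                current["top_frame"] = frame  # first frame after the header
            if marker in frame:
                current["has_marker"] = True
    return rows


# Shortened excerpt of the output posted above.
SAMPLE = """\
::: {instance-0000000002}{L6uqrHubT66TurycvCyEqg}
Hot threads at 2020-09-04T13:18:53.275Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
83.2% (416.2ms out of 500ms) cpu usage by thread 'elasticsearch[instance-0000000002][search][T#3]'
7/10 snapshots sharing following 32 elements
java.nio.Bits.copyToArray(Bits.java:836)
org.apache.lucene.search.FuzzyTermsEnum.next(FuzzyTermsEnum.java:211)
64.5% (322.6ms out of 500ms) cpu usage by thread 'elasticsearch[instance-0000000002][search][T#4]'
9/10 snapshots sharing following 25 elements
org.apache.lucene.codecs.blocktree.IntersectTermsEnum.next(IntersectTermsEnum.java:497)
::: {instance-0000000001}{VACHH-0hTn2DkDifEWdw2A}
Hot threads at 2020-09-04T13:18:53.274Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
"""

rows = summarize_hot_threads(SAMPLE)
for row in rows:
    print(f"{row['node']} {row['cpu']:5.1f}% fuzzy={row['has_marker']} {row['top_frame']}")
```

Nodes that report no busy threads, like the other two instances here, simply produce no rows.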
fgourichon
(Francois G.)
September 13, 2020, 11:52am
6
Yes, all nodes have the same specifications, shards and replicas are evenly distributed, and we don't use a preference when querying the cluster, as far as I can tell.
During the test which resulted in this snapshot (this was on a cluster we created for the occasion, a copy of our Prod), we were only running 2 types of search and 1 type of indexing with different inputs, so I can't think of any specific search which would be more complex than another.
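The statement about even shard distribution can be double-checked from the text output of the cat shards API, for example `GET _cat/shards?h=index,shard,prirep,state,node`. Below is a small sketch that counts primaries and replicas per node from such output. The index name `big-index` and the allocation in the sample are made up for illustration and are not this cluster's real layout.

```python
from collections import Counter, defaultdict


def shards_per_node(cat_shards_text):
    """Count primary ('p') and replica ('r') shards per node.

    Expects lines with the columns: index shard prirep state node.
    Shards without a node column are grouped under 'UNASSIGNED'.
    """
    counts = defaultdict(Counter)
    for line in cat_shards_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        prirep = fields[2]
        node = fields[4] if len(fields) > 4 else "UNASSIGNED"
        counts[node][prirep] += 1
    return counts


# Hypothetical sample: one index with 5 shards and 1 replica on 3 nodes.
SAMPLE = """\
big-index 0 p STARTED instance-0000000000
big-index 0 r STARTED instance-0000000001
big-index 1 p STARTED instance-0000000001
big-index 1 r STARTED instance-0000000002
big-index 2 p STARTED instance-0000000002
big-index 2 r STARTED instance-0000000000
big-index 3 p STARTED instance-0000000000
big-index 3 r STARTED instance-0000000001
big-index 4 p STARTED instance-0000000002
big-index 4 r STARTED instance-0000000000
"""

counts = shards_per_node(SAMPLE)
for node in sorted(counts):
    total = sum(counts[node].values())
    print(f"{node}: {counts[node]['p']} primaries, {counts[node]['r']} replicas, {total} total")
```

With 10 shard copies on 3 nodes, a split like 4/3/3 is as even as it can get, so counts alone may not explain one node being much busier; comparing the shard sizes on each node would be a reasonable next step.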
fgourichon
(Francois G.)
September 18, 2020, 7:54am
7
Just a thought: what if I have a lot of document deletions going on in parallel? Could that be a lead?