Elasticsearch CPU 100% GC

Hey guys, On the last day I decided to perform some operations in Elasticsearch. I had some indexes that were generated daily by Logstash, until I decided to use ILM to automatically manage and generate rollbacks.

After implementing the ILM, I reindexed the daily indexes via the API to just one monthly index (around 7 million documents in total) and later deleted the daily indexes as was done previously, without presenting any problems. However, after the delete operation, Elasticsearch began to show CPU consumption spikes of 100%, but it did not prevent the indexing and searching of data, there was just a higher latency.

These CPU consumption spikes appear every two minutes. The JVM Heap maintained the usual average consumption, as well as RAM consumption, but the Heap stops after the CPU spike. GC Young also shows spikes parallel to CPU consumption. I noticed that the number of index/shards are different depending on where you search (by X-Pack monitoring, one value is presented, by the API it is another value).

The cluster remained green and, in the Elasticsearch logs, it did not show any errors. In the GC logs I did not see any overhead logs, but I confess that I am not yet proficient in reading these GC logs. I restarted the cluster and disabled logstash hoping there would be a remediation, but still no success. What could be happening? Have you ever witnessed anything like this? Are there any specific ways that can help with troubleshooting?

It seems like your Elasticsearch cluster is acting up after you made some changes. If I were in your shoes, here's what I would do

  1. Check Shard Count: Make sure you don't have too many pieces (shards) in your monthly data bucket. Having too many can stress out your system.
  2. Monitoring Mismatch: Double-check the numbers your monitoring tools show versus what the system itself says. Sometimes, they can be a bit out of sync.
  3. Search Slowlogs: See if any searches are causing the trouble. Maybe some searches are like a traffic jam for your system. Fixing these can help.

You can run

GET /_nodes/hot_threads

That should show you what is using the CPU.

Did you mean rollover?

At first I was having good performance with 79 indexes and 94 shards, but with the integration of ILM and the addition of indexes with a high number of documents, I started to experience problems. Unfortunately I didn't come across slowlogs.

I noticed that this problem gets worse when a search is carried out in certain indexes that already have ILM, in addition to significantly increasing the search time in Kibana depending on the index.

These indexes that are "slow" in search contain two shards and the entire cluster is based on just one Elasticsearch node (Dedicated, 8vCPU, 32GB RAM, 600GB Storage, 16GB JVM Heap). Would it be time to increase the shards or add a second node?

Hello Stephen,

Result of Hot Threads:

Hot threads at 2023-12-03T14:23:33.598Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
   
   100.0% [cpu=96.7%, other=3.3%] (500ms out of 500ms) cpu usage by thread 'elasticsearch[Partenos][search_worker][T#10]'
     2/10 snapshots sharing following 37 elements
       app/org.apache.lucene.core@9.7.0/org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$TermsDict.seekExact(Lucene90DocValuesProducer.java:1127)
       app/org.apache.lucene.core@9.7.0/org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$BaseSortedDocValues.lookupOrd(Lucene90DocValuesProducer.java:982)
       app/org.apache.lucene.core@9.7.0/org.apache.lucene.index.SingletonSortedSetDocValues.lookupOrd(SingletonSortedSetDocValues.java:95)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.script.field.IpDocValuesField$SortedSetIpSupplier.getInternal(IpDocValuesField.java:169)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.script.field.IpDocValuesField$SortedSetIpSupplier.getInternal(IpDocValuesField.java:146)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.script.field.IpDocValuesField.getInternal(IpDocValuesField.java:61)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.script.field.IpDocValuesField.getInternal(IpDocValuesField.java:32)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.index.fielddata.ScriptDocValues$Strings.get(ScriptDocValues.java:493)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.index.fielddata.ScriptDocValues$Strings.getValue(ScriptDocValues.java:482)
       java.base@20.0.2/java.lang.invoke.DirectMethodHandle$Holder.invokeVirtual(DirectMethodHandle$Holder)
       java.base@20.0.2/java.lang.invoke.LambdaForm$MH/0x00007fde68012000.invoke(LambdaForm$MH)
       java.base@20.0.2/java.lang.invoke.DelegatingMethodHandle$Holder.delegate(DelegatingMethodHandle$Holder)
       java.base@20.0.2/java.lang.invoke.LambdaForm$MH/0x00007fde6978b800.guard(LambdaForm$MH)
       java.base@20.0.2/java.lang.invoke.Invokers$Holder.linkToCallSite(Invokers$Holder)
       org.elasticsearch.painless.PainlessScript$Script.execute(if (doc['dst'].size() == 0) { ...:109)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.script.AbstractFieldScript.runForDoc(AbstractFieldScript.java:158)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.search.runtime.AbstractStringScriptFieldQuery.matches(AbstractStringScriptFieldQuery.java:27)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.search.runtime.AbstractStringScriptFieldQuery.matches(AbstractStringScriptFieldQuery.java:19)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.search.runtime.AbstractScriptFieldQuery$1$1.matches(AbstractScriptFieldQuery.java:77)
       app/org.apache.lucene.core@9.7.0/org.apache.lucene.search.DisjunctionScorer$TwoPhase.matches(DisjunctionScorer.java:158)
       app/org.apache.lucene.core@9.7.0/org.apache.lucene.search.ConjunctionDISI$ConjunctionTwoPhaseIterator.matches(ConjunctionDISI.java:346)
       app/org.apache.lucene.core@9.7.0/org.apache.lucene.search.Weight$DefaultBulkScorer.scoreRange(Weight.java:281)
       app/org.apache.lucene.core@9.7.0/org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:254)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.search.internal.CancellableBulkScorer.score(CancellableBulkScorer.java:45)
       app/org.apache.lucene.core@9.7.0/org.apache.lucene.search.BulkScorer.score(BulkScorer.java:38)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:542)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:482)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.search.internal.ContextIndexSearcher.lambda$search$4(ContextIndexSearcher.java:398)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.search.internal.ContextIndexSearcher$$Lambda$8120/0x00007fde69629c58.call(Unknown Source)
       java.base@20.0.2/java.util.concurrent.FutureTask.run(FutureTask.java:317)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:983)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
       java.base@20.0.2/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
       java.base@20.0.2/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
       java.base@20.0.2/java.lang.Thread.runWith(Thread.java:1636)
       java.base@20.0.2/java.lang.Thread.run(Thread.java:1623)
     2/10 snapshots sharing following 32 elements
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.script.field.IpDocValuesField.getInternal(IpDocValuesField.java:61)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.script.field.IpDocValuesField.getInternal(IpDocValuesField.java:32)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.index.fielddata.ScriptDocValues$Strings.get(ScriptDocValues.java:493)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.index.fielddata.ScriptDocValues$Strings.getValue(ScriptDocValues.java:482)
       java.base@20.0.2/java.lang.invoke.DirectMethodHandle$Holder.invokeVirtual(DirectMethodHandle$Holder)
       java.base@20.0.2/java.lang.invoke.LambdaForm$MH/0x00007fde68012000.invoke(LambdaForm$MH)
       java.base@20.0.2/java.lang.invoke.DelegatingMethodHandle$Holder.delegate(DelegatingMethodHandle$Holder)
       java.base@20.0.2/java.lang.invoke.LambdaForm$MH/0x00007fde6978b800.guard(LambdaForm$MH)
       java.base@20.0.2/java.lang.invoke.Invokers$Holder.linkToCallSite(Invokers$Holder)
       org.elasticsearch.painless.PainlessScript$Script.execute(if (doc['dst'].size() == 0) { ...:119)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.script.AbstractFieldScript.runForDoc(AbstractFieldScript.java:158)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.search.runtime.AbstractStringScriptFieldQuery.matches(AbstractStringScriptFieldQuery.java:27)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.search.runtime.AbstractStringScriptFieldQuery.matches(AbstractStringScriptFieldQuery.java:19)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.search.runtime.AbstractScriptFieldQuery$1$1.matches(AbstractScriptFieldQuery.java:77)
       app/org.apache.lucene.core@9.7.0/org.apache.lucene.search.DisjunctionScorer$TwoPhase.matches(DisjunctionScorer.java:158)
       app/org.apache.lucene.core@9.7.0/org.apache.lucene.search.ConjunctionDISI$ConjunctionTwoPhaseIterator.matches(ConjunctionDISI.java:346)
       app/org.apache.lucene.core@9.7.0/org.apache.lucene.search.Weight$DefaultBulkScorer.scoreRange(Weight.java:281)
       app/org.apache.lucene.core@9.7.0/org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:254)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.search.internal.CancellableBulkScorer.score(CancellableBulkScorer.java:45)
       app/org.apache.lucene.core@9.7.0/org.apache.lucene.search.BulkScorer.score(BulkScorer.java:38)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:542)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:482)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.search.internal.ContextIndexSearcher.lambda$search$4(ContextIndexSearcher.java:398)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.search.internal.ContextIndexSearcher$$Lambda$8120/0x00007fde69629c58.call(Unknown Source)
       java.base@20.0.2/java.util.concurrent.FutureTask.run(FutureTask.java:317)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:983)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
       java.base@20.0.2/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
       java.base@20.0.2/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
       java.base@20.0.2/java.lang.Thread.runWith(Thread.java:1636)
       java.base@20.0.2/java.lang.Thread.run(Thread.java:1623)
     2/10 snapshots sharing following 32 elements
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.script.field.IpDocValuesField.getInternal(IpDocValuesField.java:61)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.script.field.IpDocValuesField.getInternal(IpDocValuesField.java:32)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.index.fielddata.ScriptDocValues$Strings.get(ScriptDocValues.java:493)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.index.fielddata.ScriptDocValues$Strings.getValue(ScriptDocValues.java:482)
       java.base@20.0.2/java.lang.invoke.DirectMethodHandle$Holder.invokeVirtual(DirectMethodHandle$Holder)
       java.base@20.0.2/java.lang.invoke.LambdaForm$MH/0x00007fde68012000.invoke(LambdaForm$MH)
       java.base@20.0.2/java.lang.invoke.DelegatingMethodHandle$Holder.delegate(DelegatingMethodHandle$Holder)
       java.base@20.0.2/java.lang.invoke.LambdaForm$MH/0x00007fde6978b800.guard(LambdaForm$MH)
       java.base@20.0.2/java.lang.invoke.Invokers$Holder.linkToCallSite(Invokers$Holder)
       org.elasticsearch.painless.PainlessScript$Script.execute(if (doc['src'].size() == 0) { ...:109)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.script.AbstractFieldScript.runForDoc(AbstractFieldScript.java:158)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.search.runtime.AbstractStringScriptFieldQuery.matches(AbstractStringScriptFieldQuery.java:27)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.search.runtime.AbstractStringScriptFieldQuery.matches(AbstractStringScriptFieldQuery.java:19)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.search.runtime.AbstractScriptFieldQuery$1$1.matches(AbstractScriptFieldQuery.java:77)
       app/org.apache.lucene.core@9.7.0/org.apache.lucene.search.DisjunctionScorer$TwoPhase.matches(DisjunctionScorer.java:158)
       app/org.apache.lucene.core@9.7.0/org.apache.lucene.search.ConjunctionDISI$ConjunctionTwoPhaseIterator.matches(ConjunctionDISI.java:346)
       app/org.apache.lucene.core@9.7.0/org.apache.lucene.search.Weight$DefaultBulkScorer.scoreRange(Weight.java:281)
       app/org.apache.lucene.core@9.7.0/org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:254)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.search.internal.CancellableBulkScorer.score(CancellableBulkScorer.java:45)
       app/org.apache.lucene.core@9.7.0/org.apache.lucene.search.BulkScorer.score(BulkScorer.java:38)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:542)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:482)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.search.internal.ContextIndexSearcher.lambda$search$4(ContextIndexSearcher.java:398)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.search.internal.ContextIndexSearcher$$Lambda$8120/0x00007fde69629c58.call(Unknown Source)
       java.base@20.0.2/java.util.concurrent.FutureTask.run(FutureTask.java:317)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:983)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
       java.base@20.0.2/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
       java.base@20.0.2/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
       java.base@20.0.2/java.lang.Thread.runWith(Thread.java:1636)
       java.base@20.0.2/java.lang.Thread.run(Thread.java:1623)
     2/10 snapshots sharing following 22 elements
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.script.AbstractFieldScript.runForDoc(AbstractFieldScript.java:158)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.search.runtime.AbstractStringScriptFieldQuery.matches(AbstractStringScriptFieldQuery.java:27)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.search.runtime.AbstractStringScriptFieldQuery.matches(AbstractStringScriptFieldQuery.java:19)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.search.runtime.AbstractScriptFieldQuery$1$1.matches(AbstractScriptFieldQuery.java:77)
       app/org.apache.lucene.core@9.7.0/org.apache.lucene.search.DisjunctionScorer$TwoPhase.matches(DisjunctionScorer.java:158)
       app/org.apache.lucene.core@9.7.0/org.apache.lucene.search.ConjunctionDISI$ConjunctionTwoPhaseIterator.matches(ConjunctionDISI.java:346)
       app/org.apache.lucene.core@9.7.0/org.apache.lucene.search.Weight$DefaultBulkScorer.scoreRange(Weight.java:281)
       app/org.apache.lucene.core@9.7.0/org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:254)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.search.internal.CancellableBulkScorer.score(CancellableBulkScorer.java:45)
       app/org.apache.lucene.core@9.7.0/org.apache.lucene.search.BulkScorer.score(BulkScorer.java:38)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:542)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:482)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.search.internal.ContextIndexSearcher.lambda$search$4(ContextIndexSearcher.java:398)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.search.internal.ContextIndexSearcher$$Lambda$8120/0x00007fde69629c58.call(Unknown Source)
       java.base@20.0.2/java.util.concurrent.FutureTask.run(FutureTask.java:317)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:983)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
       java.base@20.0.2/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
       java.base@20.0.2/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
       java.base@20.0.2/java.lang.Thread.runWith(Thread.java:1636)
       java.base@20.0.2/java.lang.Thread.run(Thread.java:1623)
     2/10 snapshots sharing following 17 elements
       app/org.apache.lucene.core@9.7.0/org.apache.lucene.search.ConjunctionDISI$ConjunctionTwoPhaseIterator.matches(ConjunctionDISI.java:346)
       app/org.apache.lucene.core@9.7.0/org.apache.lucene.search.Weight$DefaultBulkScorer.scoreRange(Weight.java:281)
       app/org.apache.lucene.core@9.7.0/org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:254)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.search.internal.CancellableBulkScorer.score(CancellableBulkScorer.java:45)
       app/org.apache.lucene.core@9.7.0/org.apache.lucene.search.BulkScorer.score(BulkScorer.java:38)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:542)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:482)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.search.internal.ContextIndexSearcher.lambda$search$4(ContextIndexSearcher.java:398)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.search.internal.ContextIndexSearcher$$Lambda$8120/0x00007fde69629c58.call(Unknown Source)
       java.base@20.0.2/java.util.concurrent.FutureTask.run(FutureTask.java:317)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:983)
       app/org.elasticsearch.server@8.10.2/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
       java.base@20.0.2/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
       java.base@20.0.2/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
       java.base@20.0.2/java.lang.Thread.runWith(Thread.java:1636)
       java.base@20.0.2/java.lang.Thread.run(Thread.java:1623)

The search brought more results, but it would be very large.

Yes, I wanted to say rollover, I'm sorry.

Not sure What exactly you mean by this...

But if I'm reading the hot threads correct above, your search is what's taking 97% of your CPU?.

You might want to run your searches through the profile or something.

I'm not sure what the relationship to ILM is or not....

It's possible after you re-indexed that there are many more segments than there were on your original indexes which can make searches less efficient.

Once a indexes are rolled over, you can tell it to be merged. This will make reading from it more efficient.

Also very searches that return very large document. Content can also drive up the RAM which then needs to be cleaned up...

If that's your case. You might need more RAM or you might need or want to only return the fields that you're interested in.

Just a couple of thoughts

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.