Best way to resolve this out of memory error in my ES Data Nodes

I have a cluster and now my data nodes are crashing. Looking over the logs, I see this error — it's an out-of-memory error. Is there something I should set in the elasticsearch.yml file to help lock memory? Should I add more data nodes? There's a lot of data on these nodes, but it only just started doing this suddenly. Any insight would be GREATLY appreciated.
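In case it helps, this is roughly the kind of setting I was thinking of (the heap size below is just a placeholder for illustration, not what the nodes are actually running):

# elasticsearch.yml -- lock the JVM heap so the OS cannot swap it out
bootstrap.memory_lock: true

# jvm.options -- heap size (placeholder value, not our current setting)
-Xms28g
-Xmx28g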

Thanks,

[2021-02-05T18:21:23,216][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [ITSSD5ESDN1] fatal error in thread [elasticsearch[ITSSD5ESDN1][search][T#8]], exiting
java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.util.FixedBitSet.(FixedBitSet.java:115) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.search.LRUQueryCache.cacheIntoBitSet(LRUQueryCache.java:511) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.search.LRUQueryCache.cacheImpl(LRUQueryCache.java:504) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.search.LRUQueryCache$CachingWrapperWeight.cache(LRUQueryCache.java:708) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.search.LRUQueryCache$CachingWrapperWeight.scorerSupplier(LRUQueryCache.java:743) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.elasticsearch.indices.IndicesQueryCache$CachingWeightWrapper.scorerSupplier(IndicesQueryCache.java:162) ~[elasticsearch-5.5.1.jar:5.5.1]
at org.apache.lucene.search.BooleanWeight.scorerSupplier(BooleanWeight.java:400) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.search.BooleanWeight.scorer(BooleanWeight.java:381) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.search.Weight.bulkScorer(Weight.java:160) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.search.BooleanWeight.bulkScorer(BooleanWeight.java:375) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.search.LRUQueryCache$CachingWrapperWeight.bulkScorer(LRUQueryCache.java:810) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.elasticsearch.indices.IndicesQueryCache$CachingWeightWrapper.bulkScorer(IndicesQueryCache.java:168) ~[elasticsearch-5.5.1.jar:5.5.1]
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:665) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:472) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:388) ~[elasticsearch-5.5.1.jar:5.5.1]
at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:108) ~[elasticsearch-5.5.1.jar:5.5.1]
at org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:248) ~[elasticsearch-5.5.1.jar:5.5.1]
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:263) ~[elasticsearch-5.5.1.jar:5.5.1]
at org.elasticsearch.action.search.SearchTransportService$6.messageReceived(SearchTransportService.java:330) ~[elasticsearch-5.5.1.jar:5.5.1]
at org.elasticsearch.action.search.SearchTransportService$6.messageReceived(SearchTransportService.java:327) ~[elasticsearch-5.5.1.jar:5.5.1]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) ~[elasticsearch-5.5.1.jar:5.5.1]
at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1544) ~[elasticsearch-5.5.1.jar:5.5.1]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) ~[elasticsearch-5.5.1.jar:5.5.1]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-5.5.1.jar:5.5.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_201]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_201]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_201]
[2021-02-05T18:21:32,595][INFO ][o.e.m.j.JvmGcMonitorService] [ITSSD5ESDN1] [gc][old][5599][757] duration [30.6s], collections [4]/[30.8s], total [30.6s]/[23.3m], memory [27.9gb]->[27.9gb]/[27.9gb], all_pools {[young] [532.5mb]->[532.5m$
[2021-02-05T18:21:32,595][WARN ][o.e.m.j.JvmGcMonitorService] [ITSSD5ESDN1] [gc][5599] overhead, spent [30.6s] collecting in the last [30.8s]

That depends on a lot of things. Please have a look into this and apply whatever is applicable to your scenario.

At a guess, given that it looks like you're on 5.x, you have too many shards.
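If you want a quick way to check, something along these lines will show how many shards each node is carrying (this assumes the cluster is reachable on localhost:9200 without auth; adjust the host and credentials for your setup):

# shards, disk usage and host per data node
curl -s 'http://localhost:9200/_cat/allocation?v'

# rough total shard count across the cluster
curl -s 'http://localhost:9200/_cat/shards' | wc -l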

What is the output from the _cluster/stats?pretty&human API?
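For example, assuming the default host and port (add credentials if your cluster is secured):

curl -s 'http://localhost:9200/_cluster/stats?pretty&human'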
