Multiple Out Of Memory Errors occurring, sometimes causing Cluster State Red Alerts

We are getting many OutOfMemory errors on one cluster, but other clusters of a similar size are not facing the issue. All the errors are of the same type:

[2023-06-01T11:30:41,368][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [**] fatal error in thread [elasticsearch[***][write][T#4]], exiting
**java.lang.OutOfMemoryError: null**
    at java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123) ~[?:1.8.0_212]
    at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:117) ~[?:1.8.0_212]
    at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93) ~[?:1.8.0_212]
    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153) ~[?:1.8.0_212]
    at com.fasterxml.jackson.core.json.UTF8JsonGenerator._flushBuffer(UTF8JsonGenerator.java:2137) ~[jackson-core-2.10.4.jar:2.10.4]
    at com.fasterxml.jackson.core.json.UTF8JsonGenerator._writeStringSegment2(UTF8JsonGenerator.java:1451) ~[jackson-core-2.10.4.jar:2.10.4]
    at com.fasterxml.jackson.core.json.UTF8JsonGenerator._writeStringSegment(UTF8JsonGenerator.java:1398) ~[jackson-core-2.10.4.jar:2.10.4]
    at com.fasterxml.jackson.core.json.UTF8JsonGenerator._writeStringSegments(UTF8JsonGenerator.java:1281) ~[jackson-core-2.10.4.jar:2.10.4]
    at com.fasterxml.jackson.core.json.UTF8JsonGenerator.writeString(UTF8JsonGenerator.java:502) ~[jackson-core-2.10.4.jar:2.10.4]
    at org.elasticsearch.xcontent.json.JsonXContentGenerator.writeString(JsonXContentGenerator.java:271) ~[elasticsearch-x-content-7.17.5.jar:7.17.5]
    at org.elasticsearch.xcontent.XContentBuilder.value(XContentBuilder.java:667) ~[elasticsearch-x-content-7.17.5.jar:7.17.5]
    at org.elasticsearch.xcontent.XContentBuilder.lambda$static$14(XContentBuilder.java:96) ~[elasticsearch-x-content-7.17.5.jar:7.17.5]
    at org.elasticsearch.xcontent.XContentBuilder.unknownValue(XContentBuilder.java:822) ~[elasticsearch-x-content-7.17.5.jar:7.17.5]
    at org.elasticsearch.xcontent.XContentBuilder.value(XContentBuilder.java:1009) ~[elasticsearch-x-content-7.17.5.jar:7.17.5]
    at org.elasticsearch.xcontent.XContentBuilder.unknownValue(XContentBuilder.java:831) ~[elasticsearch-x-content-7.17.5.jar:7.17.5]
    at org.elasticsearch.xcontent.XContentBuilder.map(XContentBuilder.java:980) ~[elasticsearch-x-content-7.17.5.jar:7.17.5]
    at org.elasticsearch.xcontent.XContentBuilder.unknownValue(XContentBuilder.java:829) ~[elasticsearch-x-content-7.17.5.jar:7.17.5]
    at org.elasticsearch.xcontent.XContentBuilder.map(XContentBuilder.java:980) ~[elasticsearch-x-content-7.17.5.jar:7.17.5]
    at org.elasticsearch.xcontent.XContentBuilder.map(XContentBuilder.java:929) ~[elasticsearch-x-content-7.17.5.jar:7.17.5]
    at org.elasticsearch.action.index.IndexRequest.source(IndexRequest.java:452) ~[elasticsearch-7.17.5.jar:7.17.5]
    at org.elasticsearch.action.update.UpdateHelper.prepareUpdateScriptRequest(UpdateHelper.java:270) ~[elasticsearch-7.17.5.jar:7.17.5]
    at org.elasticsearch.action.update.UpdateHelper.prepare(UpdateHelper.java:82) ~[elasticsearch-7.17.5.jar:7.17.5]
    at org.elasticsearch.action.update.UpdateHelper.prepare(UpdateHelper.java:63) ~[elasticsearch-7.17.5.jar:7.17.5]
    at org.elasticsearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:267) ~[elasticsearch-7.17.5.jar:7.17.5]
    at org.elasticsearch.action.bulk.TransportShardBulkAction$2.doRun(TransportShardBulkAction.java:181) ~[elasticsearch-7.17.5.jar:7.17.5]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) ~[elasticsearch-7.17.5.jar:7.17.5]
    at org.elasticsearch.action.bulk.TransportShardBulkAction.performOnPrimary(TransportShardBulkAction.java:245) ~[elasticsearch-7.17.5.jar:7.17.5]
    at org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary(TransportShardBulkAction.java:134) ~[elasticsearch-7.17.5.jar:7.17.5]
    at org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary(TransportShardBulkAction.java:74) ~[elasticsearch-7.17.5.jar:7.17.5]
    at org.elasticsearch.action.support.replication.TransportWriteAction$1.doRun(TransportWriteAction.java:196) ~[elasticsearch-7.17.5.jar:7.17.5]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:777) ~[elasticsearch-7.17.5.jar:7.17.5]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) ~[elasticsearch-7.17.5.jar:7.17.5]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_212]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_212]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212]
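
For what it's worth, if I'm reading the JDK 8 source correctly, `ByteArrayOutputStream.hugeCapacity` throws an `OutOfMemoryError` with no message (hence the `OutOfMemoryError: null` above) only when the requested buffer capacity overflows `Integer.MAX_VALUE`, i.e. the serialized update source would have to grow past the ~2 GB byte-array limit. That would suggest a single oversized document rather than plain heap exhaustion. A minimal sketch of that check (paraphrased and simplified, not the exact JDK code):

```java
// Paraphrase of the JDK 8 ByteArrayOutputStream growth check (simplified).
// An OutOfMemoryError with no message is thrown only when the requested
// capacity has overflowed int, i.e. the buffer would need to grow past
// Integer.MAX_VALUE bytes (~2 GB).
public class HugeCapacitySketch {
    private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    static int hugeCapacity(int minCapacity) {
        if (minCapacity < 0) {            // int overflow: more than ~2 GB requested
            throw new OutOfMemoryError(); // no message -> logged as "OutOfMemoryError: null"
        }
        return (minCapacity > MAX_ARRAY_SIZE) ? Integer.MAX_VALUE : MAX_ARRAY_SIZE;
    }

    public static void main(String[] args) {
        // A buffer already close to 2 GB asked to grow a bit further wraps negative:
        int requested = (Integer.MAX_VALUE - 16) + 64;
        System.out.println("requested capacity = " + requested); // prints a negative number
        hugeCapacity(requested); // throws java.lang.OutOfMemoryError (message is null)
    }
}
```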

Any suggestions would be really helpful!

You need to provide more information.

What are the specs of the nodes? What is the configured Java heap? Do you run anything else on those nodes besides Elasticsearch?
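
If it helps, something along these lines will print the configured heap and total RAM per node from the `_cat/nodes` API (the host, port, and lack of authentication are assumptions, adjust for your cluster):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// Dumps per-node heap configuration and usage from the _cat/nodes API.
public class CatNodesHeap {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://localhost:9200/_cat/nodes?v&h=name,heap.max,heap.percent,ram.max");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // one row per node: name, max heap, heap %, total RAM
            }
        }
    }
}
```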

Thanks @leandrojmp for the quick response. The nodes are running with 112 GB of RAM (increased from 56 GB since the issue began) and run only Elasticsearch.
Total nodes: 26 (increased from 20).
[Note: the cluster has about 124 data nodes in total, but the issue is happening only on these 26 nodes, which host certain indices. The nodes hosting other indices are doing fine.]

JVM heap: before the issue began we had a 28 GB heap, which we increased to 31 GB. However, as we don't have much insight into the issue, we have since raised the heap to 57 GB (about 50% of RAM). We understand that this is much higher than the recommended 31-32 GB, but we are trying to verify whether it helps mitigate the issue.
CPU consumption on the nodes is fairly low, rarely going above 10-20%.
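
For reference, since a 57 GB heap is above the ~32 GB compressed-oops threshold, one way to confirm whether the node JVMs are still using compressed ordinary object pointers is the JVM section of the nodes info API. A rough sketch, assuming local unauthenticated access on port 9200 and that 7.x exposes the field as `using_compressed_ordinary_object_pointers`:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// Reads the JVM section of the nodes info API to see whether each node still
// runs with compressed oops after the heap was raised past ~32 GB.
// Host, port, no-auth, and the exact field name are assumptions.
public class CompressedOopsCheck {
    public static void main(String[] args) throws Exception {
        String filter = "nodes.*.name,nodes.*.jvm.using_compressed_ordinary_object_pointers";
        URL url = new URL("http://localhost:9200/_nodes/jvm?pretty&filter_path=" + filter);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                // "false" on any node would mean the 57 GB heap has disabled compressed oops
                System.out.println(line);
            }
        }
    }
}
```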
