Multiple Out Of Memory Errors occurring, sometimes causing Cluster State Red Alerts

We are getting many OutOfMemory errors on one cluster, but other clusters of a similar size are not facing the issue. All the errors are of the same type:

[2023-06-01T11:30:41,368][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [**] fatal error in thread [elasticsearch[***][write][T#4]], exiting
**java.lang.OutOfMemoryError: null**
    at java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123) ~[?:1.8.0_212]
    at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:117) ~[?:1.8.0_212]
    at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93) ~[?:1.8.0_212]
    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153) ~[?:1.8.0_212]
    at com.fasterxml.jackson.core.json.UTF8JsonGenerator._flushBuffer(UTF8JsonGenerator.java:2137) ~[jackson-core-2.10.4.jar:2.10.4]
    at com.fasterxml.jackson.core.json.UTF8JsonGenerator._writeStringSegment2(UTF8JsonGenerator.java:1451) ~[jackson-core-2.10.4.jar:2.10.4]
    at com.fasterxml.jackson.core.json.UTF8JsonGenerator._writeStringSegment(UTF8JsonGenerator.java:1398) ~[jackson-core-2.10.4.jar:2.10.4]
    at com.fasterxml.jackson.core.json.UTF8JsonGenerator._writeStringSegments(UTF8JsonGenerator.java:1281) ~[jackson-core-2.10.4.jar:2.10.4]
    at com.fasterxml.jackson.core.json.UTF8JsonGenerator.writeString(UTF8JsonGenerator.java:502) ~[jackson-core-2.10.4.jar:2.10.4]
    at org.elasticsearch.xcontent.json.JsonXContentGenerator.writeString(JsonXContentGenerator.java:271) ~[elasticsearch-x-content-7.17.5.jar:7.17.5]
    at org.elasticsearch.xcontent.XContentBuilder.value(XContentBuilder.java:667) ~[elasticsearch-x-content-7.17.5.jar:7.17.5]
    at org.elasticsearch.xcontent.XContentBuilder.lambda$static$14(XContentBuilder.java:96) ~[elasticsearch-x-content-7.17.5.jar:7.17.5]
    at org.elasticsearch.xcontent.XContentBuilder.unknownValue(XContentBuilder.java:822) ~[elasticsearch-x-content-7.17.5.jar:7.17.5]
    at org.elasticsearch.xcontent.XContentBuilder.value(XContentBuilder.java:1009) ~[elasticsearch-x-content-7.17.5.jar:7.17.5]
    at org.elasticsearch.xcontent.XContentBuilder.unknownValue(XContentBuilder.java:831) ~[elasticsearch-x-content-7.17.5.jar:7.17.5]
    at org.elasticsearch.xcontent.XContentBuilder.map(XContentBuilder.java:980) ~[elasticsearch-x-content-7.17.5.jar:7.17.5]
    at org.elasticsearch.xcontent.XContentBuilder.unknownValue(XContentBuilder.java:829) ~[elasticsearch-x-content-7.17.5.jar:7.17.5]
    at org.elasticsearch.xcontent.XContentBuilder.map(XContentBuilder.java:980) ~[elasticsearch-x-content-7.17.5.jar:7.17.5]
    at org.elasticsearch.xcontent.XContentBuilder.map(XContentBuilder.java:929) ~[elasticsearch-x-content-7.17.5.jar:7.17.5]
    at org.elasticsearch.action.index.IndexRequest.source(IndexRequest.java:452) ~[elasticsearch-7.17.5.jar:7.17.5]
    at org.elasticsearch.action.update.UpdateHelper.prepareUpdateScriptRequest(UpdateHelper.java:270) ~[elasticsearch-7.17.5.jar:7.17.5]
    at org.elasticsearch.action.update.UpdateHelper.prepare(UpdateHelper.java:82) ~[elasticsearch-7.17.5.jar:7.17.5]
    at org.elasticsearch.action.update.UpdateHelper.prepare(UpdateHelper.java:63) ~[elasticsearch-7.17.5.jar:7.17.5]
    at org.elasticsearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:267) ~[elasticsearch-7.17.5.jar:7.17.5]
    at org.elasticsearch.action.bulk.TransportShardBulkAction$2.doRun(TransportShardBulkAction.java:181) ~[elasticsearch-7.17.5.jar:7.17.5]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) ~[elasticsearch-7.17.5.jar:7.17.5]
    at org.elasticsearch.action.bulk.TransportShardBulkAction.performOnPrimary(TransportShardBulkAction.java:245) ~[elasticsearch-7.17.5.jar:7.17.5]
    at org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary(TransportShardBulkAction.java:134) ~[elasticsearch-7.17.5.jar:7.17.5]
    at org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary(TransportShardBulkAction.java:74) ~[elasticsearch-7.17.5.jar:7.17.5]
    at org.elasticsearch.action.support.replication.TransportWriteAction$1.doRun(TransportWriteAction.java:196) ~[elasticsearch-7.17.5.jar:7.17.5]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:777) ~[elasticsearch-7.17.5.jar:7.17.5]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) ~[elasticsearch-7.17.5.jar:7.17.5]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_212]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_212]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212]
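
For what it's worth, if I'm reading the JDK 8 source correctly, `ByteArrayOutputStream.hugeCapacity` throws an `OutOfMemoryError` with no message (hence the `OutOfMemoryError: null` above) only when the requested buffer capacity overflows `Integer.MAX_VALUE`, i.e. the serialized update source would have to grow past the ~2 GB byte-array limit. That would suggest a single oversized document rather than plain heap exhaustion. A minimal sketch of that check (paraphrased and simplified, not the exact JDK code):

```java
// Paraphrase of the JDK 8 ByteArrayOutputStream growth check (simplified).
// An OutOfMemoryError with no message is thrown only when the requested
// capacity has overflowed int, i.e. the buffer would need to grow past
// Integer.MAX_VALUE bytes (~2 GB).
public class HugeCapacitySketch {
    private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    static int hugeCapacity(int minCapacity) {
        if (minCapacity < 0) {            // int overflow: more than ~2 GB requested
            throw new OutOfMemoryError(); // no message -> logged as "OutOfMemoryError: null"
        }
        return (minCapacity > MAX_ARRAY_SIZE) ? Integer.MAX_VALUE : MAX_ARRAY_SIZE;
    }

    public static void main(String[] args) {
        // A buffer already close to 2 GB asked to grow a bit further wraps negative:
        int requested = (Integer.MAX_VALUE - 16) + 64;
        System.out.println("requested capacity = " + requested); // prints a negative number
        hugeCapacity(requested); // throws java.lang.OutOfMemoryError (message is null)
    }
}
```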

Any suggestions would be really helpful!

You need to provide more information.

What are the specs of the nodes? What is the configured Java heap? Do you run anything else on those nodes besides Elasticsearch?
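
If it helps, something along these lines will print the configured heap and total RAM per node from the `_cat/nodes` API (the host, port, and lack of authentication are assumptions, adjust for your cluster):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// Dumps per-node heap configuration and usage from the _cat/nodes API.
public class CatNodesHeap {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://localhost:9200/_cat/nodes?v&h=name,heap.max,heap.percent,ram.max");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // one row per node: name, max heap, heap %, total RAM
            }
        }
    }
}
```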

Thanks @leandrojmp for the quick response. The nodes are running with 112 GB of RAM (increased from 56 GB since the issue began) and run only Elasticsearch.
Total nodes: 26 (increased from 20).
[Note: the cluster has about 124 data nodes in total, but the issue is happening only on these 26 nodes, which host certain indices. The nodes hosting other indices are doing fine.]

JVM heap: before the issue began we had a 28 GB heap, which we increased to 31 GB. However, as we don't have much insight into the issue, we have since raised the heap to 57 GB (about 50% of RAM). We understand that this is much higher than the recommended 31-32 GB, but we are trying to verify whether it helps mitigate the issue.
CPU consumption on the nodes is fairly low, rarely going above 10-20%.
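
For reference, since a 57 GB heap is above the ~32 GB compressed-oops threshold, one way to confirm whether the node JVMs are still using compressed ordinary object pointers is the JVM section of the nodes info API. A rough sketch, assuming local unauthenticated access on port 9200 and that 7.x exposes the field as `using_compressed_ordinary_object_pointers`:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// Reads the JVM section of the nodes info API to see whether each node still
// runs with compressed oops after the heap was raised past ~32 GB.
// Host, port, no-auth, and the exact field name are assumptions.
public class CompressedOopsCheck {
    public static void main(String[] args) throws Exception {
        String filter = "nodes.*.name,nodes.*.jvm.using_compressed_ordinary_object_pointers";
        URL url = new URL("http://localhost:9200/_nodes/jvm?pretty&filter_path=" + filter);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                // "false" on any node would mean the 57 GB heap has disabled compressed oops
                System.out.println(line);
            }
        }
    }
}
```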
