Heap issue - OutOfMemory

With the latest versions of Elasticsearch, the heap size is defined automatically by Elasticsearch, so I upgraded my ELK system from 7.7.0 to 7.15.2 with RPM.
I tried leaving the jvm.options config file at its defaults (i.e., I did not set -Xms4g and -Xmx4g; min and max were commented out with #), but I faced many problems on my cluster: nodes dropping out of the cluster, unassigned shards, etc. After I set the min and max values to 50% of total memory in jvm.options, the cluster started to work.

I am still not sure whether I actually managed to enable the automatic heap sizing feature of Elasticsearch. How should I configure jvm.options to enable it? Should I set Xms and Xmx, or should I leave them at the default? We want to test this feature because we are constantly running into heap problems like "data too large..." or "OutOfMemoryError". But how?
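
To be concrete, these are the two states of the heap section in /etc/elasticsearch/jvm.options that I am comparing (4g matching our 8 GB hosts).

Heap section left at the default (what I tried first, which should enable auto sizing):

## -Xms4g
## -Xmx4g

Heap section set manually (what I am running now, 50% of total memory):

-Xms4g
-Xmx4g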

After upgrading to the latest version (7.15.2), we have problems like the ones below:

elasticsearch.service - Elasticsearch
   Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2021-12-10 23:01:49 AEDT; 1h 20min ago
     Docs: https://www.elastic.co
 Main PID: 28195 (java)
    Tasks: 31 (limit: 26213)
   Memory: 6.7G
   CGroup: /system.slice/elasticsearch.service
           ├─28195 /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF->
           └─28390 /usr/share/elasticsearch/modules/x-pack-ml/platform/linux-x86_64/bin/controller

Dec 10 23:53:56 syd-elk-mastr-3 systemd-entrypoint[28195]: Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "elasticsearch[syd-elk-mastr-3][generic][T#1]"
Dec 10 23:54:08 syd-elk-mastr-3 systemd-entrypoint[28195]: Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "elasticsearch[watcher-flush-scheduler][T#1]"
Dec 10 23:54:34 syd-elk-mastr-3 systemd-entrypoint[28195]: Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "elasticsearch[syd-elk-mastr-3][generic][T#4]"
Dec 10 23:57:07 syd-elk-mastr-3 systemd-entrypoint[28195]: Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "elasticsearch[syd-elk-mastr-3][scheduler][T#1]"
Dec 10 23:58:03 syd-elk-mastr-3 systemd-entrypoint[28195]: Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "elasticsearch[deprecation-indexing-flush-scheduler][T#1]"
Dec 10 23:59:19 syd-elk-mastr-3 systemd-entrypoint[28195]: Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "elasticsearch[ilm-history-store-flush-scheduler][T#1]"
Dec 11 00:00:10 syd-elk-mastr-3 systemd-entrypoint[28195]: Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "elasticsearch[syd-elk-mastr-3][management][T#2]"
Dec 11 00:01:01 syd-elk-mastr-3 systemd-entrypoint[28195]: Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "elasticsearch[syd-elk-mastr-3][generic][T#5]"
Dec 11 00:11:11 syd-elk-mastr-3 systemd-entrypoint[28195]: Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "elasticsearch[syd-elk-mastr-3][transport_worker][T#2]"
Dec 11 00:14:09 syd-elk-mastr-3 systemd-entrypoint[28195]: Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "elasticsearch[syd-elk-mastr-3][transport_worker][T#1]"

This is an error that we are constantly encountering:

[2021-12-10T17:05:49,087][INFO ][o.e.i.b.HierarchyCircuitBreakerService] [cul-elk-cold-1] attempting to trigger G1GC due to high heap usage [8565115672]
[2021-12-10T17:06:00,866][INFO ][o.e.i.b.HierarchyCircuitBreakerService] [cul-elk-cold-1] GC did not bring memory usage down, before [8565115672], after [8571679384], allocations [1], duration [10975]
[2021-12-10T17:06:37,155][WARN ][o.e.h.AbstractHttpServerTransport] [cul-elk-cold-1] handling request [null][POST][/_msearch?max_concurrent_shard_requests=5][Netty4HttpChannel{localAddress=/10.234.11.85:9200, remoteAddress=/172.27.182.6:44840}] took [615667ms] which is above the warn threshold of [5000ms]
[2021-12-10T17:07:04,431][WARN ][o.e.h.AbstractHttpServerTransport] [cul-elk-cold-1] handling request [null][POST][/_msearch?max_concurrent_shard_requests=5][Netty4HttpChannel{localAddress=/10.234.11.85:9200, remoteAddress=/172.27.182.6:44810}] took [615870ms] which is above the warn threshold of [5000ms]
[2021-12-10T17:07:04,432][ERROR][o.e.ExceptionsHelper     ] [cul-elk-cold-1] fatal error
at org.elasticsearch.ExceptionsHelper.lambda$maybeDieOnAnotherThread$4(ExceptionsHelper.java:283)
        at java.base/java.util.Optional.ifPresent(Optional.java:178)
        at org.elasticsearch.ExceptionsHelper.maybeDieOnAnotherThread(ExceptionsHelper.java:273)
        at org.elasticsearch.http.netty4.Netty4HttpRequestHandler.exceptionCaught(Netty4HttpRequestHandler.java:42)
        at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:302)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:381)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
        at org.elasticsearch.http.netty4.Netty4HttpPipeliningHandler.channelRead(Netty4HttpPipeliningHandler.java:47)

Also:

Elasticsearch error: [parent] Data too large, data for [<http_request>] would be [8348759604/7.7gb], which is larger than the limit of [8160437862/7.5gb], real usage: [8348758600/7.7gb], new bytes reserved: [1004/1004b], usages [request=0/0b, fielddata=11501482/10.9mb, in_flight_requests=83088330/79.2mb, model_inference=0/0b, eql_sequence=0/0b, accounting=165562876/157.8mb]
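
When this happens, we check the per-breaker usage with the node stats API (assuming the default HTTP port on the node):

curl -s 'http://localhost:9200/_nodes/stats/breaker?pretty'

The "parent" entry in that output is the limit of [8160437862/7.5gb] that shows up in the error above.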

My cluster's Java info:

java -version
openjdk version "1.8.0_242"
OpenJDK Runtime Environment (build 1.8.0_242-b08)
OpenJDK 64-Bit Server VM (build 25.242-b08, mixed mode)

Host memory info:

free -m
              total        used        free      shared  buff/cache   available
Mem:           7812        4846         614          26        2351        2647
Swap:          8103          96        8007

Cluster info:

{
  "cluster_name" : "XXXXXXX",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 9,
  "number_of_data_nodes" : 6,
  "active_primary_shards" : 2474,
  "active_shards" : 4948,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

Version: 7.15.2

elasticsearch.yml:

path.data: /mnt/data/elasticsearch/data
path.logs: /mnt/data/elasticsearch/logs
path.repo: /mnt/data/es_backup

jvm.options:

cat /etc/elasticsearch/jvm.options
## JVM configuration

################################################################
## IMPORTANT: JVM heap size
################################################################
##
## You should always set the min and max JVM heap
## size to the same value. For example, to set
## the heap to 4 GB, set:
##
## -Xms4g
## -Xmx4g
##
## See https://www.elastic.co/guide/en/elasticsearch/reference/current/heap-size.html
## for more information
##
################################################################

# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space

-Xms4g
-Xmx4g

################################################################
## Expert settings
################################################################
##
## All settings below this section are considered
## expert settings. Don't tamper with them unless
## you understand what you are doing
##
################################################################

## GC configuration
8-13:-XX:+UseConcMarkSweepGC
8-13:-XX:CMSInitiatingOccupancyFraction=75
8-13:-XX:+UseCMSInitiatingOccupancyOnly

## G1GC Configuration
# NOTE: G1 GC is only supported on JDK version 10 or later
# to use G1GC, uncomment the next two lines and update the version on the
# following three lines to your version of the JDK
# 10-13:-XX:-UseConcMarkSweepGC
# 10-13:-XX:-UseCMSInitiatingOccupancyOnly
14-:-XX:+UseG1GC
14-:-XX:G1ReservePercent=25
14-:-XX:InitiatingHeapOccupancyPercent=30

## JVM temporary directory
-Djava.io.tmpdir=${ES_TMPDIR}

## heap dumps

# generate a heap dump when an allocation from the Java heap fails
# heap dumps are created in the working directory of the JVM
-XX:+HeapDumpOnOutOfMemoryError

# specify an alternative path for heap dumps; ensure the directory exists and
# has sufficient space
-XX:HeapDumpPath=/var/lib/elasticsearch

# specify an alternative path for JVM fatal error logs
-XX:ErrorFile=/var/log/elasticsearch/hs_err_pid%p.log

## JDK 8 GC logging
8:-XX:+PrintGCDetails
8:-XX:+PrintGCDateStamps
8:-XX:+PrintTenuringDistribution
8:-XX:+PrintGCApplicationStoppedTime
8:-Xloggc:/var/log/elasticsearch/gc.log
8:-XX:+UseGCLogFileRotation
8:-XX:NumberOfGCLogFiles=32
8:-XX:GCLogFileSize=64m

# JDK 9+ GC logging
9-:-Xlog:gc*,gc+age=trace,safepoint:file=/var/log/elasticsearch/gc.log:utctime,pid,tags:filecount=32,filesize=64m

Which Java version should I use? Should I upgrade my current Java version, or is the current one (openjdk version "1.8.0_242") enough?
How much RAM should I use for production? How can I calculate it?

This is a production site. We need your help urgently. Many thanks.

I need your suggestion. Please help :pray:

Why didn't you take my question seriously?

I'm not sure why the automatic heap sizing feature caused problems, but one thing I can say is that you need much more heap memory.
Since fewer than 20 shards per 1 GB of heap memory is recommended, something like 8 data nodes with 64 GB of physical memory (32 GB of heap) each would be needed for 5,000 shards. The cluster may function with less memory, but your memory setting is far below that.
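
The rough math behind that:

5,000 shards / 20 shards per 1 GB heap = 250 GB of heap in total
250 GB / 32 GB heap per node ≈ 8 data nodes (64 GB RAM each, heap at 50%)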

I have encountered the "data too large..." and "OutOfMemoryError" errors when using too little memory. They are caused by one final allocation pushing against an already depleted resource.

You have 6 data nodes. If all your data nodes have the same configuration, meaning 8 GB of RAM, then you can have a maximum of 4 GB of heap on every node.

The recommendation is to have a maximum of 20 shards per 1 GB of heap, so a node with 4 GB of heap should have a maximum of 80 shards.

You have almost 5,000 shards. If they are equally balanced, this gives more than 800 shards per node, which is 10 times more than the recommended value for a node with 4 GB of heap.
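
The rough math for your cluster:

4,948 active shards / 6 data nodes ≈ 825 shards per node
4 GB heap x 20 shards per GB = 80 shards recommended per node
825 / 80 ≈ 10x over the recommendation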

This could be the cause of your OutOfMemory issues.

You should increase the memory of your nodes and also try to reduce the number of shards.

Take a look at this blog post with tips on how to deal with your shard numbers.

Thank you so much for your replies, @leandrojmp and @Tomo_M.

Our data nodes don't have the same configuration. Their total RAM is as follows:

cold-1 -> 8gb
cold-2 -> 8gb
hot-1 -> 16gb
hot-2 -> 16gb
warm-1 -> 16gb
warm-2 -> 16gb

And their heap size is half of the total RAM. As a first step, our plan is as follows:

  1. Delete some indices older than 6 months with Curator
  2. Reindex some indices with the reindex API
  3. Change some index creation from daily to weekly, monthly, or yearly
  4. Send all data to the nodes via Logstash, not directly to a node
  5. Enable the auto heap feature by commenting out the -Xms4g and -Xmx4g lines with "#" in the jvm.options config file (we will verify the result as shown below)
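
After step 5, we plan to verify which heap each node actually picked up with something like this (assuming the default HTTP port):

curl -s 'http://localhost:9200/_cat/nodes?v&h=name,node.role,heap.max,ram.max'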

How does auto heap work? For example, if a node has 8 GB of total RAM, will auto heap grow the heap up to 4 GB when needed? If so, do we not need to define the heap size manually in jvm.options at all?
Which one is more effective against memory issues: auto heap or setting the heap manually?

@leandrojmp ?

That sounds good to me.

So auto heap sizing will, depending on your node role, set the heap size accordingly. E.g. for a 16 GB hot node it will take 8 GB as JVM heap. However, it does not take into account other things running on the same host that reduce the memory actually available.
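
One way to confirm what it chose: Elasticsearch prints the heap it picked at startup, so you can grep for it in the node's log (the path comes from your elasticsearch.yml):

grep 'heap size' /mnt/data/elasticsearch/logs/*.log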

  3. Change some index creation from daily to weekly, monthly, or yearly

That is an old behaviour and should be decommissioned. Change it to an alias and use ILM to manage and automate the rollover. Furthermore, there is no need to use Curator anymore; ILM can take care of most of that.
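
As a sketch, a minimal ILM policy for that could look like the following (the policy name and the thresholds are placeholders, not tuned recommendations for your data):

curl -s -X PUT 'http://localhost:9200/_ilm/policy/my-logs-policy' -H 'Content-Type: application/json' -d '
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb",
            "max_age": "30d"
          }
        }
      },
      "delete": {
        "min_age": "180d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}'

Attach the policy to an index template and write through an alias or a data stream, and rollover plus deletion happen automatically, with no Curator involved.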
