With recent versions of Elasticsearch, the heap size is defined automatically by Elasticsearch, so I upgraded my ELK stack from 7.7.0 to 7.15.2 via RPM.
I tried leaving the jvm.options file at its defaults (that is, I did not set -Xms4g and -Xmx4g; both min and max were commented out with #), but I ran into many problems on my cluster: nodes dropping out of the cluster, unassigned shards, and so on. After I set the min and max values to 50% of total memory in jvm.options, the cluster started working again.
I am still not sure whether I actually enabled Elasticsearch's automatic heap sizing. How should I configure jvm.options to enable it? Should I set Xms and Xmx, or leave them at the defaults? We want to test this feature because we constantly run into heap problems such as "data too large..." and "OutOfMemoryError". So how do I enable it?
After upgrading to the latest version (7.15.2), we see problems like the following:
elasticsearch.service - Elasticsearch
Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2021-12-10 23:01:49 AEDT; 1h 20min ago
Docs: https://www.elastic.co
Main PID: 28195 (java)
Tasks: 31 (limit: 26213)
Memory: 6.7G
CGroup: /system.slice/elasticsearch.service
├─28195 /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF->
└─28390 /usr/share/elasticsearch/modules/x-pack-ml/platform/linux-x86_64/bin/controller
Dec 10 23:53:56 syd-elk-mastr-3 systemd-entrypoint[28195]: Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "elasticsearch[syd-elk-mastr-3][generic][T#1]"
Dec 10 23:54:08 syd-elk-mastr-3 systemd-entrypoint[28195]: Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "elasticsearch[watcher-flush-scheduler][T#1]"
Dec 10 23:54:34 syd-elk-mastr-3 systemd-entrypoint[28195]: Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "elasticsearch[syd-elk-mastr-3][generic][T#4]"
Dec 10 23:57:07 syd-elk-mastr-3 systemd-entrypoint[28195]: Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "elasticsearch[syd-elk-mastr-3][scheduler][T#1]"
Dec 10 23:58:03 syd-elk-mastr-3 systemd-entrypoint[28195]: Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "elasticsearch[deprecation-indexing-flush-scheduler][T#1]"
Dec 10 23:59:19 syd-elk-mastr-3 systemd-entrypoint[28195]: Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "elasticsearch[ilm-history-store-flush-scheduler][T#1]"
Dec 11 00:00:10 syd-elk-mastr-3 systemd-entrypoint[28195]: Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "elasticsearch[syd-elk-mastr-3][management][T#2]"
Dec 11 00:01:01 syd-elk-mastr-3 systemd-entrypoint[28195]: Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "elasticsearch[syd-elk-mastr-3][generic][T#5]"
Dec 11 00:11:11 syd-elk-mastr-3 systemd-entrypoint[28195]: Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "elasticsearch[syd-elk-mastr-3][transport_worker][T#2]"
Dec 11 00:14:09 syd-elk-mastr-3 systemd-entrypoint[28195]: Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "elasticsearch[syd-elk-mastr-3][transport_worker][T#1]"
This is an error we encounter constantly:
[2021-12-10T17:05:49,087][INFO ][o.e.i.b.HierarchyCircuitBreakerService] [cul-elk-cold-1] attempting to trigger G1GC due to high heap usage [8565115672]
[2021-12-10T17:06:00,866][INFO ][o.e.i.b.HierarchyCircuitBreakerService] [cul-elk-cold-1] GC did not bring memory usage down, before [8565115672], after [8571679384], allocations [1], duration [10975]
[2021-12-10T17:06:37,155][WARN ][o.e.h.AbstractHttpServerTransport] [cul-elk-cold-1] handling request [null][POST][/_msearch?max_concurrent_shard_requests=5][Netty4HttpChannel{localAddress=/10.234.11.85:9200, remoteAddress=/172.27.182.6:44840}] took [615667ms] which is above the warn threshold of [5000ms]
[2021-12-10T17:07:04,431][WARN ][o.e.h.AbstractHttpServerTransport] [cul-elk-cold-1] handling request [null][POST][/_msearch?max_concurrent_shard_requests=5][Netty4HttpChannel{localAddress=/10.234.11.85:9200, remoteAddress=/172.27.182.6:44810}] took [615870ms] which is above the warn threshold of [5000ms]
[2021-12-10T17:07:04,432][ERROR][o.e.ExceptionsHelper ] [cul-elk-cold-1] fatal error
at org.elasticsearch.ExceptionsHelper.lambda$maybeDieOnAnotherThread$4(ExceptionsHelper.java:283)
at java.base/java.util.Optional.ifPresent(Optional.java:178)
at org.elasticsearch.ExceptionsHelper.maybeDieOnAnotherThread(ExceptionsHelper.java:273)
at org.elasticsearch.http.netty4.Netty4HttpRequestHandler.exceptionCaught(Netty4HttpRequestHandler.java:42)
at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:302)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:381)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at org.elasticsearch.http.netty4.Netty4HttpPipeliningHandler.channelRead(Netty4HttpPipeliningHandler.java:47)
We also see:
Elasticsearch error: [parent] Data too large, data for [<http_request>] would be [8348759604/7.7gb], which is larger than the limit of [8160437862/7.5gb], real usage: [8348758600/7.7gb], new bytes reserved: [1004/1004b], usages [request=0/0b, fielddata=11501482/10.9mb, in_flight_requests=83088330/79.2mb, model_inference=0/0b, eql_sequence=0/0b, accounting=165562876/157.8mb]
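As a sanity check on that "Data too large" line (my own arithmetic, assuming the default parent circuit breaker of 95% of heap that applies when real-memory tracking is enabled), the reported limit of 8160437862 bytes works out to exactly 95% of an 8 GB heap, which suggests this node is running with an ~8 GB heap:

```shell
# Default parent circuit-breaker limit (indices.breaker.total.limit with
# real-memory tracking on) is 95% of the JVM heap. For an assumed 8 GB heap:
heap_bytes=$((8 * 1024 * 1024 * 1024))   # 8 GiB in bytes
limit=$((heap_bytes * 95 / 100))
echo "$limit"                            # 8160437862 -- the exact limit in the log line
```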
My cluster's Java info:
java -version
openjdk version "1.8.0_242"
OpenJDK Runtime Environment (build 1.8.0_242-b08)
OpenJDK 64-Bit Server VM (build 25.242-b08, mixed mode)
Host memory info:
free -m
total used free shared buff/cache available
Mem: 7812 4846 614 26 2351 2647
Swap: 8103 96 8007
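For reference, the common guidance of giving the heap roughly half of the host's RAM (while staying under the compressed-oops threshold of ~31 GB) works out as follows for the host above. This is my own arithmetic, not something from the logs:

```shell
total_mb=7812                # "total" from the free -m output above
heap_mb=$((total_mb / 2))    # rule of thumb: ~50% of RAM for the Elasticsearch heap
echo "$heap_mb"              # 3906 MB, i.e. roughly the 4g setting that stabilized the cluster
```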
Cluster info:
{
"cluster_name" : "XXXXXXX",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 9,
"number_of_data_nodes" : 6,
"active_primary_shards" : 2474,
"active_shards" : 4948,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 100.0
}
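One hedged observation on the health output above: Elastic's 7.x sizing guidance suggests keeping at most about 20 shards per GB of heap. Assuming all six data nodes run a 4 GB heap (as in my jvm.options), the per-node shard count here is far above that budget, which by itself can cause the kind of heap pressure described:

```shell
shards=4948; data_nodes=6; heap_gb=4      # numbers from the health output; 4 GB heap assumed
per_node=$((shards / data_nodes))         # shards hosted per data node
budget=$((heap_gb * 20))                  # guidance: <= 20 shards per GB of heap
echo "$per_node $budget"                  # 824 vs. a budget of 80
```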
Version: 7.15.2
elasticsearch.yml:
path.data: /mnt/data/elasticsearch/data
path.logs: /mnt/data/elasticsearch/logs
path.repo: /mnt/data/es_backup
jvm.options:
cat /etc/elasticsearch/jvm.options
## JVM configuration
################################################################
## IMPORTANT: JVM heap size
################################################################
##
## You should always set the min and max JVM heap
## size to the same value. For example, to set
## the heap to 4 GB, set:
##
## -Xms4g
## -Xmx4g
##
## See https://www.elastic.co/guide/en/elasticsearch/reference/current/heap-size.html
## for more information
##
################################################################
# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space
-Xms4g
-Xmx4g
################################################################
## Expert settings
################################################################
##
## All settings below this section are considered
## expert settings. Don't tamper with them unless
## you understand what you are doing
##
################################################################
## GC configuration
8-13:-XX:+UseConcMarkSweepGC
8-13:-XX:CMSInitiatingOccupancyFraction=75
8-13:-XX:+UseCMSInitiatingOccupancyOnly
## G1GC Configuration
# NOTE: G1 GC is only supported on JDK version 10 or later
# to use G1GC, uncomment the next two lines and update the version on the
# following three lines to your version of the JDK
# 10-13:-XX:-UseConcMarkSweepGC
# 10-13:-XX:-UseCMSInitiatingOccupancyOnly
14-:-XX:+UseG1GC
14-:-XX:G1ReservePercent=25
14-:-XX:InitiatingHeapOccupancyPercent=30
## JVM temporary directory
-Djava.io.tmpdir=${ES_TMPDIR}
## heap dumps
# generate a heap dump when an allocation from the Java heap fails
# heap dumps are created in the working directory of the JVM
-XX:+HeapDumpOnOutOfMemoryError
# specify an alternative path for heap dumps; ensure the directory exists and
# has sufficient space
-XX:HeapDumpPath=/var/lib/elasticsearch
# specify an alternative path for JVM fatal error logs
-XX:ErrorFile=/var/log/elasticsearch/hs_err_pid%p.log
## JDK 8 GC logging
8:-XX:+PrintGCDetails
8:-XX:+PrintGCDateStamps
8:-XX:+PrintTenuringDistribution
8:-XX:+PrintGCApplicationStoppedTime
8:-Xloggc:/var/log/elasticsearch/gc.log
8:-XX:+UseGCLogFileRotation
8:-XX:NumberOfGCLogFiles=32
8:-XX:GCLogFileSize=64m
# JDK 9+ GC logging
9-:-Xlog:gc*,gc+age=trace,safepoint:file=/var/log/elasticsearch/gc.log:utctime,pid,tags:filecount=32,filesize=64m
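For context on my auto-heap question: as I understand the 7.x docs, automatic heap sizing (added in 7.11) only takes effect when no -Xms/-Xmx is set in jvm.options or in any file under /etc/elasticsearch/jvm.options.d/. With the two 4g lines above present, those explicit values win. A minimal change to try auto-sizing again would be commenting both out, as a sketch (not yet verified on this cluster):

```
## jvm.options -- heap section with automatic sizing enabled
## (both values left unset; Elasticsearch 7.11+ then derives the heap
## from the node's roles and available memory)
# -Xms4g
# -Xmx4g
```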
Which Java version should I use? Should I upgrade from my current Java version, or is the current one (openjdk version "1.8.0_242") sufficient?
How much RAM should we use in production, and how can we calculate it?
This is a production site, so we need your help urgently. Many thanks.