Elasticsearch using too much memory

Hello,
Originally the ELK stack was working great, but after several months of collecting logs, Kibana reports are failing to run properly, and it appears to be due to Elasticsearch memory issues. For at least the past week, the VIRT column of top has reported Elasticsearch at 238G or 240G. There is only 8G of physical memory. Are there any settings I have that would be causing this issue? Any suggestions on how to resolve it?

We have 1 server:
Running ELK 7.6.0-1
CentOS 8
8 GB of RAM
8 CPUs
700 GB free
500 GB of logs in Elasticsearch
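
For what it's worth, real heap usage (as opposed to VIRT) can be checked with the cat nodes API; a minimal example, assuming Elasticsearch is listening on 127.0.0.1:9200 as in the config below:

# heap.percent is the number that matters here, not VIRT
curl -s 'http://127.0.0.1:9200/_cat/nodes?v&h=name,heap.percent,heap.current,heap.max,ram.percent'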

The IP in the config below has been changed to 127.0.0.1.

The top output looks like this:


top - 13:37:28 up 2 days, 23:28,  2 users,  load average: 3.25, 3.00, 3.10
Tasks: 354 total,   1 running, 353 sleeping,   0 stopped,   0 zombie
%Cpu(s):  7.1 us,  1.0 sy, 47.3 ni, 43.7 id,  0.4 wa,  0.3 hi,  0.1 si,  0.0 st
MiB Mem :   7767.2 total,    129.6 free,   5677.8 used,   1959.8 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.   1352.7 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
39261 logstash  39  19 3535108 425256  19560 S 383.8   5.3   0:42.02 java
38613 elastic+  20   0  238.0g   4.0g 443280 S  57.6  52.3  35:56.69 java
17236 root      20   0  204228  22920    948 S   0.7   0.3   5:59.65 sssd_kcm
38832 kibana    20   0 1682176 307852  15696 S   0.7   3.9   1:03.37 node
 2851 chrony    20   0   29460    288      0 S   0.3   0.0   4:41.74 chronyd
 7292 g305557   20   0  473956   1772      0 S   0.3   0.0  46:53.01 gsd-smartcard
38595 root       0 -20       0      0      0 I   0.3   0.0   0:10.95 kworker/u132:1-kcryptd/253:2
39214 root       0 -20       0      0      0 I   0.3   0.0   0:00.14 kworker/u133:4-kcryptd/253:5

Elasticsearch config

cluster.name: DMCluster
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: a-log-01
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /var/lib/elasticsearch
#
# Path to log files:
#
path.logs: /var/log/elasticsearch
#
path.repo: ["/var/elkbackup/"]

# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
network.host: 127.0.0.1
#
# Set a custom port for HTTP:
#
http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
discovery.seed_hosts: ["127.0.0.1"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
cluster.initial_master_nodes: ["a-log-01"]
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true

jvm.options config below:

###############################################################
## IMPORTANT: JVM heap size
################################################################
##
## You should always set the min and max JVM heap
## size to the same value. For example, to set
## the heap to 4 GB, set:
##
## -Xms4g
## -Xmx4g
##
## See https://www.elastic.co/guide/en/elasticsearch/reference/current/heap-size.html
## for more information
##
################################################################

# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space

-Xms3g
-Xmx3g

################################################################
## Expert settings
################################################################
##
## All settings below this section are considered
## expert settings. Don't tamper with them unless
## you understand what you are doing
##
################################################################

## GC configuration
8-13:-XX:+UseConcMarkSweepGC
8-13:-XX:CMSInitiatingOccupancyFraction=75
8-13:-XX:+UseCMSInitiatingOccupancyOnly

## G1GC Configuration
# NOTE: G1 GC is only supported on JDK version 10 or later
# to use G1GC, uncomment the next two lines and update the version on the
# following three lines to your version of the JDK
# 10-13:-XX:-UseConcMarkSweepGC
# 10-13:-XX:-UseCMSInitiatingOccupancyOnly
14-:-XX:+UseG1GC
14-:-XX:G1ReservePercent=25
14-:-XX:InitiatingHeapOccupancyPercent=30

## JVM temporary directory
-Djava.io.tmpdir=${ES_TMPDIR}

## heap dumps

# generate a heap dump when an allocation from the Java heap fails
# heap dumps are created in the working directory of the JVM
-XX:+HeapDumpOnOutOfMemoryError

# specify an alternative path for heap dumps; ensure the directory exists and
# has sufficient space
-XX:HeapDumpPath=/var/lib/elasticsearch

# specify an alternative path for JVM fatal error logs
-XX:ErrorFile=/var/log/elasticsearch/hs_err_pid%p.log

## JDK 8 GC logging
8:-XX:+PrintGCDetails
8:-XX:+PrintGCDateStamps
8:-XX:+PrintTenuringDistribution
8:-XX:+PrintGCApplicationStoppedTime
8:-Xloggc:/var/log/elasticsearch/gc.log
8:-XX:+UseGCLogFileRotation
8:-XX:NumberOfGCLogFiles=32
8:-XX:GCLogFileSize=64m

# JDK 9+ GC logging
9-:-Xlog:gc*,gc+age=trace,safepoint:file=/var/log/elasticsearch/gc.log:utctime,pid,tags:filecount=32,filesize=64m

The Kibana error is below:

{"statusCode":500,"error":"Internal Server Error","message":"[parent] Data too large, data for [<http_request>] would be [3101900416/2.8gb], which is larger than the limit of [2993920409/2.7gb], real usage: [3101900416/2.8gb], new bytes reserved: [0/0b], usages [request=0/0b, fielddata=22866/22.3kb, in_flight_requests=0/0b, accounting=613438772/585mb]: [circuit_breaking_exception] [parent] Data too large, data for [<http_request>] would be [3101900416/2.8gb], which is larger than the limit of [2993920409/2.7gb], real usage: [3101900416/2.8gb], new bytes reserved: [0/0b], usages [request=0/0b, fielddata=22866/22.3kb, in_flight_requests=0/0b, accounting=613438772/585mb], with { bytes_wanted=3101900416 & bytes_limit=2993920409 & durability=\"PERMANENT\" }"}

I got that today just by going to the Kibana Homepage.

http://127.0.0.1:5601/app/kibana
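
In case it helps, here is how I can pull the circuit breaker state from the node stats API (a minimal example, same 127.0.0.1:9200 endpoint as above). From what I understand, the 2.7gb limit in the error is the parent breaker, which defaults to 95% of the JVM heap (here ~3 GB):

# show breaker limits, current estimated usage, and how often each breaker has tripped
curl -s 'http://127.0.0.1:9200/_nodes/stats/breaker?pretty'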

VIRT memory is managed by the OS (it's largely the filesystem cache mentioned in the docs), so it's not something you need to worry about.

As long as Elasticsearch isn't OOMing, it's staying within its heap without problems.

Your CPU use seems pretty high though; what do your GC patterns look like?
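
You can grab a quick snapshot from the node stats API; a minimal sketch, assuming the node is on 127.0.0.1:9200:

# heap pools plus young/old GC counts and times; a steadily climbing old-gen count points to heap pressure
curl -s 'http://127.0.0.1:9200/_nodes/stats/jvm?filter_path=nodes.*.jvm.mem,nodes.*.jvm.gc&pretty'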

I am fairly new to ELK. By GC, do you mean garbage collection? If so, I have done no special configuration for GC. Is there a recommended article in the tutorials for GC?

I would also be happy to share any configs.

Yes, garbage collection. If you head over to the Monitoring section in Kibana, you can see some graphs that show this.

Alternatively your Elasticsearch logs will mention it as well.
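
For example, something like this on the server itself (paths taken from the jvm.options and cluster name you posted; adjust the log file name if yours differs):

# GC log location set in jvm.options above
tail -n 50 /var/log/elasticsearch/gc.log

# the main log reports slow or frequent collections via JvmGcMonitorService
grep "JvmGcMonitorService" /var/log/elasticsearch/DMCluster.log | tail -n 20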

I don't see any GC in the Monitoring section.

Check out https://www.elastic.co/guide/en/kibana/current/elasticsearch-metrics.html.

Is there another way? As mentioned in the original post, Kibana is failing a lot, so I can no longer bring up the monitoring page. The screenshot I took was after waiting 45 minutes for it to load.

Your heap usage sits at nearly 95%, so you have a heap pressure issue. I would recommend adding RAM and heap to resolve this, but you could also try deleting some data.
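
If you go the delete route, something like this finds the biggest indices first; the delete pattern is only an example (check what _cat/indices actually returns before deleting anything):

# list indices, largest first
curl -s 'http://127.0.0.1:9200/_cat/indices?v&s=store.size:desc'

# example only: delete old time-based indices matching a hypothetical pattern
curl -s -XDELETE 'http://127.0.0.1:9200/logstash-2019.*'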

Yeah, isn't he just hitting a request breaker, implying the query is too big? Maybe the default/current Kibana time window is too large (try 15 min), or it's otherwise querying a huge result set. I suspect the 'reports' get larger and larger and just run out of heap, so as noted, a dose of RAM would help; a nice 16GB VM with 8GB of heap would likely be happier (and have a lot more cache, as there are only about 2GB for that now; reports won't like that, either).
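
As a sketch of that sizing, the heap lines in jvm.options would become something like the following, assuming the VM really does get 16GB of RAM:

-Xms8g
-Xmx8g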

Thank you for the answers. I was able to scrounge another 64 GB of memory and increased the heap to 30 GB.

Now Kibana reports are getting Error 301, which seems to be a reverse proxy issue. I am working on that part. Performance is better.

