Possible memory leak in Elasticsearch 6.2.4

I have a cluster of 3 nodes.

indices: 2816
shards: 24893
docs: 1240277375
disk used: 477G

# jvm.options
-Xms8g
-Xmx8g

## GC configuration
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly

# pre-touch memory pages used by the JVM during initialization
-XX:+AlwaysPreTouch

## basic

# explicitly set the stack size
-Xss1m

# set to headless, just in case
-Djava.awt.headless=true

# ensure UTF-8 encoding by default (e.g. filenames)
-Dfile.encoding=UTF-8

# use our provided JNA always versus the system one
-Djna.nosys=true

# turn off a JDK optimization that throws away stack traces for common
# exceptions because stack traces are important for debugging
-XX:-OmitStackTraceInFastThrow

# flags to configure Netty
-Dio.netty.noUnsafe=true
-Dio.netty.noKeySetOptimization=true
-Dio.netty.recycler.maxCapacityPerThread=0

# log4j 2
-Dlog4j.shutdownHookEnabled=false
-Dlog4j2.disable.jmx=true

-Djava.io.tmpdir=${ES_TMPDIR}

## heap dumps

# generate a heap dump when an allocation from the Java heap fails
# heap dumps are created in the working directory of the JVM
-XX:+HeapDumpOnOutOfMemoryError

# specify an alternative path for heap dumps
# ensure the directory exists and has sufficient space
#-XX:HeapDumpPath=/heap/dump/path

## JDK 8 GC logging

8:-XX:+PrintGCDetails
8:-XX:+PrintGCDateStamps
8:-XX:+PrintTenuringDistribution
8:-XX:+PrintGCApplicationStoppedTime
8:-Xloggc:/app/rt/elasticsearch/var/logs/elasticsearch-gc.log
8:-XX:+UseGCLogFileRotation
8:-XX:NumberOfGCLogFiles=10
8:-XX:GCLogFileSize=64m

# JDK 9+ GC logging
9-:-Xlog:gc*,gc+age=trace,safepoint:file=/app/rt/elasticsearch/var/logs/elasticsearch-gc.log:utctime,pid,tags:filecount=32,filesize=64m
# due to internationalization enhancements in JDK 9 Elasticsearch needs to set the provider to COMPAT, otherwise
# time/date parsing will break in an incompatible way for some date patterns and locales
9-:-Djava.locale.providers=COMPAT

I found that old-generation GC does not free up much space; large objects persist in the old generation of the heap.

I know the JVM heap memory of ES is divided among these parts:

  1. Query cache
  2. Index buffer
  3. Request cache
  4. Field data cache
  5. Lucene segment memory

However, query cache (50MB) + index buffer (120MB) + request cache (50MB) + fielddata cache (200KB) + Lucene segment memory (460MB) ≈ 680MB, while heap usage is 6G. Where is the remaining ~5G of memory being used?
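These figures come from the node stats API. For example, something like the following (a minimal sketch, assuming a node reachable at http://localhost:9200 and the field names of the 6.x _nodes/stats response) pulls the same per-node numbers:

    # Pull per-node memory figures from the _nodes/stats API.
    # Assumes Elasticsearch is reachable at http://localhost:9200; adjust as needed.
    import requests

    stats = requests.get("http://localhost:9200/_nodes/stats/indices,jvm").json()

    for node in stats["nodes"].values():
        indices = node["indices"]
        segments = indices["segments"]
        print(node["name"])
        print("  heap used       :", node["jvm"]["mem"]["heap_used_in_bytes"])
        print("  query cache     :", indices["query_cache"]["memory_size_in_bytes"])
        print("  request cache   :", indices["request_cache"]["memory_size_in_bytes"])
        print("  fielddata cache :", indices["fielddata"]["memory_size_in_bytes"])
        print("  segment memory  :", segments["memory_in_bytes"])
        print("  index writer    :", segments["index_writer_memory_in_bytes"])

Adding these up gives roughly the 680MB above, nowhere near the 6G of heap actually in use.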

I took a JVM heap dump and analyzed it, and found that Netty takes up a lot of memory.
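(For reference, one common way to take such a dump is the JDK's jmap, e.g. `jmap -dump:live,format=b,file=es-heap.hprof <pid>`; the resulting .hprof file can then be opened in a heap analyzer such as Eclipse MAT.)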

I can provide the complete analysis files if necessary.

Hi @imperio-wxm

This is expected. Netty pools memory so that it does not have to allocate new buffers for every network read. Depending on the network I/O load on an ES node this can take up a significant amount of memory, but I would not call it a leak, since the memory use does not grow boundlessly over time; it grows and shrinks with the load on the node.
Since the buffers are pooled and hence long-lived, they count towards old-gen memory usage. You can technically disable the buffer pooling and get rid of these long-lived old-gen buffers by switching to the Netty unpooled allocator, but that will most likely result in much higher GC load and lower overall performance.
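For completeness, here is what that experiment would look like as a jvm.options entry. This is only a sketch based on Netty's standard allocator system property (the stock Elasticsearch config does not set it), so measure the GC impact before keeping it:

    # assumption: Netty's io.netty.allocator.type property forces the unpooled
    # allocator; trades long-lived old-gen buffer memory for higher GC load
    -Dio.netty.allocator.type=unpooled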

Hope that helps :slight_smile:

You have wwwwwaaaaayyyyyy too many shards for that size of data.
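To put numbers on it: 24893 shards across 3 nodes is roughly 8300 shards per node, and 477G of data over 24893 shards works out to an average shard size of only about 20MB. Every shard is a full Lucene index with its own segment, mapping and cluster-state overhead on the heap, which is where a lot of otherwise "unexplained" old-gen usage tends to come from. The rule of thumb Elastic published around that time was to stay below roughly 20-25 shards per GB of configured heap (on the order of 160-200 shards per 8GB node, not ~8300) and to aim for shards of at least a few GB each, so consolidating indices and cutting shards per index will help far more than any JVM tuning.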


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.