Possible memory leak in Elasticsearch 6.2.4

imperio-wxm · December 25, 2019, 2:41am

I have a 3-node cluster.

Indices: 2816
Shards: 24893
Docs: 1240277375
Disk usage: 477G

// jvm.options
-Xms8g
-Xmx8g

## GC configuration
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly

# pre-touch memory pages used by the JVM during initialization
-XX:+AlwaysPreTouch

## basic

# explicitly set the stack size
-Xss1m

# set to headless, just in case
-Djava.awt.headless=true

# ensure UTF-8 encoding by default (e.g. filenames)
-Dfile.encoding=UTF-8

# use our provided JNA always versus the system one
-Djna.nosys=true

# turn off a JDK optimization that throws away stack traces for common
# exceptions because stack traces are important for debugging
-XX:-OmitStackTraceInFastThrow

# flags to configure Netty
-Dio.netty.noUnsafe=true
-Dio.netty.noKeySetOptimization=true
-Dio.netty.recycler.maxCapacityPerThread=0

# log4j 2
-Dlog4j.shutdownHookEnabled=false
-Dlog4j2.disable.jmx=true

-Djava.io.tmpdir=${ES_TMPDIR}

## heap dumps

# generate a heap dump when an allocation from the Java heap fails
# heap dumps are created in the working directory of the JVM
-XX:+HeapDumpOnOutOfMemoryError

# specify an alternative path for heap dumps
# ensure the directory exists and has sufficient space
#-XX:HeapDumpPath=/heap/dump/path

## JDK 8 GC logging

8:-XX:+PrintGCDetails
8:-XX:+PrintGCDateStamps
8:-XX:+PrintTenuringDistribution
8:-XX:+PrintGCApplicationStoppedTime
8:-Xloggc:/app/rt/elasticsearch/var/logs/elasticsearch-gc.log
8:-XX:+UseGCLogFileRotation
8:-XX:NumberOfGCLogFiles=10
8:-XX:GCLogFileSize=64m

# JDK 9+ GC logging
9-:-Xlog:gc*,gc+age=trace,safepoint:file=/app/rt/elasticsearch/var/logs/elasticsearch-gc.log:utctime,pid,tags:filecount=32,filesize=64m
# due to internationalization enhancements in JDK 9 Elasticsearch needs to set the provider to COMPAT, otherwise
# time/date parsing will break in an incompatible way for some date patterns and locales
9-:-Djava.locale.providers=COMPAT

I found that old GC does not free up much space in the old generation; large objects always remain in the heap.

As I understand it, the JVM memory of ES is divided into these parts:

Query cache
Index buffer
Request cache
Field data cache
Lucene memory

However, Query cache (50MB) + Index buffer (120MB) + Request cache (50MB) + Field data cache (200KB) + Lucene memory (460MB) = about 680MB, while heap use is 6G. Where is the remaining 5G of memory used?
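A quick check of the arithmetic in the question, as a minimal sketch. The variable names are made up for illustration, and it assumes 1G = 1024MB; the sizes themselves are the ones quoted above.

```python
# Sizes reported in the question, in MB (names are illustrative only).
components_mb = {
    "query_cache": 50,
    "index_buffer": 120,
    "request_cache": 50,
    "fielddata_cache": 0.2,  # 200KB
    "lucene_memory": 460,
}

heap_used_mb = 6 * 1024  # "heap use 6G", assuming 1G = 1024MB

# Memory explained by the listed components vs. what is left over.
accounted_mb = sum(components_mb.values())
unaccounted_mb = heap_used_mb - accounted_mb

print(f"accounted:   {accounted_mb:.1f} MB")
print(f"unaccounted: {unaccounted_mb:.1f} MB ({unaccounted_mb / 1024:.2f} GB)")
```

Under these assumptions the listed components explain only about 680MB, leaving a bit over 5G of heap unexplained by the caches and buffers alone.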

I also took a JVM heap dump and analyzed it, and found that Netty takes up a lot of memory.

I can provide the complete analysis files if necessary.

Armin_Braun · December 25, 2019, 9:27am

Hi @imperio-wxm

This is expected. Netty pools memory so that it does not have to allocate new buffers for every network read. Depending on the network IO load on an ES node this can take up a significant amount of memory, but I would not say it qualifies as a leak, since that memory use does not grow boundlessly over time but rather grows and shrinks in relation to the load on the node.
Since the buffers are pooled and hence long-lived, they count towards old-gen memory usage. You can technically disable the buffer pooling and get rid of these long-lived old-gen buffers by enabling the Netty unpooled allocator, but that is most likely going to result in much higher GC load and lower overall performance.

Hope that helps

warkolm · December 26, 2019, 7:23am

You have wwwwwaaaaayyyyyy too many shards for that size of data.
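To put rough numbers on that, here is a small sketch using only the figures from the first post. It assumes the 477G disk usage and the 24893 shards both cover the whole cluster, that shards are spread evenly over the 3 nodes, and that 1G = 1024MB; none of that is confirmed in the thread.

```python
# Cluster figures from the first post.
nodes = 3
indices = 2816
shards = 24893
docs = 1_240_277_375
disk_gb = 477

# Averages, assuming the totals are cluster-wide and evenly distributed.
mb_per_shard = disk_gb * 1024 / shards
docs_per_shard = docs / shards
shards_per_index = shards / indices
shards_per_node = shards / nodes

print(f"average shard size: {mb_per_shard:.1f} MB")
print(f"docs per shard:     {docs_per_shard:,.0f}")
print(f"shards per index:   {shards_per_index:.1f}")
print(f"shards per node:    {shards_per_node:,.0f}")
```

On these assumptions each shard holds roughly 20MB on average, while every node carries more than 8,000 shards on an 8g heap. Each shard is a separate Lucene index with its own overhead, so the shard count itself adds to memory pressure.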

