ES going through slow gc

guiltyxsin · September 25, 2015, 4:43am

Hi all,

I am new to ES so my question is probably stupid.

I am running ES 2.0.0 beta1 on my Mac for testing. My setup is:

heap size = 6g
ulimit = unlimited
mlockall: I could not get it to report true, not sure why
one node
connect through transport client
one index (I use elastic-gremlin to store graph data into ES and the index represents the graph)
everything else (shards, replicas) is left at the defaults.

Loading a small graph is fine, but when I try a large one, slow GC messages start appearing in the Elasticsearch console while the graph is loading (adding vertices and edges):

[2015-09-25 12:13:24,432][INFO ][org.elasticsearch.monitor.jvm] [Mad Thinker] [gc][old][218][7] duration [8.8s], collections [1]/[9s], total [8.8s]/[1.2m], memory [5.9gb]->[5.8gb]/[5.9gb], all_pools {[young] [266.2mb]->[146.2mb]/[266.2mb]}{[survivor] [3.5mb]->[0b]/[33.2mb]}{[old] [5.6gb]->[5.6gb]/[5.6gb]}
[2015-09-25 12:19:33,024][INFO ][org.elasticsearch.monitor.jvm] [Mad Thinker] [gc][old][504][51] duration [7.7s], collections [1]/[7.8s], total [7.7s]/[1.6m], memory [5.9gb]->[5.8gb]/[5.9gb], all_pools {[young] [266.2mb]->[145.4mb]/[266.2mb]}{[survivor] [32.1mb]->[0b]/[33.2mb]}{[old] [5.6gb]->[5.6gb]/[5.6gb]}
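
For anyone reading these lines: the important part is the last pool. A minimal sketch in Python (the regex and the helper names `to_bytes` and `pool_usage` are made up for illustration, not part of any Elasticsearch tooling) that pulls the before/after/max sizes of each pool out of the first log line above:

```python
import re

# Size units as they appear in the log line (assumed binary multiples).
UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3}

# One "[<number><unit>]" token, e.g. "[5.6gb]" or "[0b]".
SIZE = r"\[([\d.]+)(b|kb|mb|gb)\]"

# One "{[pool] [before]->[after]/[max]}" group from the all_pools section.
POOL = re.compile(r"\{\[(\w+)\] " + SIZE + "->" + SIZE + "/" + SIZE + r"\}")


def to_bytes(number, unit):
    """Convert a (number, unit) pair such as ("5.6", "gb") to bytes."""
    return float(number) * UNITS[unit]


def pool_usage(line):
    """Return {pool_name: (before, after, max)} in bytes for one GC log line."""
    pools = {}
    for name, b_n, b_u, a_n, a_u, m_n, m_u in POOL.findall(line):
        pools[name] = (to_bytes(b_n, b_u), to_bytes(a_n, a_u), to_bytes(m_n, m_u))
    return pools


line = (
    "[gc][old][218][7] duration [8.8s], collections [1]/[9s], "
    "total [8.8s]/[1.2m], memory [5.9gb]->[5.8gb]/[5.9gb], all_pools "
    "{[young] [266.2mb]->[146.2mb]/[266.2mb]}"
    "{[survivor] [3.5mb]->[0b]/[33.2mb]}"
    "{[old] [5.6gb]->[5.6gb]/[5.6gb]}"
)

old_before, old_after, old_max = pool_usage(line)["old"]
print(f"old gen after GC: {old_after / old_max:.0%} of max")
```

In both log lines the old generation is at its maximum before and after the collection, so each 7 to 9 second pause frees essentially nothing from it.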

Then the client starts having trouble and gives NoNodeAvailableException repeatedly.

I took a heap dump and used Eclipse MAT. It says:

One instance of "org.elasticsearch.search.SearchService" loaded by "sun.misc.Launcher$AppClassLoader @ 0x654cc0700" occupies 5,791,739,856 (92.91%) bytes. The memory is accumulated in one instance of "java.util.concurrent.ConcurrentHashMap$Node[]" loaded by "<system class loader>".

If I look into it, I can see a lot of ConcurrentHashMap$Node entries, and each node has a pageCacheRecycler which contains Recyclers. When I go deeper, I can see elements within DequeRecycler which look like the vertices and edges I am adding.

I am not sure what the recycler is for; it would be good to understand what ES is doing there. Does this simply mean I am pushing data to ES too quickly? Did I miss an important configuration? Or is it maybe a bug in the code?

One funny thing I found: if I reduce the number of shards to 1 instead of the default 5, I don't see the slow GC, but it takes even longer to finish loading the graph. It looks like sharding improves the speed but uses far more memory?

Please provide guidance to help me understand and find the root cause.

Thank you very much in advance, I greatly appreciate any help from you guys.

Regards,
Andy

warkolm · September 25, 2015, 5:22am

What I can tell you is that a shard is a Lucene instance, which requires resources to run and maintain. So having fewer shards "wastes" fewer resources on this, which is important on a single node.

I don't know enough about the rest though, sorry. Maybe a core dev will drop in.
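
For reference, the shard count is an index setting (`number_of_shards`) that is fixed when the index is created. A minimal sketch in Python of the settings body for a create-index request; the helper name `index_settings` is made up for illustration, and how elastic-gremlin passes such settings when it creates the index is not covered here:

```python
import json


def index_settings(shards, replicas=0):
    """Build a create-index request body with explicit shard and replica counts."""
    return {
        "settings": {
            "number_of_shards": shards,      # cannot be changed after creation
            "number_of_replicas": replicas,  # replicas stay unassigned on a one-node setup
        }
    }


# Body for a single-shard index, as tried in the question.
body = json.dumps(index_settings(1))
print(body)
```

This only builds the JSON; it would be sent as the body of the request that creates the index.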
