ES going through slow gc

Hi all,

I am new to ES so my question is probably stupid.

I am running ES 2.0.0 beta1 on my Mac for testing. My setup is:

  • heap size = 6g
  • ulimit = unlimited
  • mlockall: I could not get it to report true, not sure why
  • one node
  • connect through transport client
  • one index (I use elastic-gremlin to store graph data into ES and the index represents the graph)
  • everything else (shards, replicas, etc.) is left at the defaults.

Loading a small graph is fine, but when I try a large one I start getting slow GC messages in the Elasticsearch console while it is loading the graph (adding vertices and edges):

[2015-09-25 12:13:24,432][INFO ][org.elasticsearch.monitor.jvm] [Mad Thinker] [gc][old][218][7] duration [8.8s], collections [1]/[9s], total [8.8s]/[1.2m], memory [5.9gb]->[5.8gb]/[5.9gb], all_pools {[young] [266.2mb]->[146.2mb]/[266.2mb]}{[survivor] [3.5mb]->[0b]/[33.2mb]}{[old] [5.6gb]->[5.6gb]/[5.6gb]}
[2015-09-25 12:19:33,024][INFO ][org.elasticsearch.monitor.jvm] [Mad Thinker] [gc][old][504][51] duration [7.7s], collections [1]/[7.8s], total [7.7s]/[1.6m], memory [5.9gb]->[5.8gb]/[5.9gb], all_pools {[young] [266.2mb]->[145.4mb]/[266.2mb]}{[survivor] [32.1mb]->[0b]/[33.2mb]}{[old] [5.6gb]->[5.6gb]/[5.6gb]}
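
To make the numbers in these lines easier to read, here is a minimal sketch that parses the per-pool figures out of the first log line. The regular expression and the helper names (`to_bytes`, `parse_pools`) are my own for illustration and are not part of Elasticsearch; I am also assuming the `mb`/`gb` units in the log are 1024-based.

```python
import re

# First slow-GC line from the console output above, copied verbatim.
LOG_LINE = (
    "[2015-09-25 12:13:24,432][INFO ][org.elasticsearch.monitor.jvm] [Mad Thinker] "
    "[gc][old][218][7] duration [8.8s], collections [1]/[9s], total [8.8s]/[1.2m], "
    "memory [5.9gb]->[5.8gb]/[5.9gb], all_pools {[young] [266.2mb]->[146.2mb]/[266.2mb]}"
    "{[survivor] [3.5mb]->[0b]/[33.2mb]}{[old] [5.6gb]->[5.6gb]/[5.6gb]}"
)

# Assumption: units in the log are binary (1 kb = 1024 bytes).
UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}

# Matches one "{[pool] [before]->[after]/[limit]}" group.
POOL = re.compile(r"\{\[(\w+)\] \[([^\]]+)\]->\[([^\]]+)\]/\[([^\]]+)\]\}")


def to_bytes(text):
    """Convert a size such as '266.2mb' or '0b' to a number of bytes."""
    match = re.fullmatch(r"([\d.]+)(b|kb|mb|gb)", text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return float(match.group(1)) * UNITS[match.group(2)]


def parse_pools(line):
    """Return {pool: (before, after, limit)} in bytes for one GC log line."""
    return {
        name: (to_bytes(before), to_bytes(after), to_bytes(limit))
        for name, before, after, limit in POOL.findall(line)
    }


pools = parse_pools(LOG_LINE)
for name, (before, after, limit) in pools.items():
    freed_mb = (before - after) / 1024 ** 2
    print(f"{name}: {after / limit:.0%} of limit after GC, freed {freed_mb:.1f} MB")
```

Read this way, the old generation is still at its limit after an 8.8s collection, so the collector is reclaiming essentially nothing from it.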

Then the client starts having trouble and gives NoNodeAvailableException repeatedly.

I took a heap dump and used Eclipse MAT. It says:

One instance of "org.elasticsearch.search.SearchService" loaded by "sun.misc.Launcher$AppClassLoader @ 0x654cc0700" occupies 5,791,739,856 (92.91%) bytes. The memory is accumulated in one instance of "java.util.concurrent.ConcurrentHashMap$Node[]" loaded by "<system class loader>".
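
As a rough sanity check, the MAT figures can be compared with the GC log. This is only a sketch: I am assuming the 92.91% is relative to the heap captured in the dump, and that the log's `gb` is 1024-based.

```python
# Figures reported by Eclipse MAT in the message above.
RETAINED_BYTES = 5_791_739_856   # retained by the SearchService instance
RETAINED_SHARE = 0.9291          # 92.91% of the dumped heap (assumed meaning)

GIB = 1024 ** 3

retained_gib = RETAINED_BYTES / GIB
# Implied size of the whole dumped heap, if the share is of the dump total.
implied_heap_gib = RETAINED_BYTES / RETAINED_SHARE / GIB

print(f"retained by SearchService: {retained_gib:.2f} GiB")
print(f"implied heap in dump:      {implied_heap_gib:.2f} GiB")
```

The implied heap size comes out close to the used-heap figure in the log lines, so the dump appears to describe the same full-heap state that the slow GC messages report.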

If I look into it I can see a lot of ConcurrentHashMap$Node entries, and each node has a pageCacheRecycler which contains Recyclers. When I go deeper I can see elements within DequeRecycler which look like the vertices and edges I am adding.

I am not sure what the recycler is for; it would be good to understand what ES is doing there. Does this simply mean I am pushing data to ES too quickly? Did I miss any important configuration? Or is it maybe a bug in the code?

One funny thing I found: if I reduce the number of shards to 1 instead of the default 5, I no longer see the slow GC, but loading the graph takes even longer. It looks like sharding improves speed but uses far more memory?
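
For reference, the single-shard case corresponds to creating the index with `number_of_shards` set to 1. Below is a minimal sketch that only builds the JSON body for the create-index request; the index name `graph` is a placeholder I made up, and nothing is actually sent to ES here.

```python
import json

# Hypothetical index name, for illustration only.
INDEX_NAME = "graph"


def index_settings(shards):
    """Build the body for a create-index request with a given shard count."""
    return {"settings": {"number_of_shards": shards}}


body = json.dumps(index_settings(1), indent=2)

# This body would be sent with: PUT /graph
print(f"PUT /{INDEX_NAME}")
print(body)
```

The shard count is fixed when the index is created, so the index has to be created (or recreated) with this setting before loading the graph.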

Please provide guidance to help me understand and find the root cause.

Thank you very much in advance, I greatly appreciate any help from you guys.

Regards,
Andy

What I can tell you is that a shard is a Lucene instance, which requires resources to run and maintain. So having fewer shards "wastes" fewer resources on this, which is important on a single node.

I don't know enough about the rest though, sorry. Maybe a core dev will drop in :)