ElasticSearch 0.90.10 : GC does not free memory

Hi everybody,

I have a problem with the latest version of ElasticSearch (0.90.10).
Whatever I do, after a few minutes the garbage collector stops freeing
memory.

Here are some of the logs:
[2014-01-24 09:52:05,809][WARN ][monitor.jvm ] [Walter White]
[gc][old][806][1] duration [7.2s], collections [1]/[7.4s], total
[7.2s]/[7.2s], memory [9.7gb]->[8.5gb]/[10gb], all_pools {[young]
[440mb]->[0b]/[0b]}{[survivor] [64mb]->[0b]/[0b]}{[old]
[2014-01-24 09:52:52,274][WARN ][monitor.jvm ] [Walter White]
[gc][old][848][2] duration [4.5s], collections [1]/[5.4s], total
[4.5s]/[11.7s], memory [9.4gb]->[9.3gb]/[10gb], all_pools {[young]
[428mb]->[0b]/[0b]}{[survivor] [64mb]->[0b]/[0b]}{[old]
[2014-01-24 09:53:08,454][WARN ][monitor.jvm ] [Walter White]
[gc][old][861][3] duration [4s], collections [1]/[4.1s], total
[4s]/[15.7s], memory [9.6gb]->[9.6gb]/[10gb], all_pools {[young]
[280mb]->[0b]/[0b]}{[survivor] [0b]->[0b]/[0b]}{[old]
[2014-01-24 09:53:12,753][WARN ][monitor.jvm ] [Walter White]
[gc][old][862][4] duration [3.9s], collections [1]/[4.2s], total
[3.9s]/[19.6s], memory [9.6gb]->[9.6gb]/[10gb], all_pools {[young]
[0b]->[0b]/[0b]}{[survivor] [0b]->[0b]/[0b]}{[old] [9.6gb]->[9.6gb]/[10gb]}

In the last two entries the memory goes from 9.6gb to... 9.6gb, so the GC
did not release any memory at all, even though each of those collections
took about 4 seconds.
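To put numbers on this, here is a small Python sketch that extracts the "memory [before]->[after]/[max]" part of each monitor.jvm warning and computes how much heap each old-generation collection freed. The regex is written only for the log format shown above (0.90.10) and is an assumption for any other version; the sample lines are the entries above, each joined back onto a single line and cut off after the memory figures.

```python
import re

# Bytes per unit suffix as printed in the monitor.jvm warnings ("9.7gb", "440mb", "0b").
UNIT_BYTES = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3}

SIZE = r"([\d.]+)(gb|mb|kb|b)"
# Matches "memory [before]->[after]/[max]" in the log format shown above.
MEMORY_RE = re.compile(rf"memory \[{SIZE}\]->\[{SIZE}\]/\[{SIZE}\]")

SAMPLE_LOG = """
[gc][old][806][1] duration [7.2s], collections [1]/[7.4s], total [7.2s]/[7.2s], memory [9.7gb]->[8.5gb]/[10gb]
[gc][old][848][2] duration [4.5s], collections [1]/[5.4s], total [4.5s]/[11.7s], memory [9.4gb]->[9.3gb]/[10gb]
[gc][old][861][3] duration [4s], collections [1]/[4.1s], total [4s]/[15.7s], memory [9.6gb]->[9.6gb]/[10gb]
[gc][old][862][4] duration [3.9s], collections [1]/[4.2s], total [3.9s]/[19.6s], memory [9.6gb]->[9.6gb]/[10gb]
"""


def to_bytes(value: str, unit: str) -> float:
    """Convert a logged size such as ("9.7", "gb") to bytes."""
    return float(value) * UNIT_BYTES[unit]


def freed_per_collection(log_text: str) -> list[float]:
    """Return the heap freed by each logged collection, in GB."""
    freed = []
    for m in MEMORY_RE.finditer(log_text):
        before = to_bytes(m.group(1), m.group(2))
        after = to_bytes(m.group(3), m.group(4))
        freed.append((before - after) / UNIT_BYTES["gb"])
    return freed


if __name__ == "__main__":
    print([round(gb, 2) for gb in freed_per_collection(SAMPLE_LOG)])
```

On the four entries above this prints [1.2, 0.1, 0.0, 0.0]: the first collection still reclaims about 1.2gb, and after that almost nothing is reclaimed.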

I have tried changing every garbage collector setting I know of, but
nothing seems to work:

  • I tried the G1 GC
  • I tried ParNewGC and ConcMarkSweepGC with the option
  • I changed CMSInitiatingOccupancyFraction from 75 to 50, then to 60
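A rough Python sketch of why changing CMSInitiatingOccupancyFraction alone may not help: the fraction only decides when a CMS cycle starts (as a percentage of old-generation capacity, assuming -XX:+UseCMSInitiatingOccupancyOnly is set), and it cannot free objects that are still referenced. The 10gb old-generation capacity is a simplifying assumption taken from the last log line ("[old] [9.6gb]->[9.6gb]/[10gb]"); under CMS the old generation would be somewhat smaller than the whole heap.

```python
# Hypothetical figures for illustration, taken from the last GC log entry.
OLD_GEN_MAX_GB = 10.0   # assumed old-generation capacity
LIVE_OLD_GB = 9.6       # old-generation occupancy that survives the collection


def cms_trigger_gb(old_gen_max_gb: float, fraction_percent: int) -> float:
    """Old-gen occupancy (GB) at which a CMS cycle starts for a given
    -XX:CMSInitiatingOccupancyFraction (assumes UseCMSInitiatingOccupancyOnly)."""
    return old_gen_max_gb * fraction_percent / 100


if __name__ == "__main__":
    for fraction in (75, 50, 60):
        trigger = cms_trigger_gb(OLD_GEN_MAX_GB, fraction)
        # If live data stays above the trigger, CMS starts a new cycle right away.
        print(f"{fraction}%: cycle starts at {trigger:.1f} GB, "
              f"still above after GC: {LIVE_OLD_GB > trigger}")
```

For all three values the surviving 9.6gb stays above the trigger (7.5, 5.0 and 6.0 GB), so a lower fraction would only make the collector start its cycles earlier and run them back to back.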

My configuration is:

  • I have three indexes; the main one is 200GB
  • The indexes have 5 shards and 1 replica
  • The 2 nodes are m1.xlarge EC2 instances (4 CPUs, 15GB memory, 420GB disk
    each)
  • I use the shared S3 gateway
  • The Java version is 1.7.0_51 (I was on 1.7.0_05 before, and I upgraded
    to rule it out as the cause)
  • The nodes run Ubuntu 12.04
  • The heap size given to ES is now 10GB (I also tried 11GB and 8GB, with
    the same result)
  • I also tried running the cluster on a single large node (hi1.4xlarge: 16
    CPUs, 50GB memory, 32GB heap for ES, 1024GB SSD), with the same result

The ES options I changed are:

  • indices.fielddata.cache.size: 1G (set for testing; the problem occurs
    with or without it)
  • bootstrap.mlockall: true

I also changed the threadpool options (because I was getting messages
saying that the queue was full) to:

    type: fixed
    size: 10
    queue_size: -1

But, as with the fielddata cache size, the GC problem occurs with or
without this change.
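For reference, queue_size: -1 makes the thread pool queue unbounded, so requests that the workers cannot keep up with are held in memory instead of being rejected. The Python sketch below is only an analogy built on the standard library's queue.Queue (where maxsize <= 0 also means unbounded), not Elasticsearch code; the submit helper and the 1000/5000 figures are hypothetical.

```python
import queue


def submit(q: queue.Queue, n_tasks: int) -> tuple[int, int]:
    """Try to enqueue n_tasks without blocking; return (accepted, rejected)."""
    accepted = rejected = 0
    for i in range(n_tasks):
        try:
            # Each queued item stays in memory until a worker takes it.
            q.put_nowait(f"request-{i}")
            accepted += 1
        except queue.Full:
            rejected += 1
    return accepted, rejected


if __name__ == "__main__":
    # No workers drain either queue, mimicking a node that cannot keep up.
    bounded = queue.Queue(maxsize=1000)  # like a positive queue_size
    unbounded = queue.Queue(maxsize=0)   # maxsize <= 0 is unbounded, like queue_size: -1
    print(submit(bounded, 5000))    # (1000, 4000)
    print(submit(unbounded, 5000))  # (5000, 0)
```

The bounded queue rejects the overflow (the "queue is full" messages), while the unbounded one keeps every pending request around.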

I also changed the default ports.
None of these changes made any difference: the GC still does not free
memory.

So, what can I do to solve this problem?

Here is a screenshot I got with bigdesk:

[bigdesk screenshot not included]