ElasticSearch 0.90.10 : Garbage collector does not free memory


(Talal-2) #1

Hi everybody,

I have a problem with the latest version of ElasticSearch (0.90.10).
Whatever I do, after a short while (a few minutes) the garbage collector
stops freeing any memory.

Here are some of the logs:
[2014-01-24 09:52:05,809][WARN ][monitor.jvm ] [Walter White]
[gc][old][806][1] duration [7.2s], collections [1]/[7.4s], total
[7.2s]/[7.2s], memory [9.7gb]->[8.5gb]/[10gb], all_pools {[young]
[440mb]->[0b]/[0b]}{[survivor] [64mb]->[0b]/[0b]}{[old]
[9.3gb]->[8.5gb]/[10gb]}
[2014-01-24 09:52:52,274][WARN ][monitor.jvm ] [Walter White]
[gc][old][848][2] duration [4.5s], collections [1]/[5.4s], total
[4.5s]/[11.7s], memory [9.4gb]->[9.3gb]/[10gb], all_pools {[young]
[428mb]->[0b]/[0b]}{[survivor] [64mb]->[0b]/[0b]}{[old]
[8.9gb]->[9.3gb]/[10gb]}
[2014-01-24 09:53:08,454][WARN ][monitor.jvm ] [Walter White]
[gc][old][861][3] duration [4s], collections [1]/[4.1s], total
[4s]/[15.7s], memory [9.6gb]->[9.6gb]/[10gb], all_pools {[young]
[280mb]->[0b]/[0b]}{[survivor] [0b]->[0b]/[0b]}{[old]
[9.3gb]->[9.6gb]/[10gb]}
[2014-01-24 09:53:12,753][WARN ][monitor.jvm ] [Walter White]
[gc][old][862][4] duration [3.9s], collections [1]/[4.2s], total
[3.9s]/[19.6s], memory [9.6gb]->[9.6gb]/[10gb], all_pools {[young]
[0b]->[0b]/[0b]}{[survivor] [0b]->[0b]/[0b]}{[old] [9.6gb]->[9.6gb]/[10gb]}

We clearly see that the memory goes from 9.6g to ... 9.6g, so the GC did
not release any memory at all.
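
To put numbers on the claim above, here is a minimal sketch that pulls the old-generation figures out of the four warnings and reports how much each collection freed. The regular expression and the helper names (`to_bytes`, `old_gen_entries`) are my own assumptions based only on the log layout shown in this post, not an official log format specification.

```python
import re

# The four GC warnings from the post, with timestamps and node name dropped.
LOG = """
[gc][old][806][1] duration [7.2s], collections [1]/[7.4s], total
[7.2s]/[7.2s], memory [9.7gb]->[8.5gb]/[10gb], all_pools {[young]
[440mb]->[0b]/[0b]}{[survivor] [64mb]->[0b]/[0b]}{[old]
[9.3gb]->[8.5gb]/[10gb]}
[gc][old][848][2] duration [4.5s], collections [1]/[5.4s], total
[4.5s]/[11.7s], memory [9.4gb]->[9.3gb]/[10gb], all_pools {[young]
[428mb]->[0b]/[0b]}{[survivor] [64mb]->[0b]/[0b]}{[old]
[8.9gb]->[9.3gb]/[10gb]}
[gc][old][861][3] duration [4s], collections [1]/[4.1s], total
[4s]/[15.7s], memory [9.6gb]->[9.6gb]/[10gb], all_pools {[young]
[280mb]->[0b]/[0b]}{[survivor] [0b]->[0b]/[0b]}{[old]
[9.3gb]->[9.6gb]/[10gb]}
[gc][old][862][4] duration [3.9s], collections [1]/[4.2s], total
[3.9s]/[19.6s], memory [9.6gb]->[9.6gb]/[10gb], all_pools {[young]
[0b]->[0b]/[0b]}{[survivor] [0b]->[0b]/[0b]}{[old] [9.6gb]->[9.6gb]/[10gb]}
"""

UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}

# Matches "{[old] [before]->[after]/[max]}"; \s+ tolerates the wrapped lines.
OLD_POOL = re.compile(r"\{\[old\]\s+\[([^\]]+)\]->\[([^\]]+)\]/\[([^\]]+)\]\}")


def to_bytes(text):
    """Convert a size such as '9.3gb' or '440mb' to a number of bytes."""
    match = re.fullmatch(r"([\d.]+)(b|kb|mb|gb)", text)
    if match is None:
        raise ValueError("unrecognised size: " + text)
    return float(match.group(1)) * UNITS[match.group(2)]


def old_gen_entries(log_text):
    """Return one (before, after, maximum) tuple in bytes per old-gen entry."""
    return [tuple(to_bytes(size) for size in match.groups())
            for match in OLD_POOL.finditer(log_text)]


for before, after, maximum in old_gen_entries(LOG):
    freed_gb = (before - after) / 1024 ** 3
    print(f"old gen freed {freed_gb:+.1f} GB, {after / maximum:.0%} full after GC")
```

A negative "freed" value means the old generation was larger after the collection than before it, which is what the second and third entries show.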

I have tried to modify all the settings of the garbage collector that I
know of, but nothing seems to work:

  • I tried to use the G1 GC
  • I tried to use ParNewGC and ConcMarkSweepGC with the option
    CMSInitiatingOccupancyOnly
  • I changed the CMSInitiatingOccupancyFraction from 75 to 50, then to 60

My configuration is:

  • I have three indexes, and the main one is 200GB in size
  • The indexes have 5 shards and 1 replica
  • The 2 nodes are 2 m1.xlarge EC2 instances (4 CPU, 15GB Memory, 420GB hard
    drive each)
  • I use the shared S3 gateway
  • The Java version I use is 1.7.0_51 (I was on 1.7.0_05 before, and I
    updated it to see if the problem could come from that)
  • The nodes are on Ubuntu 12.04
  • The heap size given to ES is now 10G (I tried with 11G and 8GB, but same
    problem).
    I also tried to launch the cluster on a huge single node (hi1.4xlarge : 16
    CPU, 50GB Memory - with 32GB heap size for ES, 1024GB SSD drive), but same
    problem.

The ES options I changed are:

  • indices.fielddata.cache.size: 1G (that was for testing, but with or
    without the option, the problem still occurs)
  • bootstrap.mlockall: true

I also changed the threadpool options (because I had some messages telling
me that the queue was full) to:
threadpool:
    search:
        type: fixed
        size: 10
        queue_size: -1
But as with the fielddata cache size, with or without this option,
I still have the GC problem.

And I also changed the default ports.
But all those things did not change anything, my GC still does not free
memory.

And the thing is that even if I remove the nodes from the LoadBalancer that
my production instances use (so, it means that no indexing or searching
request is done at all), the problem is still here. So the problem does not
seem to come from my requests.

So, what can I do to solve this problem?

Thanks,

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/85f2159a-84af-4ff4-b37e-55527fec09c6%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Alexander Reelsen) #2

Hey,

First, instead of simply changing versions of software or JVMs, you should
find out what exactly is occupying so much memory. If it is an
elasticsearch data structure like fielddata or the parent/child id cache,
it is logical that this memory is not freed, as those data structures are
still in use by elasticsearch.

Please use the nodes stats API to find out what exactly is taking so much
memory. There are several big memory hogs in elasticsearch, like fielddata,
the parent/child id cache, possibly the completion suggester, or your filter
cache, depending on how it is configured. Take the time to monitor those via
the APIs and see which one is taking up all the memory.
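
As a rough illustration of that comparison, the sketch below ranks a few caches by size from a nodes stats response. Everything in `SAMPLE` is invented for the example. The key names (`fielddata`, `filter_cache`, `id_cache`, `memory_size_in_bytes`) are an assumption about the response layout and should be checked against what your own cluster actually returns. The completion suggester is not covered here.

```python
import json

# Hypothetical excerpt of a nodes stats response. The numbers are made up,
# and the key names are an assumption about the response layout.
SAMPLE = json.loads("""
{"nodes": {"node-1": {"name": "Walter White", "indices": {
    "fielddata":    {"memory_size_in_bytes": 8000000000},
    "filter_cache": {"memory_size_in_bytes": 500000000},
    "id_cache":     {"memory_size_in_bytes": 0}
}}}}
""")

CACHES = ("fielddata", "filter_cache", "id_cache")


def rank_memory_consumers(stats):
    """Return (node name, cache, bytes) rows, largest consumer first."""
    rows = []
    for node in stats["nodes"].values():
        indices = node.get("indices", {})
        for cache in CACHES:
            # A missing section is treated as zero rather than an error.
            size = indices.get(cache, {}).get("memory_size_in_bytes", 0)
            rows.append((node.get("name", "?"), cache, size))
    return sorted(rows, key=lambda row: row[2], reverse=True)


for name, cache, size in rank_memory_consumers(SAMPLE):
    print(f"{name:15} {cache:13} {size / 1024 ** 3:6.2f} GB")
```

If one of these numbers is close to the heap size, that structure is the first thing to look at.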

One of the most common cases is that you are trying to facet on a field that
is analyzed. Also, you could try to only index data and not search
anything and see if memory usage is OK; then you at least know which part
of elasticsearch is responsible for this memory consumption (though you
should see it as well by using the monitoring APIs).

Once you know what exactly is failing, the next step is to find out what to
do about it.

--Alex

On Fri, Jan 24, 2014 at 11:17 AM, Talal mazroui.talal@gmail.com wrote:



(system) #3