Hello everyone,
we are using ES as the backend of an online service, and occasionally we are hit by a big garbage collection, which stops the node completely and causes all sorts of problems. The nodes have plenty of memory, I think. During the GC it looks like this.
This might happen once a day, usually during a period of heavy indexing, and sometimes it doesn't. We tried decreasing the heap size, but it does not have much of an effect: it makes the GC take a bit less time, but makes it happen a bit more often.
The data is actually fairly small, about 30G in total, but the documents and queries are very complex. This is a 5-node cluster; the nodes have 32G of RAM with 22G assigned to the ES heap.
I know the manual says we should not touch the JVM GC settings but I feel
we might have to. Does anyone have any idea how to prevent these garbage
collections from ever happening?
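A first step before changing anything is usually to make the pauses visible: turn on GC logging so each collection shows up with a timestamp, duration, and cause. A minimal sketch for a Java 7/8-era JVM (the log path and the `ES_JAVA_OPTS` hook are assumptions; adjust them to your install):

```shell
# Hypothetical sketch: enable verbose GC logging for the Elasticsearch JVM.
# These are pre-Java 9 HotSpot flags; the log path is an assumption.
export ES_JAVA_OPTS="$ES_JAVA_OPTS \
  -XX:+PrintGCDetails \
  -XX:+PrintGCDateStamps \
  -XX:+PrintGCApplicationStoppedTime \
  -Xloggc:/var/log/elasticsearch/gc.log"
```

With this in place, the log shows whether the long stops are old-generation (CMS/full) collections and how much heap each one actually reclaims.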
You could use G1 GC for nicer behavior with regard to application stop times, but before tinkering with GC, it would be better to check whether you have set up caching, and whether it is possible to clear caches or reconfigure ES.
Jörg
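For anyone who wants to try the G1 suggestion, switching collectors is just a JVM flag. A sketch, assuming the stock startup script picks up `ES_JAVA_OPTS` and that you replace (not mix with) the default CMS flags:

```shell
# Hypothetical sketch: run the Elasticsearch JVM with G1 instead of the default CMS.
# MaxGCPauseMillis is a soft pause-time target, not a guarantee.
export ES_JAVA_OPTS="-XX:+UseG1GC -XX:MaxGCPauseMillis=200"
```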
Hi Jörg, thanks for your reply.
What do you mean by whether we have set up caching? We do not have any special caching configuration; we use the defaults. How do you suggest we reconfigure ES? That is what I am trying to find out.
All best,
Michal
You said you have very complex documents and queries, and a 22 GB heap. Without knowing more about your queries and filters, it is hard to comment. There is default query/filter caching in some cases.
Jörg
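One quick experiment that follows from this: clear the caches on the cluster and watch whether heap usage drops afterwards. A sketch against the 1.x-era REST API (host and port are assumptions, and this needs a running node, so treat it as a CLI fragment rather than a script):

```shell
# Hypothetical sketch: clear the filter and field data caches (ES 1.x API),
# then check how much heap the caches hold per node.
curl -XPOST 'http://localhost:9200/_cache/clear?filter=true&field_data=true'
curl 'http://localhost:9200/_nodes/stats/indices/filter_cache,fielddata?pretty'
```

If heap stays high after clearing, the pressure is coming from somewhere other than the caches.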
+1 for using G1GC. In addition, I would suggest not trying to fine-tune GC settings. If you have stop-the-world old-generation GCs taking 20+ seconds, you have a more fundamental issue at play. I speak from experience on that: we had similar issues, and no amount of JVM/GC tuning could mask the fact that we simply didn't have enough memory.
If you aren't already doing so, look at the amount of heap used by the filter and field caches. Are you capping them? If you aren't, expensive queries could saturate your entire heap. Along the same lines, keep tabs on your evictions. ES provides granular metrics, so you can look at both filter and field cache evictions.
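The caps Chris mentions are, in the 1.x settings, `indices.fielddata.cache.size` and `indices.cache.filter.size` in elasticsearch.yml. To keep tabs on evictions, the counters live in the nodes-stats response; a minimal sketch of pulling them out (the response shape follows the 1.x stats format; the sample numbers below are made up for illustration):

```python
# Hypothetical sketch: extract filter/field cache evictions from an ES 1.x
# _nodes/stats response, so rising eviction counts are easy to spot over time.

def cache_evictions(nodes_stats):
    """Return {node_name: (filter_cache_evictions, fielddata_evictions)}."""
    out = {}
    for node in nodes_stats["nodes"].values():
        indices = node["indices"]
        out[node["name"]] = (
            indices["filter_cache"]["evictions"],
            indices["fielddata"]["evictions"],
        )
    return out

# Illustrative fixture, not real cluster output:
sample = {
    "nodes": {
        "abc123": {
            "name": "node-1",
            "indices": {
                "filter_cache": {"memory_size_in_bytes": 104857600, "evictions": 42},
                "fielddata": {"memory_size_in_bytes": 734003200, "evictions": 0},
            },
        }
    }
}

print(cache_evictions(sample))  # {'node-1': (42, 0)}
```

A steadily climbing eviction count means the cap is being hit and queries are churning the cache, which shows up as GC pressure.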
Field and filter caches are not the problem, I think; they occupy only a minority of the memory. The garbage collection in fact frees up a lot of memory, so I think the problem is that the standard GC, which is supposed to run continuously, cannot keep up. I will give G1 a try, though I have seen in several places that it's not recommended because it's not stable enough.
Michal
I'm interested in knowing more about G1 GC stability in Java 8, so I can apply fixes to my production cluster, which has been running stable for months with G1 GC.
All I know of are sporadic failures of the Lucene 5 codec (which is under development and not released in ES) and a rare failure of a random JUnit test on http://jenkins.elasticsearch.org (maybe a double free pointer), but these do not seem to have been escalated to the OpenJDK issue tracker, so I cannot verify whether the cause is G1 GC or not.