Anyone have G1 GC working? What environment/configs?

I'm trying to tune ES for predictable response times under an update-heavy
load. Our big challenge has been long GC pauses (sometimes stopping the app
for up to 30 seconds).

This post
(http://jprante.github.io/2012/11/28/Elasticsearch-Java-Virtual-Machine-settings-explained.html)
by Jörg Prante has a lot of knowledgeable advice on GC tuning and seems to
endorse the G1 GC for the situation I'm working with. It appears to reduce
long pauses at the expense of higher CPU load, which sounds like just the
trade-off I want to make.

But when I try to run the G1 GC, it causes the JVM to segfault, pretty soon
after starting. I've asked around with a few others in #elasticsearch, and
it seems others are having similar issues with G1--many I've spoken to have
tried it but ditched it when it segfaulted too much.

However, everyone seems to "know somebody" or know of someone who has had
success with G1.

So ... is G1 working for anyone? If you have G1 working and no segfault
issues, can you tell me a little more about your environment? (what OS,
JVM, ES version, JVM flags, app settings etc.) It doesn't make sense to me
that it is segfaulting and dying for me and others, but working okay for
some. There has to be a reason, if it is working for some--I'm growing
skeptical that G1GC is working for anyone until I hear otherwise :)
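
For anyone replying with environment details, here is a minimal sketch of one way to collect them from inside the JVM. The class name `JvmEnvReport` is made up for illustration; it only uses standard system properties and the `java.lang.management` beans, and it reports on the JVM it runs in, so it would have to run with the same flags as the ES node to be meaningful.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper, not part of Elasticsearch.
class JvmEnvReport {
    /** Builds a short, human-readable summary of the running JVM. */
    static String describe() {
        StringBuilder sb = new StringBuilder();
        sb.append("os: ").append(System.getProperty("os.name")).append(' ')
          .append(System.getProperty("os.version")).append(' ')
          .append(System.getProperty("os.arch")).append('\n');
        sb.append("jvm: ").append(System.getProperty("java.vm.name")).append(' ')
          .append(System.getProperty("java.version")).append('\n');
        // Flags the JVM was started with, e.g. -XX:+UseG1GC or -Xmx settings
        List<String> flags = ManagementFactory.getRuntimeMXBean().getInputArguments();
        sb.append("flags: ").append(flags).append('\n');
        // Names of the collectors that are actually active in this JVM
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append("collector: ").append(gc.getName()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

The collector names in the output are a quick check that the GC flag you passed was actually picked up.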

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

I tested the G1GC option a while ago, because I had to reduce the "stop the
world" GC phases on large heaps for my workload. This is a very urgent
issue for me because I want to handle machines with 64g RAM in production.

Back in April, I did not encounter any segfaults, with JDK 1.7.0_11 or
1.8.0-ea-b82 and ES 0.90.0.RC1. The workload on a single node was ~10k docs
per second bulk indexing, mixed with plain term queries at ~800 qps. No
facet queries or filters.

In the meantime I learned that trove4j crashes randomly on all known JVM
releases 1.7.0_17+ and 1.8.0+ with G1GC enabled, and I could reproduce it.
I did not try older JVM versions though. The trove4j Maps in particular seem
to be affected. ES makes heavy use of them in the faceting module, so this
explains immediate segfaults when G1GC is enabled for this kind of
workload. Right now I'm experimenting with HPPC as a trove4j replacement
in ES but I'm not finished yet. And I still have hope that G1GC can somehow
be fixed for the trove4j issue.

Jörg

We've had no luck with G1 because of the trove4j issue; we tried it on several
installations on different platforms.

Instead, this seems to be the best solution we have found so far for big
heaps:
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=30
-XX:+UseCMSInitiatingOccupancyOnly
-XX:NewRatio=4
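
As a rough illustration of what these flags imply: `NewRatio=4` asks for an old generation about four times the size of the young generation, and `CMSInitiatingOccupancyFraction=30` together with `UseCMSInitiatingOccupancyOnly` makes CMS start a concurrent cycle once the old generation is about 30% full, rather than relying on the JVM's own heuristics. The sketch below just does that arithmetic. The class and method names are hypothetical, the 20 GiB heap is an assumed example value, and the real JVM aligns and adjusts generation sizes, so treat the results as approximations.

```java
// Hypothetical helper for back-of-the-envelope sizing; not a JVM API.
class CmsSizing {
    /** Young generation size implied by -XX:NewRatio (old:young = newRatio:1). */
    static long youngGenBytes(long heapBytes, int newRatio) {
        return heapBytes / (newRatio + 1);
    }

    /** Old generation is whatever the young generation does not take. */
    static long oldGenBytes(long heapBytes, int newRatio) {
        return heapBytes - youngGenBytes(heapBytes, newRatio);
    }

    /** Old-gen occupancy at which CMS would start a concurrent cycle. */
    static long cmsTriggerBytes(long oldGenBytes, int occupancyPercent) {
        return oldGenBytes * occupancyPercent / 100;
    }

    public static void main(String[] args) {
        long heap = 20L << 30; // assumed 20 GiB heap, for illustration only
        long young = youngGenBytes(heap, 4);
        long old = oldGenBytes(heap, 4);
        long trigger = cmsTriggerBytes(old, 30);
        System.out.println("young gen  ~ " + (young >> 20) + " MiB");
        System.out.println("old gen    ~ " + (old >> 20) + " MiB");
        System.out.println("CMS starts ~ " + (trigger >> 20) + " MiB of old gen used");
    }
}
```

A low threshold like 30 means CMS starts early and runs often, trading CPU for more headroom before the old generation fills up.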

On Tue, Aug 20, 2013 at 7:02 PM, Nathan Dial nathan.dial@gmail.com wrote:

[...]

--
mvh

Runar Myklebust

Hi,

Thanks for sharing. Would you mind also sharing:

Heap size
Total memory available
Number of nodes
Index size
New generation heap size
Shard size
Number of queries per second / indexing rate
Number of indexes

How did you get to 30 for CMSInitiatingOccupancyFraction?

Thanks

S
On 21 Aug 2013 08:46, "Runar Myklebust" runar.a.m@gmail.com wrote:

[...]

Hi, since ES is embedded in our product and used by different customers,
it's hard to give any specific numbers for these, but generally this is what
we recommend when the heap size is, say, larger than 6GB. One customer has
a heap size of 20GB.
The number of nodes varies from 4 to 14. The index size is not very large in
number of documents (I guess around 266,000 is the max I have seen), but there
is often a lot of data per document (including extracted text from PDFs,
etc.).

The value of 30 for CMSInitiatingOccupancyFraction proved to work best at one
of our customers after a lot of testing with expected max traffic (200
requests per second, each spawning 4-20 queries on the index). In our product
the central DB will always be the performance bottleneck, not ES :)

On Wed, Aug 21, 2013 at 9:03 AM, Simone Sciarrati s.sciarrati@gmail.com wrote:

[...]

--
mvh

Runar Myklebust

Thanks for the info. Does your NewRatio of 4 work for the 6 GB heaps as
well as the 30 GB, or do you tend to have further tweaks in there?

What is your strategy for tweaking? Do you use metrics to give you a
direction to move things, or is it all just trial and error?
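
To make "metrics" concrete, here is a minimal sketch of the kind of measurement I have in mind: summing GC counts and accumulated GC time from the standard `GarbageCollectorMXBean`s before and after a workload, so two flag sets can be compared by numbers rather than by feel. The class name `GcDelta` and the allocation loop are made up for illustration; against a real ES node you would read the same beans over JMX or from the GC log instead.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of Elasticsearch.
class GcDelta {
    /** Returns {total collections, total collection time in ms} over all collectors. */
    static long[] snapshot() {
        long count = 0;
        long millis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when a collector does not report the value
            count += Math.max(0, gc.getCollectionCount());
            millis += Math.max(0, gc.getCollectionTime());
        }
        return new long[] {count, millis};
    }

    public static void main(String[] args) {
        long[] before = snapshot();

        // Stand-in workload: allocate a lot of short-lived garbage
        long checksum = 0;
        for (int i = 0; i < 2_000_000; i++) {
            byte[] junk = new byte[1024];
            checksum += junk.length;
        }

        long[] after = snapshot();
        System.out.println("allocated bytes: " + checksum);
        System.out.println("collections: " + (after[0] - before[0]));
        System.out.println("gc time ms:  " + (after[1] - before[1]));
    }
}
```

Note that the accumulated time is a total, not the longest single pause, so for pause-time tuning the GC log is still the better source.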

On Wednesday, August 21, 2013 2:30:12 AM UTC-5, Runar Myklebust wrote:

[...]