When I enable G1GC, Elasticsearch won't stay running for more than a few
minutes. When it dies, nothing is output in the logs. All I see is an
entry in syslog:
init: elasticsearch main process (17253) terminated with status 134
java version "1.7.0_21"
Java(TM) SE Runtime Environment (build 1.7.0_21-b11)
Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode)
I'm running on EC2, using a cluster of four m1.xlarge instances with 5 GB
allocated to Elasticsearch and mlockall set to true.
The workload is distributed bulk indexing. I have 600 processes, each in a
loop, submitting bulk index requests of 200 documents at a time.
As a side note, if I use UseParNewGC and UseConcMarkSweepGC, I eventually
get stop-the-world pauses (of ~5s) every few seconds on each node, which
renders the cluster useless.
How can I get G1 to work? Or how can I stop CMS from stopping the world
several times per minute?
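A note on the "status 134" in the syslog line: if the status follows the usual shell convention, a value above 128 means the process was terminated by signal number (status - 128), so 134 would be signal 6, SIGABRT. That is the signal a JVM raises when it aborts on a fatal internal error, and in that case HotSpot usually leaves an hs_err_pid file in the working directory. A minimal sketch of the decoding, assuming a POSIX system; the helper name is made up for illustration:

```python
import signal


def describe_exit_status(status):
    """Interpret a shell-style exit status (hypothetical helper).

    Statuses above 128 conventionally mean "terminated by signal
    (status - 128)"; anything else is treated as a plain exit code.
    """
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = "unknown signal"
        return "killed by signal %d (%s)" % (signum, name)
    return "exited with code %d" % status


print(describe_exit_status(134))
```

If the abort theory holds, the hs_err file (when present) is the place to look for the crashing frame, since nothing reaches the Elasticsearch logs.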
I’ve had the best results using ParallelGC. We haven’t profiled G1 yet,
but I have done fairly extensive profiling of CMS (which I expect behaves
similarly to G1). We’ve found that as we increase query volume, ParallelGC
tends to degrade gradually, and an increasing percentage of queries get
caught by our 2s timeout. In contrast, when we run CMS we find that the
cluster behaves pretty well under increasing load, and then suddenly, at
some critical level, machines start experiencing multi-second
stop-the-world garbage collections and the cluster totally falls apart.
Our CMS cluster would fall apart at about 60-70% of the traffic that our
ParallelGC cluster could handle while still timing out fewer than 0.5% of
requests. Some people disagree with me about this, but I recommend you give
ParallelGC a try. ParallelGC gives us more query throughput, mostly better
query latency (worse at the 99%+ quantile), and more warning when our
cluster is over-stressed and approaching failure.
That said, 600 processes each running bulk inserts could be very
aggressive. I’m not surprised that it can knock over your cluster. I
normally backfill in batches of 1000 1 KB documents from one
single-threaded process, and I have a dynamically tunable sleep period
after each insert so that I can dial back the insert rate if my cluster
starts to look overstressed. Are these live updates you’re processing? How
many documents do you need to index per minute? If this index rate is a
hard constraint, then you may need to scale your write capacity by running
more shards on more machines, or by using machines with better I/O
throughput (i.e., SSDs).
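The throttled backfill described above could look roughly like the sketch below. It is not the poster's actual code: `send_bulk` and `get_sleep` are hypothetical callables you would supply yourself (one submits a bulk request through whatever client you use, the other returns the current pause in seconds so it can be changed while the loop runs).

```python
import time


def backfill(docs, send_bulk, batch_size=1000,
             get_sleep=lambda: 0.5, sleep=time.sleep):
    """Send docs in fixed-size batches from a single thread.

    send_bulk(batch) and get_sleep() are hypothetical hooks: the first
    submits one bulk request, the second returns the pause (in seconds)
    to take after each full batch. Returns the number of docs sent.
    """
    batch = []
    sent = 0
    for doc in docs:
        batch.append(doc)
        if len(batch) >= batch_size:
            send_bulk(batch)
            sent += len(batch)
            batch = []
            # Re-read the pause every time so it can be tuned on the fly.
            sleep(get_sleep())
    if batch:
        # Flush the final partial batch; no pause is needed afterwards.
        send_bulk(batch)
        sent += len(batch)
    return sent
```

Having `get_sleep` read its value from a file or an environment variable is one simple way to dial the insert rate up or down without restarting the loader.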
For what it's worth, we helped a client the other day deal with ES 0.20.4
and issues around shard recovery and performance. We switched from <I
don't recall which JVM params and collector> to G1 and "things worked
better", both with updates 17 and 21 of Oracle's Java 1.7. There were
certainly no crashes.
(In reply to Christopher J. Bottaro, Monday, June 3, 2013 8:44:17 PM UTC-4.)
On Wednesday, June 5, 2013 12:00:10 PM UTC-4, Abhijeet Rastogi wrote:
Hi Otis,
Sorry to hop in and hijack the thread. Can you provide the exact JAVA_OPTS
that you used for G1? Thanks
(In reply to Otis Gospodnetic, Wed, Jun 5, 2013 at 4:33 AM.)
Thanks for the help.
--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearc...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
FWIW, I've also run extensive load tests using G1GC (OpenJDK 7uSomething,
64 bits on Linux) a few months ago, and it worked nicely. As expected from
my earlier tests with application servers, throughput dropped a bit but
latency was a lot more stable, i.e. no long pauses. Didn't use hundreds of
threads for indexing, though, just about 50.
Klaus
On Sunday, 9 June 2013 06:46:16 UTC+2, Otis Gospodnetic wrote:
Hi,
I don't have access to that server at the moment, but I believe it was -XX:+UseG1GC
-server -Xms... -Xmx... -- i.e., nothing exotic.
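For readers comparing the collectors discussed in this thread, here is a small sketch that assembles the corresponding option strings. The flags themselves are standard HotSpot options named above; the helper, the equal -Xms/-Xmx choice, and the "4g" in the example are assumptions for illustration only, since the thread never states the heap size that was actually used.

```python
# Collector flags mentioned in this thread (standard HotSpot options).
COLLECTOR_FLAGS = {
    "g1": ["-XX:+UseG1GC"],
    "cms": ["-XX:+UseParNewGC", "-XX:+UseConcMarkSweepGC"],
    "parallel": ["-XX:+UseParallelGC"],
}


def java_opts(collector, heap):
    """Build an option string for one of the collectors above.

    heap is a size such as "4g". Setting -Xms and -Xmx to the same value
    is an assumption of this sketch, not something the thread confirms.
    """
    flags = ["-server", "-Xms" + heap, "-Xmx" + heap]
    flags += COLLECTOR_FLAGS[collector]
    return " ".join(flags)


print(java_opts("g1", "4g"))  # -server -Xms4g -Xmx4g -XX:+UseG1GC
```

How the resulting string reaches the JVM depends on your startup script, so check how your installation passes options before relying on it.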
What heap size? We're trying to use G1GC but a few seconds after startup,
the JVM process running the Elasticsearch instance crashes. The only clue
is that it's crashing at
I've experienced the same behaviour in our embedded ES installations, with
the same results, and it seems that Trove is the reason. That doesn't
really help us and our customers, though. I would hope that the ES team
would address this, or at least make an official statement that G1GC is
not supported.
(In reply to Dan Everton, Mon, Jul 22, 2013 at 7:44 AM.)
In the Maven repo, I found this: gnu/trove/Version.class: compiled Java
class data, version 49.0 (Java 1.5), built with 1.6.0_31-b04-415-11M3646
(Apple Inc.)
I have copied over the source for a Java 1.6 (class file format 50)
mavenized build of trove here (using Java 1.7.0_21):
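As background on the "version 49.0 (Java 1.5)" figure quoted above: every .class file starts with the magic number 0xCAFEBABE followed by a two-byte minor and a two-byte major version, and major 49, 50 and 51 correspond to Java 1.5, 1.6 and 1.7. A minimal sketch for checking this yourself, assuming you have already extracted the class bytes from the JAR (the function and table names are illustrative):

```python
import struct

# Major class-file versions for the Java releases discussed here.
MAJOR_TO_JAVA = {49: "1.5", 50: "1.6", 51: "1.7", 52: "1.8"}


def class_file_version(data):
    """Return (major, minor) read from the first 8 bytes of a .class file."""
    if len(data) < 8:
        raise ValueError("need at least 8 bytes of class data")
    # Big-endian: u4 magic, u2 minor_version, u2 major_version.
    magic, minor, major = struct.unpack(">IHH", data[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return major, minor


# Header bytes of a class compiled for Java 1.5 (major 49, minor 0).
header = bytes.fromhex("cafebabe00000031")
major, minor = class_file_version(header)
print(major, minor, MAJOR_TO_JAVA.get(major, "unknown"))
```

This only reports the target version a class was compiled for; it says nothing about whether the bytecode itself is what trips up G1.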
(In reply to Jörg Prante, Tuesday, July 23, 2013 7:52:49 AM UTC+10.)
I wouldn't expect recompiling the JAR with Java 7 to make any difference,
but I'll see if we can get it tested.
Switching to mmapfs from niofs helped a little bit with G1GC enabled. The
nodes survive for a few minutes instead of dying immediately. There's no
load on them during this time so I've no idea what the cause might be. But
at this point it looks like G1GC is completely unusable for us.
I could reproduce the issue, both on JDK 7 and 8. Trying to build a
reproducible test case.
Jörg
(In reply to Dan Everton, Wed, Jul 24, 2013 at 1:22 AM.)
Is there any update on the exact GC params that work with Elasticsearch on
G1? I am having a similar issue on ES 0.90.0 with G1.
Thanks for your time on this.
Cheers
Sai
(In reply to Jörg Prante, Wednesday, July 24, 2013 1:56:41 AM UTC-7.)
The primary issue with G1 is the Trove library. Elasticsearch no longer
uses Trove as of version 0.90.6, so I would suggest upgrading to 0.90.6 or
higher if you plan on using G1GC.
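A quick way to apply that cutoff to a list of node versions is sketched below. It assumes plain dotted numeric versions (no suffixes such as RC or beta tags) and simply encodes the 0.90.6 boundary stated above; the function names are made up for illustration.

```python
def version_tuple(version):
    """Turn a dotted version such as "0.90.6" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def is_trove_free(es_version, first_without_trove="0.90.6"):
    """True if es_version is at or above the cutoff quoted in this thread."""
    return version_tuple(es_version) >= version_tuple(first_without_trove)


# Versions mentioned in this thread, plus the cutoff itself.
for v in ("0.20.4", "0.90.0", "0.90.6"):
    print(v, is_trove_free(v))
```

Comparing tuples of integers avoids the string-comparison trap where "0.90.10" would sort before "0.90.6".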
Thanks, Ivan, for the very quick help here.
I'm planning to upgrade, and this is one more reason to consider it.
I will keep you posted on how it goes.
Cheers
Sai
(In reply to Ivan Brusic, Wednesday, February 5, 2014 11:35:50 AM UTC-8,
who was answering saiprasad mishra's post of Wed, Feb 5, 2014 at 11:19 AM.)