1.0.0.Beta2 OOM logs

jprante · December 18, 2013, 3:55pm

I ran some heap scaling tests to find out the required heap for my
workload.

My config was ES 1.0.0.Beta2 cluster, 3 RHEL 6.3 nodes, Java 1.8.0-ea JVM
25.0-b56, 4GB heap, G1 GC (and some tuning for segment merge and bulk)

Workload: mixed, scan/scroll query over 1.6m docs plus term queries over
20m docs (unknown queries per second, but higher than 5000) with bulk
indexing (5000 docs per second)

The 4GB run ended in OOM on all nodes after about an hour, with all
kinds of error messages. The cluster restarted ok afterwards, so it did not
matter at all. After increasing the heap to 6GB, the rerun succeeded,
finishing in 52 minutes.

I just want to share the OOM logs with anyone who might be interested to
have a look, because they are so pretty

https://gist.github.com/jprante/8024139

boreas-20131218.log

[2013-12-17 13:06:30,177][INFO ][node                     ] [Collector] version[1.0.0.Beta2], pid[30239], build[296cfbe/2013-12-02T15:46:27Z]
[2013-12-17 13:06:30,177][INFO ][node                     ] [Collector] initializing ...
[2013-12-17 13:06:30,196][INFO ][plugins                  ] [Collector] loaded [analysis-german, ingest], sites []
[2013-12-17 13:06:33,051][INFO ][node                     ] [Collector] initialized
[2013-12-17 13:06:33,052][INFO ][node                     ] [Collector] starting ...
[2013-12-17 13:06:34,055][INFO ][transport                ] [Collector] bound_address {inet[/10.3.2.32:19300]}, publish_address {inet[/10.3.2.32:19300]}
[2013-12-17 13:06:37,172][INFO ][cluster.service          ] [Collector] detected_master [Authority][YNdGcgoQTqigLms31YCKNg][inet[/10.3.2.31:19300]], added {[Authority][YNdGcgoQTqigLms31YCKNg][inet[/10.3.2.31:19300]],}, reason: zen-disco-receive(from master [[Authority][YNdGcgoQTqigLms31YCKNg][inet[/10.3.2.31:19300]]])
[2013-12-17 13:06:37,194][INFO ][discovery                ] [Collector] zbn-1.0/_vXVmJi6RqCmMiS3AWXt6Q
[2013-12-17 13:06:37,240][INFO ][http                     ] [Collector] bound_address {inet[/10.3.2.32:19200]}, publish_address {inet[/10.3.2.32:19200]}
[2013-12-17 13:06:37,241][INFO ][node                     ] [Collector] started

notos-20131218.log

[2013-12-17 13:06:35,241][INFO ][node                     ] [The Profile] version[1.0.0.Beta2], pid[24704], build[296cfbe/2013-12-02T15:46:27Z]
[2013-12-17 13:06:35,242][INFO ][node                     ] [The Profile] initializing ...
[2013-12-17 13:06:35,260][INFO ][plugins                  ] [The Profile] loaded [analysis-german, ingest], sites []
[2013-12-17 13:06:38,110][INFO ][node                     ] [The Profile] initialized
[2013-12-17 13:06:38,110][INFO ][node                     ] [The Profile] starting ...
[2013-12-17 13:06:39,109][INFO ][transport                ] [The Profile] bound_address {inet[/10.3.2.33:19300]}, publish_address {inet[/10.3.2.33:19300]}
[2013-12-17 13:06:42,213][INFO ][cluster.service          ] [The Profile] detected_master [Authority][YNdGcgoQTqigLms31YCKNg][inet[/10.3.2.31:19300]], added {[Authority][YNdGcgoQTqigLms31YCKNg][inet[/10.3.2.31:19300]],[Collector][_vXVmJi6RqCmMiS3AWXt6Q][inet[/10.3.2.32:19300]],}, reason: zen-disco-receive(from master [[Authority][YNdGcgoQTqigLms31YCKNg][inet[/10.3.2.31:19300]]])
[2013-12-17 13:06:42,250][INFO ][discovery                ] [The Profile] zbn-1.0/ldtEHWFxRcaZBHQaKFS_TQ
[2013-12-17 13:06:42,281][INFO ][http                     ] [The Profile] bound_address {inet[/10.3.2.33:19200]}, publish_address {inet[/10.3.2.33:19200]}
[2013-12-17 13:06:42,282][INFO ][node                     ] [The Profile] started

zephyros-20131218.log

FYI I'm considering a memory watchdog on shard level that might detect a low
free heap condition in time and return warnings to the bulk client, so
the bulk client can throttle, suspend, or exit the indexing cleanly
before OOMs start to break out in the cluster, with all the risk of crashing
shards or node dropouts. Surely not an exact science, but with some
heuristics it should work (e.g. below a threshold of 10 MB free heap, no
execution of the indexing engine should be allowed).
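
The watchdog idea above could be sketched roughly as follows. This is only an illustration under stated assumptions, not Elasticsearch code: the `HeapWatchdog` class, its `Advice` values, and the 256 MB throttle threshold are hypothetical names and numbers chosen for the example; only the 10 MB reject threshold comes from the post. Used heap also counts garbage that has not been collected yet, so a real watchdog would probably need smoothing or a post-GC measurement.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Hypothetical heap watchdog sketch; not part of Elasticsearch. */
final class HeapWatchdog {

    /** What a bulk client would be advised to do (hypothetical values). */
    enum Advice { PROCEED, THROTTLE, REJECT }

    private final long rejectBelowBytes;   // e.g. 10 MB, as suggested in the post
    private final long throttleBelowBytes; // assumed value, not from the post

    HeapWatchdog(long rejectBelowBytes, long throttleBelowBytes) {
        if (rejectBelowBytes < 0 || throttleBelowBytes < rejectBelowBytes) {
            throw new IllegalArgumentException("need 0 <= reject <= throttle");
        }
        this.rejectBelowBytes = rejectBelowBytes;
        this.throttleBelowBytes = throttleBelowBytes;
    }

    /** Pure decision function, so the heuristic can be tested without a real heap. */
    Advice advise(long usedBytes, long maxBytes) {
        long freeBytes = maxBytes - usedBytes;
        if (freeBytes < rejectBelowBytes) {
            return Advice.REJECT;
        }
        if (freeBytes < throttleBelowBytes) {
            return Advice.THROTTLE;
        }
        return Advice.PROCEED;
    }

    /** Samples the heap of the current JVM and applies the heuristic. */
    Advice adviseNow() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // getMax() returns -1 when the maximum is undefined; fall back to committed.
        long max = heap.getMax() >= 0 ? heap.getMax() : heap.getCommitted();
        return advise(heap.getUsed(), max);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        HeapWatchdog watchdog = new HeapWatchdog(10 * mb, 256 * mb);
        System.out.println("advice for this JVM: " + watchdog.adviseNow());
    }
}
```

A bulk client receiving `THROTTLE` could slow down or pause, and one receiving `REJECT` could stop indexing cleanly; how that advice would travel back in a bulk response is left open here.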

Would love to have more time for testing the exciting new 1.0.0.Beta2
features, but right now I'm just happy to run my data reconciliations
successfully.

Jörg

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAKdsXoEV9HpzbT5HzdG5s0KspvQC%2Ba153-iTkue4QvRrCiTVxw%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.

Jason_Wee · December 19, 2013, 7:29am

Hi Jörg,

Java 1.8.0-ea JVM 25.0-b56: is that Java 8 or a typo?

/Jason

jprante · December 19, 2013, 9:16am

Yes, I have been testing Java 8 for some months now.

[joerg@zephyros ~]$ ls -l jdk-8*
-rw-r--r--. 1 joerg joerg 274442240 8. Dez 22:00 jdk-8-ea-bin-b114-linux-x64-31_oct_2013.tar

[joerg@zephyros ~]$ /usr/java/jdk1.8.0/bin/java -version
java version "1.8.0-ea"
Java(TM) SE Runtime Environment (build 1.8.0-ea-b114)
Java HotSpot(TM) 64-Bit Server VM (build 25.0-b56, mixed mode)

/usr/java/jdk1.8.0/bin/java -Xms6g -Xmx6g -XX:+UseG1GC
-XX:MaxGCPauseMillis=1000 -Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.port=3333
-Djava.rmi.server.hostname=zephyros -Delasticsearch
-Des.path.home=/home/es/elasticsearch-1.0.0.Beta2 -cp
:/home/es/elasticsearch-1.0.0.Beta2/lib/elasticsearch-1.0.0.Beta2.jar:/home/es/elasticsearch-1.0.0.Beta2/lib/*:/home/es/elasticsearch-1.0.0.Beta2/lib/sigar/*
org.elasticsearch.bootstrap.ElasticSearch
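
To double-check that flags such as `-Xmx6g` and `-XX:+UseG1GC` were actually picked up, a small standalone check along these lines could be run with the same JVM options. It is a minimal sketch using only the standard `java.lang.management` API; the class name `JvmSettingsCheck` is made up for the example. With G1 enabled, HotSpot typically reports collectors named "G1 Young Generation" and "G1 Old Generation".

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that reports the active collectors and the max heap. */
final class JvmSettingsCheck {

    /** Names of the garbage collectors registered in this JVM. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    /** Maximum heap the JVM will attempt to use, in bytes. */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        System.out.println("collectors: " + collectorNames());
        System.out.println("max heap MB: " + maxHeapBytes() / (1024L * 1024L));
    }
}
```

The reported maximum can be slightly below the `-Xmx` value, depending on the collector, so an approximate match is what to look for.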

Beware of using MVEL: it is totally broken when running on a Java 8 JVM. I
would rather use Nashorn anyway.

Jörg

Jason_Wee · December 19, 2013, 10:09am

Thank you for sharing, Jörg

/Jason
