GC issues

Hi,

I created a plugin to read feeds from a file (185 GB, 19 lakh (~1.9 million) feeds) and write
them to ES via an internal Client object.
I am using bulk insertion (1000 feeds at a time).
After pushing 7.7 lakh (~770,000) feeds, the plugin stopped, and the only thing I could
find in the log was this:

[2013-04-02 12:21:48,388][INFO ][monitor.jvm ] [Lurking
Unknown] [gc][ParNew][3050][11523] duration [1.6s], collections [2]/[2.1s],
total [1.6s]/[10.2m], memory [4.3gb]->[4.3gb]/[19.9gb], all_pools {[Code
Cache] [3.4mb]->[3.4mb]/[48mb]}{[Par Eden Space]
[13.2mb]->[865.9kb]/[66.5mb]}{[Par Survivor Space]
[5.5mb]->[6.3mb]/[8.3mb]}{[CMS Old Gen] [4.3gb]->[4.3gb]/[19.9gb]}{[CMS
Perm Gen] [36.1mb]->[36.1mb]/[166mb]}
[2013-04-02 14:57:03,983][WARN ][monitor.jvm ] [Lurking
Unknown] [gc][ParNew][12275][43715] duration [16.9s], collections
[1]/[17s], total [16.9s]/[44.5m], memory [1.1gb]->[1gb]/[19.9gb], all_pools
{[Code Cache] [3.9mb]->[3.9mb]/[48mb]}{[Par Eden Space]
[51.7mb]->[481.4kb]/[66.5mb]}{[Par Survivor Space]
[7.4mb]->[7.6mb]/[8.3mb]}{[CMS Old Gen] [1gb]->[1gb]/[19.9gb]}{[CMS Perm
Gen] [36.2mb]->[36.2mb]/[166mb]}

When I checked with the top command, Java had only consumed 15 GB of the 20
GB of RAM I gave it.
I don't quite understand what the issue is.
It would be great if there were some way to detect such issues in the code
itself and handle them.
Kindly let me know what can be done about it.

Thanks
Vineeth


Do you recreate the bulk at each iteration or reuse the same BulkRequest?

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
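
For context, a minimal sketch of the difference David is asking about (illustrative only; "feeds"/"feed" are hypothetical index and type names): reusing one builder keeps every previously added request referenced, so the heap grows with the whole file, while recreating the builder after each submit lets the previous batch be garbage-collected.

import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.client.Client;

import java.util.Iterator;

public class FeedLoader {
    public static void load(Client client, Iterator<String> jsonFeeds) {
        BulkRequestBuilder bulk = client.prepareBulk();
        while (jsonFeeds.hasNext()) {
            // "feeds"/"feed" are hypothetical index and type names
            bulk.add(client.prepareIndex("feeds", "feed").setSource(jsonFeeds.next()));
            if (bulk.numberOfActions() >= 1000) {
                BulkResponse response = bulk.execute().actionGet();
                if (response.hasFailures()) {
                    // inspect response.buildFailureMessage() in a real loader
                }
                bulk = client.prepareBulk(); // fresh builder: the old batch can be collected
            }
        }
        if (bulk.numberOfActions() > 0) {
            bulk.execute().actionGet();      // flush the final partial batch
        }
    }
}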


If you allow 20 GB for the heap, a garbage collection of 16.9 s is
expected with CMS GC.

You say "1000 feeds at a time". I assume you mean 1000 documents per
bulk request?

You say the plugin "stopped". This can be unrelated to the GC. Can you
monitor your app with jvisualvm?

But, if GC is your only concern and you want better responsiveness from
your app, switch to the latest Java 7, ES 0.90+, and the G1 GC. G1 pauses
are minimal, but they come at a price: overall CPU usage is higher. If G1
is not an option, consider a smaller heap to calm down the CMS GC, and
increase the heap only if really necessary.

Jörg


Hello David,
It's kind of impossible to give reproduction steps. I was running the bulk
request on 180 GB of data with 2 million documents.
But it's basically a bulk request, and the process simply stopped at 0.7 million
documents with the log as above.

Hello Jörg,

Thanks for your reply. Let me try it.

Thanks
Vineeth


Hiya Jörg

On Wed, 2013-04-03 at 09:42 +0200, Jörg Prante wrote:

If you allow 20 GB for the heap, a garbage collection of 16.9 s is
expected with CMS GC.

Really?

That's not what I see at all. Running with heaps of 24GB, I get:

GC MarkSweep frequency: 159 s
GC MarkSweep duration: 95.08 ms
GC ParNew frequency: 4 s
GC ParNew duration: 124.75 ms

The only times I see GCs lasting that long are:

  • when swapping is enabled
  • when there is memory pressure and nothing to free

Why do you think 16.9 s for a 20GB heap is normal? Am I
misunderstanding something?

clint


See David's question. Basically, I bulk-load via the Java API using a
default of 4096 requests (a mix of index and delete) at a time. I call the
following snippet once at the beginning of the bulk load, and then again
after each iteration of adding and submitting 4096 items:

BulkRequestBuilder bulkRequest = client.prepareBulk(); // fresh builder for the next batch
bulkRequest.setRefresh(false);                         // don't refresh after each bulk

Then, when it's all over, I call setRefresh(true) on the last bulk request.
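
A minimal sketch of that final step, assuming the same client as in the snippet above ("docs" is a hypothetical index name):

BulkRequestBuilder lastBulk = client.prepareBulk();
lastBulk.setRefresh(true);      // refresh once this final bulk completes
// ... add the remaining items, then:
lastBulk.execute().actionGet();
// An explicit refresh of the index afterwards would achieve the same effect:
// client.admin().indices().prepareRefresh("docs").execute().actionGet();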

I seem to recall an object that takes a bulk size and appears to do this
work for you. But I based my code on one of Shay's examples, and it worked
perfectly the first time and has never let me down. I've never really seen
any need to change the default to anything other than 4096; I can even load
97M documents in between 2 and 3 hours on a quad-core i7 MacBook, with its
one slow disk working overtime to both read the input data and handle the
ES writes. I can't imagine how fast it will be when I can run that load on
a production-level server with high-performance disk arrays!


Hi Clinton,

Thanks for your numbers - I will check if I can set up a GC exerciser on
my test server soon (it's a good occasion for me, because I am testing
Java 7 and Java 8 with the G1 GC for production later this year) and come
back with my numbers.

Right now I can bulk index ~10k docs per second, or run ~1k term queries per
second (~8 MB/sec), through a 4 GB heap on a single-node ES 0.90 RC1 cluster
(64-bit, Java 8, G1 GC) without a stall (2x Opteron 4170 HE, 32 GB RAM,
4x 128 GB SSD RAID0); I find this very fascinating. Previous tests some
months ago gave me stalls, but that was mostly my code, which purposely
revealed the weaknesses of the CMS GC (fragmentation in the old generation).
It was similar to the code given at
http://nerds-central.blogspot.de/2011/11/comparing-java-7-garbage-collectors.html

My intention was only to point out that it is not uncommon for JVM apps
with large heaps to observe CMS GC stalls of more than 10 seconds. That
is not specifically due to the ES codebase; I understand that Shay gave
CMS some extra head room to reduce the fragmentation risk and implemented
some tricks with memory chunks to calm down the CMS GC. Yes, the risk of
getting stalled mainly depends on the kind of workload running on the
JVM: for example, bulk indexing with large docs, segment merging, query
load, the filter cache, third-party code (plugin/river), or a mixture of
all of these...

Back in the days of early Java 6, GC stalls were a really hard issue.
There is also an interesting GC presentation from Cloudera (from 2010)
which is definitely targeted at HBase on Java 6; with Java 7, a couple of
things have since improved. See

Quoting Todd Lipcon: "Machines come with 24-48G RAM - But bigger heaps
mean longer GC pauses - Around 10 seconds/GB on my boxes" - well, I'm
sure the formula is not correct, it's not that high, but it shows the
frustration.

Jörg



As a Sunday special, here are my numbers; more details will follow in a
blog post next week, I hope.

Environment: single node, Dual Opteron 4170 HE, 4x 3.5" 2 TB SATA3, RHEL
6.3 Server 64-bit, 32 GB RAM, Java 1.8.0 ea b82, ES 0.90.0 RC1,
bootstrap.mlockall:true
Workload: mixed, ~4000 docs/sec bulk indexing = ~9 MB/sec + ~900 queries/sec

/usr/java/latest/bin/java -Xms16g -Xmx16g -Djava.awt.headless=true
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
-Delasticsearch -Des.path.home=/home/joerg/elasticsearch-0.90.0.RC1 -cp
:/home/joerg/elasticsearch-0.90.0.RC1/lib/elasticsearch-0.90.0.RC1.jar:/home/joerg/elasticsearch-0.90.0.RC1/lib/*:/home/joerg/elasticsearch-0.90.0.RC1/lib/sigar/*
org.elasticsearch.bootstrap.Elasticsearch

After ~70min I get

[2013-04-07 11:44:10,567][WARN ][monitor.jvm ] [Makkari]
[gc][ConcurrentMarkSweep][4060][16] duration [19.8s], collections
[1]/[20.7s], total [19.8s]/[25.9s], memory [15.2gb]->[3.5gb]/[15.9gb],
all_pools {[Code Cache] [10mb]->[10mb]/[48mb]}{[Par Eden Space]
[113.5mb]->[22.8mb]/[665.6mb]}{[Par Survivor Space]
[67.3mb]->[0b]/[83.1mb]}{[CMS Old Gen] [15gb]->[3.5gb]/[15.1gb]}

This compacting GC event exceeds the default 5s node-info timeout, and
the client disconnected, which aborted the test:

[11:43:56,977][INFO
][org.elasticsearch.client.transport][elasticsearch[She-Venom][generic][T#2]]
[She-Venom] failed to get node info for
[#transport#-1][inet[/192.168.1.118:9300]], disconnecting...
org.elasticsearch.transport.ReceiveTimeoutTransportException:
[inet[/192.168.1.118:9300]][cluster/nodes/info] request_id [15420]
timed out after [5001ms]
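
As an aside on the disconnect itself: the transport client's node-info ping timeout is configurable, so a client that has to ride out long GC pauses could raise it. A rough sketch, assuming the 0.90-era TransportClient API and a hypothetical cluster name (the address is the node from the log above):

import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

Settings settings = ImmutableSettings.settingsBuilder()
        .put("cluster.name", "mycluster")             // hypothetical cluster name
        .put("client.transport.ping_timeout", "30s")  // default is 5s
        .build();
TransportClient client = new TransportClient(settings)
        .addTransportAddress(new InetSocketTransportAddress("192.168.1.118", 9300));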

With G1 GC, the same workload passes successfully, and is even faster in
throughput, but with a higher average load.

/usr/java/latest/bin/java -Xms16g -Xmx16g -Djava.awt.headless=true
-XX:+UseG1GC -Delasticsearch
-Des.path.home=/home/joerg/elasticsearch-0.90.0.RC1 -cp
:/home/joerg/elasticsearch-0.90.0.RC1/lib/elasticsearch-0.90.0.RC1.jar:/home/joerg/elasticsearch-0.90.0.RC1/lib/*:/home/joerg/elasticsearch-0.90.0.RC1/lib/sigar/*
org.elasticsearch.bootstrap.Elasticsearch

Jörg



Hiya Jörg

Thanks for the numbers.

I note that you're running on Java 8, which...

But I'd like to run your tests in different environments and compare.
Could you share your tests?

ta

Clint


But I'd like to run your tests in different environments and compare.
Could you share your tests?

In fact, could you open an issue with the code, so that it is easier to
track?

Clint


Sure, I'd like to share the tests. Please be patient - the test
unfortunately uses private data that I am not allowed to share, but I will
move the test program to use Wikipedia data instead (similar to the
Wikipedia river).

Of course I will also test Java 6 and Java 7 to have a good comparison,
and I will share my insights in a blog post soon... I was after Java 8
because of the PermGen changes and G1 stability.

Jörg

