Bulk operation: OutOfMemoryError


(Daniel Guo) #1

I use the BulkRequestBuilder to index about 300,000 documents at once.
ES_HEAP_SIZE is set to 2g, and I get the following error: OutOfMemoryError:
GC overhead limit exceeded

https://lh4.googleusercontent.com/-gPuV90FoSbY/UqkUssvgllI/AAAAAAAAAVU/XxLEZ_VUP_0/s1600/outOfMemeory.jpg

Other than dividing the bulk request into several smaller bulk requests,
are there any other suggestions or optimizations?
Thanks a lot.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/bf34f259-f21e-472d-a5ab-7122da5182a4%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Daniel Guo) #2

And what is the upper limit of the bulk interface?

On Thursday, December 12, 2013 9:51:44 AM UTC+8, Daniel Guo wrote:



(Otis Gospodnetić) #3

Hi Daniel,

300K is quite a bit, unless the docs are tiny. Try smaller bulks, and try a
higher -Xmx; 2GB is not a lot. Try SPM or any other tool that tells you more
about your memory usage, both by ES and by the JVM itself (e.g. JVM memory
pool sizes and % utilization, to see which pool is too small).

Otis

Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/
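
Splitting the load into smaller bulks, as suggested above, is straightforward on the client side. A minimal sketch in plain Java (no ES dependency; the batch size of 5,000 is an assumed starting point to tune against your document sizes):

```java
import java.util.ArrayList;
import java.util.List;

public class BulkChunker {
    // Split a large document list into fixed-size batches so each
    // bulk request stays small enough for a modest heap.
    public static <T> List<List<T>> partition(List<T> docs, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += batchSize) {
            // Copy the subList view so each batch is independent of the source list.
            batches.add(new ArrayList<>(docs.subList(i, Math.min(i + batchSize, docs.size()))));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 300_000; i++) docs.add(i);
        System.out.println(partition(docs, 5_000).size()); // prints 60
    }
}
```

Each batch would then be fed into its own BulkRequestBuilder and executed before building the next, so only one batch's worth of raw request data is alive on the heap at a time. If your client version ships it, the Java API's BulkProcessor automates this flushing by action count and byte size.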



(Daniel Guo) #4

Hey Otis:
Thanks for your advice; you are right. I'm trying it now. Thank you!



(Jörg Prante) #5

It depends on the doc sizes and on the number of nodes that are processing
the bulk requests.

Aim for reasonable bulk response times. ES should answer within around 1-10
seconds; for that, it should not receive bulk requests larger than ~10 MB per
node. Just transferring 10 MB to a node takes around a second.

So it is important to streamline bulk indexing, to keep bulk sizes
reasonable, and to get reasonable response times.

There are also memory options per node to modify the default bulk indexing
resources per node and per shard.

While you can ramp up many dozens or even hundreds of GB of heap just for
bulk indexing, it is not efficient to hold such vast amounts of raw bulk data
in ES. You will either run into thread pool / queue limits, time out while
the data is transported to replicas or to other shards, or put far too much
pressure on the GC, as you have observed.

Jörg
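
The ~10 MB-per-node guideline above suggests capping bulks by payload size rather than by document count. A minimal sketch, assuming the documents are already serialized to JSON strings (the cap value is a parameter to tune, and the UTF-8 byte length is only a rough proxy for on-the-wire size):

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ByteCappedBatcher {
    // Group serialized documents into batches whose total payload
    // stays under a per-request byte cap (e.g. ~10 MB per node).
    public static List<List<String>> batchByBytes(List<String> jsonDocs, long maxBytes) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentBytes = 0;
        for (String doc : jsonDocs) {
            long docBytes = doc.getBytes(StandardCharsets.UTF_8).length;
            // Flush the current batch before this doc would push it over the cap.
            if (!current.isEmpty() && currentBytes + docBytes > maxBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(doc);
            currentBytes += docBytes;
        }
        if (!current.isEmpty()) batches.add(current);
        return batches;
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 5; i++) docs.add("aaaa"); // 4 bytes each
        System.out.println(batchByBytes(docs, 10).size()); // prints 3
    }
}
```

Sized this way, each batch keeps the per-node request under the response-time budget regardless of how large individual documents are.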



(system) #6