Throttling / Forcing Garbage Collection during Bulk Indexing

Hi,

The use case I have is occasional bursts of millions of index updates,
which use up a lot of JVM heap space, but after these bursts the heap
usage goes back to a low level.

I find the quantity of updates can easily cause OOM exceptions, which
crash Elasticsearch.

I am able to monitor the Elasticsearch server node's JVM heap allocation
from the client indexing thread, and when the heap usage exceeds a certain
threshold I have the opportunity to take some action that tells
Elasticsearch to reduce its heap allocation somehow.
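
For reference, the check I run from the client side looks roughly like
this, a sketch against the REST nodes-stats endpoint (the endpoint path
and the crude regex parsing are my own assumptions here; adjust for your
ES version):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HeapWatcher {

    // Highest jvm.mem.heap_used_percent reported by any node, or -1 if none found.
    // The endpoint path is version-dependent; /_nodes/stats/jvm works on newer releases.
    static int maxHeapUsedPercent(String host) throws Exception {
        URL url = new URL("http://" + host + ":9200/_nodes/stats/jvm");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        StringBuilder body = new StringBuilder();
        BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line);
            }
        } finally {
            in.close();
        }
        // Crude extraction of every "heap_used_percent":NN occurrence in the JSON.
        Matcher m = Pattern.compile("\"heap_used_percent\"\\s*:\\s*(\\d+)").matcher(body);
        int max = -1;
        while (m.find()) {
            max = Math.max(max, Integer.parseInt(m.group(1)));
        }
        return max;
    }

    public static void main(String[] args) throws Exception {
        // Back off the indexing thread while any node is above the threshold.
        while (maxHeapUsedPercent("localhost") > 75) {
            Thread.sleep(5000);
        }
    }
}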

I thought that calling flush() on the index being written to would do
this, but in fact it has no effect whatsoever. Is there anything I can do
to tell ES to reduce its heap usage and force garbage collection?

I considered reducing this JVM param:
-XX:CMSInitiatingOccupancyFraction=75, but to be honest I don't see any
attempt by the ES JVM to garbage collect, even when it does reach 75% of
the maximum allocated memory.

David.

--

Hi,

You could call System.gc(), although:
A) it could be disabled via Java command-line params, so you'd want to
double-check that's not the case, and
B) it is not something the JVM will necessarily go and do; it is just a
hint to the JVM.
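
If you do try it, note that System.gc() only affects the JVM it runs in,
so it would have to execute inside the node process (e.g. from a plugin).
You can at least see whether the hint was honoured by sampling heap usage
around the call, plain JDK, nothing ES-specific:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

public class GcHint {
    public static void main(String[] args) {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        long before = mem.getHeapMemoryUsage().getUsed();

        System.gc(); // just a hint; a no-op under -XX:+DisableExplicitGC

        long after = mem.getHeapMemoryUsage().getUsed();
        System.out.println("heap used: " + before + " -> " + after + " bytes");
    }
}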

There are other parameters, like -XX:SurvivorRatio=n, -XX:NewRatio=n, and
-XX:MaxHeapFreeRatio=n, that may help.
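
Also, one CMS detail that may explain what you're seeing: by default the
JVM treats the occupancy fraction only as a starting point and then adapts
the trigger on its own. If you want the 75% threshold honoured strictly,
it has to be paired with the "only" flag:

-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly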

Otis

Sematext - Elasticsearch Performance Monitoring

--

Throttling bulk indexing just by trying to change the heap usage behaviour,
or by forcing GC, falls way too short.

If you use the Java API, look at
https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/action/bulk/BulkProcessor.java
where bulk throttling is demonstrated.

By playing with the number of concurrentRequests or bulkSize, you can
define an upper limit on the bulk throughput. As a result, the heap usage
for bulk indexing is limited, too.
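
A rough sketch of how that looks (method names as in the linked
BulkProcessor source; the index/type names are placeholders, and you
should check the builder methods against your ES version):

import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.unit.ByteSizeUnit;
import org.elasticsearch.common.unit.ByteSizeValue;

public class ThrottledIndexer {

    static BulkProcessor build(Client client) {
        return BulkProcessor.builder(client, new BulkProcessor.Listener() {
            public void beforeBulk(long id, BulkRequest request) {
            }

            public void afterBulk(long id, BulkRequest request, BulkResponse response) {
                if (response.hasFailures()) {
                    // inspect response.buildFailureMessage(), log or retry as needed
                }
            }

            public void afterBulk(long id, BulkRequest request, Throwable failure) {
                // the whole bulk request failed (e.g. node unreachable)
            }
        })
        .setBulkActions(5000)                               // flush after 5000 actions...
        .setBulkSize(new ByteSizeValue(5, ByteSizeUnit.MB)) // ...or 5 MB of payload
        .setConcurrentRequests(1)                           // at most 1 bulk in flight
        .build();
    }

    static void index(BulkProcessor processor, String json) {
        processor.add(new IndexRequest("myindex", "mytype").source(json));
    }
}

With setConcurrentRequests(0), the add() that triggers a flush blocks
until the bulk completes, so the client can never run ahead of the
cluster; with 1..n, at most that many bulk requests are in flight at once,
which bounds the heap they can occupy.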

Calling flush() has no effect here because it only controls moving data
from the translog buffer to the index. It has nothing to do with bulk
indexing or with heap usage.
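
For completeness, this is the call in the Java admin API (a sketch;
"myindex" is a placeholder). It persists the in-memory indexing buffer
and clears the translog, which matters for durability but does nothing to
bound the memory that incoming bulk requests consume:

import org.elasticsearch.client.Client;

public class FlushExample {
    static void flush(Client client) {
        // Writes the in-memory buffer to the index and clears the translog.
        client.admin().indices().prepareFlush("myindex").execute().actionGet();
    }
}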

Best regards,

Jörg

--

Hi Jörg,

That's exactly what I was looking for. Thanks.

David.

--