My application continuously writes bulk updates to an Elasticsearch
index (index size: ~200,000 docs, 35 MB; shards: 3*2; segment count ~35).
My cluster has 3 nodes, each with 32 GB RAM, ES_HEAP_SIZE=16g,
Elasticsearch v1.3.4.
I am using index.merge.scheduler.max_thread_count: 1 because the nodes
use spinning hard disks.
Unfortunately I often get OutOfMemory errors on every node after merges,
and I have to restart Elasticsearch before any bulk requests work again:
[12:17:49,716][INFO ][index.engine.internal ] [cluster1]
[v-2014-week41][1] now throttling indexing: numMergesInFlight=4,
maxNumMerges=3
[12:17:49,716][INFO ][index.engine.internal ] [cluster1]
[v-2014-week41][0] now throttling indexing: numMergesInFlight=4,
maxNumMerges=3
[12:17:49,719][INFO ][index.engine.internal ] [cluster1]
[v-2014-week41][0] stop throttling indexing: numMergesInFlight=2,
maxNumMerges=3
[12:17:49,727][INFO ][index.engine.internal ] [cluster1]
[v-2014-week41][1] stop throttling indexing: numMergesInFlight=2,
maxNumMerges=3
... (hundreds of log entries like this, until this one:)
[12:31:25,299][INFO ][index.engine.internal ] [cluster1]
[v-2014-week41][1] stop throttling indexing: numMergesInFlight=2,
maxNumMerges=3
[12:32:21,810][DEBUG][action.bulk ] [cluster1]
[v-2014-week41][0], node[02934K_ySZKEaQ3S1Hv9SA], [P], s[STARTED]:
Failed to execute [org.elasticsearch.action.bulk.BulkShardRequest@320ade50]
java.lang.OutOfMemoryError: PermGen space
[12:32:24,776][WARN ][action.bulk ] [cluster1] Failed to
send response for bulk/shard
java.lang.OutOfMemoryError: PermGen space
...
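The "PermGen space" OutOfMemoryError means the JVM's class-metadata area is exhausted, not the 16 GB heap; on Java 7 its default size is well under 100 MB. As a stopgap while diagnosing, the permanent generation can be enlarged and a heap dump captured on failure. A minimal sketch (the values and dump path are illustrative, not recommendations for this cluster):

```shell
# Example JVM options for a Java 7 node (PermGen was removed in Java 8).
# Add these to the Elasticsearch startup environment, e.g. via ES_JAVA_OPTS:
export ES_JAVA_OPTS="-XX:MaxPermSize=256m \
  -XX:+HeapDumpOnOutOfMemoryError \
  -XX:HeapDumpPath=/var/log/elasticsearch"
```

The heap dump can then be opened in a tool such as Eclipse MAT to see what is filling PermGen (typically dynamically generated classes).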
How many index / delete / update requests are you bundling in a
single bulk API call?
I use a max bulk size of 1 MB (around 2,000 docs per bulk); most of the
requests are updates with a small Groovy script that increments some
field values.
I will now try updating the cluster from Java 7 to Java 8, as there
is no PermGen space anymore in Java 8.
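The "max 1 MB per bulk" policy above can be implemented by measuring each action's serialized size and cutting a batch before it would cross the limit. A minimal sketch in Python (field names and sizes are illustrative; a real bulk body would also carry the action metadata line per document):

```python
import json

def chunk_actions(actions, max_bytes=1_000_000):
    """Split bulk actions into batches whose serialized size stays under max_bytes."""
    batch, batch_size = [], 0
    for action in actions:
        size = len(json.dumps(action).encode("utf-8")) + 1  # +1: trailing newline
        if batch and batch_size + size > max_bytes:
            yield batch                      # current batch is full; emit it
            batch, batch_size = [], 0
        batch.append(action)
        batch_size += size
    if batch:
        yield batch                          # emit the final partial batch

# ~2.6 MB of documents splits into a few ~1 MB batches:
docs = [{"doc_id": i, "text": "x" * 500} for i in range(5000)]
batches = list(chunk_actions(docs, max_bytes=1_000_000))
```

Sizing by bytes rather than by document count keeps bulks bounded even when document sizes vary.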
Are you using the same Bulk object each time, or a new instance for each iteration?
I have seen Java code in the past that reused the same bulk object,
so the bulk started with 2k docs, had 4k docs the next iteration, and so on.
Not sure that's your problem here, but it's worth checking.
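The reuse bug described above is easy to see in miniature. A sketch (in Python for brevity; the original code in question is Java, and the numbers are just for illustration):

```python
def build_batches_buggy(iterations, docs_per_iteration):
    """Reuse one bulk buffer across iterations (the bug): it never gets cleared."""
    bulk, sizes = [], []
    for _ in range(iterations):
        bulk.extend(range(docs_per_iteration))  # add this iteration's docs
        sizes.append(len(bulk))                 # ...but the old ones are still there
    return sizes

def build_batches_fixed(iterations, docs_per_iteration):
    """Create a fresh bulk per iteration (the fix)."""
    sizes = []
    for _ in range(iterations):
        bulk = list(range(docs_per_iteration))  # new instance each time
        sizes.append(len(bulk))
    return sizes

print(build_batches_buggy(3, 2000))  # [2000, 4000, 6000] - grows without bound
print(build_batches_fixed(3, 2000))  # [2000, 2000, 2000] - stays constant
```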
Thanks, but that doesn't seem to be the problem (I have double-checked it
now just to be sure); I always create a new instance.
The problem sometimes happens after 3 OutOfMemory-free weeks, but
sometimes just 1 day after the last Elasticsearch restart.
What does the Nodes Info output give? Could you gist it?
Cluster1 Node info · GitHub
(Now running with Java 8 instead of Java 7.)
Unfortunately it isn't possible to reach the node again after the
OutOfMemory error to get the actual node info.
Marvel doesn't show anything unusual before the OutOfMemoryError
(JVM mem < 15%...).
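For the record, a node's JVM memory stats can be polled over HTTP while it is still reachable; on Java 7, non-heap usage includes PermGen, so a steady climb there would be the smoking gun. A minimal sketch, assuming the default HTTP port:

```shell
# Poll JVM memory stats on a live ES 1.x node; watch the non-heap figures,
# since the 15% heap usage seen in Marvel says nothing about PermGen.
curl -s 'http://localhost:9200/_nodes/stats/jvm?pretty'
```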
Looks good. I was just checking that the JVM memory settings have been taken into account.
When you restart your node and monitor node stats, don't you see anything strange?
Nope, it just looks like a normal start.
I don't see anything obvious.
Did you compare with the other nodes' stats?
It sounds like you have enough memory for bulk operations.
Maybe you should try reducing the bulk size and see how it goes?
I will now try it with Java 8 and a reduced bulk size, and will report
back here next month on whether that fixed the problems.
Thanks for your help.
We're experiencing a similar issue after having run ES successfully for
several months without any major changes to our read/write patterns, data
sizes or documents. This is on Java 7 and ES 1.3.4.
Bernhard -- are you using scripting at all? The issue started popping up
after we switched our scripting from MVEL to Groovy.
Is there a chance that the Groovy scripts are blowing up PermGen (but
the MVEL scripts weren't)?
Adam
Yes, we also switched from MVEL to Groovy some months ago, and that's
when the issue started!
But we also changed a lot of other things in our code base, so I wasn't
sure about the cause.
Our scripts are very simple (and nearly the same as they were in MVEL), just
a few lines like: ctx._source.texts += text; ctx._source.state = state; ctx._source.number += 1; ...
Since yesterday everything has run fine (half of the old bulk size, Java
8), but the memory problem usually appeared only after a node had been
running for about a week.
There is a known problem where Groovy uses more PermGen space than MVEL
did; we've opened an issue to fix it here:
Until this is fixed, I believe putting the script on disk will make
Elasticsearch compile it only once, which should keep the PermGen usage
from growing; the script can still be referenced by name and have
parameters passed in.
;; Lee
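The suggestion above works roughly as follows in ES 1.x: the script lives as a file under config/scripts/, is compiled once when loaded, and update requests reference it by filename and pass values via params. A sketch based on Bernhard's script (the filename, index, type, and field names are illustrative):

```
# File: config/scripts/increment.groovy (compiled once when loaded):
#   ctx._source.texts += text; ctx._source.state = state; ctx._source.number += 1
#
# An update request then references the script by name and passes parameters:
POST /v-2014-week41/mytype/1/_update
{
  "script": "increment",
  "params": { "text": "some text", "state": "done" }
}
```

Each inline script variant is compiled to its own class, so replacing dynamically generated script bodies with one named script plus params is exactly what keeps the class count (and PermGen) flat.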
Putting the scripts on disk is unfortunately not a good alternative for
us, as we have to create the scripts dynamically.