Hi all,
I have been testing an upgrade to elasticsearch 1.4 beta1.
We use the Bulk API along with scripts to perform upserts into
elasticsearch. These perform well under ES 1.2 without any tuning.
However, in ES 1.4 beta1, running these upsert scripts often leads to:
java.lang.OutOfMemoryError: Java heap space
We use the bulk API:
curl -iL --silent --show-error -XPOST 'localhost:9200/_bulk' --data-binary
@./<file_name>
where the file contains about 130 MB (10,000 to 250,000 lines) of data.
It is filled with update/script commands:
{"update":{"_index":"2762_2014_41","_type":"event","_id":"97bc142e15c7136ebe866890e03dfad9"}}
{"doc":{"type":"event","date_time":"2014-10-17T19:00:00Z","day":20141017,"impression_cost":0.005,"format":"xyz","impression":1,"referer":"xyz","browser":"xyz","os":"android 4.4.4","device":"nexus 4","channel":"mobile","x_name":"xyz","id":"97bc142e15c7136ebe866890e03dfad9"},"doc_as_upsert":true}
{"update":{"_index":"2762_2014_41","_type":"event","_id":"97bc142e15c7136ebe866890e03dfad9"}}
{"script":"if( ctx._source.containsKey(\"impression\") ){ ctx._source.impression += 2; } else { ctx._source.impression = 2; }"}
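In case it helps anyone reproducing this, here is a minimal sketch of how we split the bulk file into smaller requests (the file names and chunk size are placeholders, not our production values, and the tiny generated payload just stands in for the real 130 MB file). The line count per chunk has to stay even so an action line is never separated from its doc/script line:

```shell
#!/bin/sh
# Sketch: split a bulk file into even-line chunks and send each chunk
# as its own request. bulk_payload.json is a stand-in name; the sample
# lines below are generated only to make the example self-contained.
printf '%s\n' \
  '{"update":{"_index":"2762_2014_41","_type":"event","_id":"a"}}' \
  '{"doc":{"impression":1},"doc_as_upsert":true}' \
  '{"update":{"_index":"2762_2014_41","_type":"event","_id":"b"}}' \
  '{"doc":{"impression":1},"doc_as_upsert":true}' \
  > bulk_payload.json

CHUNK_LINES=2    # must be even: one action line plus one doc/script line
split -l "$CHUNK_LINES" bulk_payload.json bulk_chunk_

for chunk in bulk_chunk_*; do
  # Uncomment to actually send each chunk:
  # curl -iL --silent --show-error -XPOST 'localhost:9200/_bulk' --data-binary @"$chunk"
  echo "$chunk: $(wc -l < "$chunk") lines"
done
```

With the real file you would raise CHUNK_LINES until each chunk lands near the request size you can tolerate, and drop the generated sample payload.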
There were some issues with PermGen taking up memory in this ticket that
have been addressed since the beta1 release, so we re-built from the
1.4 branch.
I also found this discussion about an OOM error that suggested setting
max_merged_segment in elasticsearch.yml:
https://groups.google.com/forum/?fromgroups#!searchin/elasticsearch/max_merged_segment/elasticsearch/ETjvBVUvCJs/ZccfzUIFAKoJ
index.merge.policy.max_merged_segment: 1gb
Setting max_merged_segment, launching on my development machine with a
2 GB heap (ES_HEAP_SIZE=2g ./bin/elasticsearch), and bringing the file
size per bulk request down to about 25 MB stabilized the system.
However, it would still heap dump when larger files (around 130 MB) were sent.
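The 25 MB figure was found by trial and error. A rough way we turn a target request size into a line count for split (all names here are our own, nothing ES-specific; the sample payload just makes the sketch runnable) is:

```shell
#!/bin/sh
# Sketch: derive an even lines-per-chunk value that targets roughly
# 25 MB per bulk request, based on the file's average line length.
# bulk_payload.json is a stand-in for the real bulk file.
printf '%s\n' \
  '{"update":{"_index":"2762_2014_41","_type":"event","_id":"a"}}' \
  '{"doc":{"impression":1},"doc_as_upsert":true}' \
  > bulk_payload.json

TARGET_BYTES=$((25 * 1024 * 1024))
BYTES=$(wc -c < bulk_payload.json)
LINES=$(wc -l < bulk_payload.json)
AVG=$(( (BYTES + LINES - 1) / LINES ))            # average bytes per line, rounded up
CHUNK_LINES=$(( TARGET_BYTES / AVG ))
CHUNK_LINES=$(( CHUNK_LINES - CHUNK_LINES % 2 ))  # even, so action/doc pairs stay together
echo "$CHUNK_LINES"
```

The resulting CHUNK_LINES then feeds straight into split -l.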
I don't fully understand how this fixed the memory issues. Would anyone be
able to provide some insight into why we would run into memory issues with
the upgrade?
I'd like to better understand how the memory is managed here so that I can
support this in production. Are there recommended sizes for bulk requests?
And how do those relate to the max_merged_segment size?
Thanks,
Dave
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/d5845815-eb21-41c0-b899-96626dce577e%40googlegroups.com.