Updating array with script causing OOM

Hello,
I am trying to add 120k values to an array through a script. I am using the query below to do the operation:

curl -XPOST 'localhost:9200/_bulk?pretty' --data-binary @newfile.json

A sample request looks like this:

{"update" : { "_id" : "id_5", "_type" : "register", "_index" : "promotion"} }
{ "script":{"lang":"painless","inline": "ctx._source.CUST_LIST.add(params.cust)","params" : {"cust" : "183569109"}},"upsert" : {"CUST_LIST" : ["183569109"]}}
{"update" : { "_id" : "id_5", "_type" : "register", "_index" : "promotion"} }
{ "script":{"lang":"painless","inline": "ctx._source.CUST_LIST.add(params.iab)","params" : {"cust" : "256570283"}},"upsert" : {"CUST_LIST" : ["256570283"]}}

After running for 20 minutes and applying about 22k updates (i.e. creating 22k versions of the document), I run out of heap:

[2018-01-23T19:20:42,021][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [] fatal error in thread [elasticsearch[Vxi1yAe][bulk][T#1]], exiting
java.lang.OutOfMemoryError: Java heap space
	at org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray.<init>(FreqProxTermsWriterPerField.java:210) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
	at org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray.newInstance(FreqProxTermsWriterPerField.java:230) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
	at org.apache.lucene.index.ParallelPostingsArray.grow(ParallelPostingsArray.java:46) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
	at org.apache.lucene.index.TermsHashPerField$PostingsBytesStartArray.grow(TermsHashPerField.java:250) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
	at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:271) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
	at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:149) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
	at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:796) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
	at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:447) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
	at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:403) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
	at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:232) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
	at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:478) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
	at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1571) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
	at org.elasticsearch.index.engine.InternalEngine.update(InternalEngine.java:740) ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.index.engine.InternalEngine.indexIntoLucene(InternalEngine.java:603) ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.index.engine.InternalEngine.index(InternalEngine.java:505) ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.index.shard.IndexShard.index(IndexShard.java:556) ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.index.shard.IndexShard.index(IndexShard.java:545) ~[elasticsearch-5.5.1.jar:5.5.1]

The heap dump shows 385,342 instances of byte[] (class [B), roughly 4 GB in total:

|Class|Instance Count|Total Size (bytes)|
|---|---|---|
|class [B|385342|4072923618|
|class [C|867226|41676034|
|class org.elasticsearch.action.index.IndexRequest|138460|24507420|
|class org.elasticsearch.action.update.UpdateRequest|112640|17909760|

Our stack:
Elasticsearch 5.5.1
Java 8
Linux
Xmx=4G

Where could I be going wrong here? Any suggestions are welcome.

Did you split the bulk into smaller parts, e.g. updating 100 docs at a time?

I tried splitting my file into chunks of 7k docs each, but in that run I also got an OOM after about 40k updates.
Is there any way to set the bulk size through curl or an Elasticsearch property?
Thanks

> I tried splitting my file into chunks of 7k docs each

Start lower. Try with 100 docs...
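
As a rough sketch of that approach (assuming each update in newfile.json takes exactly two lines, action plus script, as in your sample, so 200 lines per chunk gives 100 docs per request; the chunk_ prefix is just an arbitrary name):

# Split the bulk file into 100-doc chunks (2 lines per doc = 200 lines).
split -l 200 newfile.json chunk_
# Send each chunk as its own _bulk request.
# The Content-Type header is optional on 5.x but required from 6.0.
for f in chunk_*; do
  curl -s -XPOST 'localhost:9200/_bulk?pretty' -H 'Content-Type: application/x-ndjson' --data-binary @"$f"
done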

> Is there any way to set the bulk size through curl or an Elasticsearch property?

No. Some clients support that, like the BulkProcessor class we provide in the Java REST Client, but it's basically your responsibility to split your entries into small bulk requests.
