Updating array with script causing OOM


(Shivam Arora) #1

Hello,
I am trying to add 120k values to an array through a script. I am using the query below to do the operation.

curl -XPOST 'localhost:9200/_bulk?pretty' --data-binary @newfile.json

A sample request looks like this:

{"update" : { "_id" : "id_5", "_type" : "register", "_index" : "promotion"} }
{ "script":{"lang":"painless","inline": "ctx._source.CUST_LIST.add(params.cust)","params" : {"cust" : "183569109"}},"upsert" : {"CUST_LIST" : ["183569109"]}}
{"update" : { "_id" : "id_5", "_type" : "register", "_index" : "promotion"} }
{ "script":{"lang":"painless","inline": "ctx._source.CUST_LIST.add(params.iab)","params" : {"cust" : "256570283"}},"upsert" : {"CUST_LIST" : ["256570283"]}}

After running for 20 minutes and applying about 22k updates (i.e. creating 22k versions of the document), I run out of heap.

[2018-01-23T19:20:42,021][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [] fatal error in thread [elasticsearch[Vxi1yAe][bulk][T#1]], exiting
java.lang.OutOfMemoryError: Java heap space
	at org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray.<init>(FreqProxTermsWriterPerField.java:210) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
	at org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray.newInstance(FreqProxTermsWriterPerField.java:230) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
	at org.apache.lucene.index.ParallelPostingsArray.grow(ParallelPostingsArray.java:46) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
	at org.apache.lucene.index.TermsHashPerField$PostingsBytesStartArray.grow(TermsHashPerField.java:250) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
	at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:271) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
	at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:149) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
	at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:796) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
	at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:447) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
	at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:403) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
	at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:232) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
	at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:478) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
	at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1571) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
	at org.elasticsearch.index.engine.InternalEngine.update(InternalEngine.java:740) ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.index.engine.InternalEngine.indexIntoLucene(InternalEngine.java:603) ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.index.engine.InternalEngine.index(InternalEngine.java:505) ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.index.shard.IndexShard.index(IndexShard.java:556) ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.index.shard.IndexShard.index(IndexShard.java:545) ~[elasticsearch-5.5.1.jar:5.5.1]

The heap dump shows 385,342 instances of byte[] (class [B), totalling roughly 4 GB:

|Class|Instance Count|Total Size (bytes)|
|---|---|---|
|class [B|385342|4072923618|
|class [C|867226|41676034|
|class org.elasticsearch.action.index.IndexRequest|138460|24507420|
|class org.elasticsearch.action.update.UpdateRequest|112640|17909760|

Our stack:
Elasticsearch 5.5.1
Java 8
Linux
Xmx=4G

Where could I be going wrong here? Any suggestions are welcome.


(David Pilato) #2

Did you split the bulk into smaller parts, e.g. updating 100 docs at a time?


(Shivam Arora) #3

I tried splitting my file into chunks of 7k docs each, but in that run I also got an OOM, after about 40k updates.
Is there any way to set the bulk size through curl or an Elasticsearch property?
Thanks


(David Pilato) #4

I tried splitting my file into chunks of 7k docs each

Start lower. Try with 100 docs...
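
A minimal shell sketch of that approach (assuming each update in newfile.json spans exactly two lines, as in the sample above, so a 200-line chunk holds 100 updates; the chunk_ filename prefix is only illustrative):

# split the bulk file into chunks of 100 actions (2 lines per action)
split -l 200 newfile.json chunk_

# send each chunk as its own small bulk request
for f in chunk_*; do
  curl -XPOST 'localhost:9200/_bulk?pretty' --data-binary "@$f"
done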

Is there any way to set the bulk size through curl or an Elasticsearch property?

No. Some clients support that, like the BulkProcessor class we provide in the Java REST Client, but it's basically your responsibility to split your entries into small bulk requests.


(system) #5

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.