I am using UpdateByQuery with an inline script, but it only works when updating roughly 30,000-40,000 items; for 100,000+ items it throws a connection timeout exception and updates only about 14,000 of them.
I want to run an update-by-query with inline scripting over large data sets, but UpdateByQuery is not working with large data such as 100,000+ documents to update.
I also tried setting the request timeout to 20 minutes with the wait-for-completion flag set to true, but that still did not work.
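Because a synchronous `_update_by_query` holds the HTTP connection open for the entire run, very large updates tend to hit connection timeouts no matter how high the request timeout is set. A common pattern is to run the update as a background task with `wait_for_completion=false` and poll the Tasks API, optionally adding `slices=auto` to parallelise across shards. Below is a minimal sketch of the request body; the index name, field, and script are placeholders, not taken from the original post:

```python
import json

# Sketch: run the update asynchronously instead of waiting on the request.
#
#   POST /my-index/_update_by_query?wait_for_completion=false&slices=auto&conflicts=proceed
#
# "my-index" and the script below are hypothetical examples.
body = {
    "script": {
        "lang": "painless",
        "source": "ctx._source.status = params.status",  # placeholder inline script
        "params": {"status": "updated"},
    },
    "query": {"match_all": {}},  # narrow this to only the documents you need
}
print(json.dumps(body, indent=2))

# With wait_for_completion=false the response returns a task id, e.g.
# {"task": "nodeId:12345"}; poll its progress via the Tasks API rather than
# keeping the original connection open:
#
#   GET /_tasks/nodeId:12345
```

Adding `conflicts=proceed` keeps the task running past version conflicts, and `requests_per_second` can throttle it if the single node is under pressure.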
Elastic version: 7.17.9
Nodes in cluster: 1
CPU:
RAM: 32 GB
Assigned heap max size: 8
Shard size:
Shard count: 18
Average size of the documents that need updating: 200,000+ or even more
Yes, nested properties are present.
I asked about the average size of the documents (kB), not the count. As you are using nested mappings, how many nested documents are there on average per document?
Note that updating very large documents with a large number of nested documents results in a lot of overhead, as the full document needs to be reindexed. Each nested document is, behind the scenes, stored as a separate document, which also adds to the overhead.
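To make the overhead concrete, here is a sketch of how nested fields map to hidden Lucene documents. The mapping and the "comments" field are hypothetical examples, not from the original thread:

```python
import json

# Hypothetical mapping: each entry in the nested "comments" field is stored
# as a separate hidden Lucene document alongside its parent.
mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "comments": {
                "type": "nested",
                "properties": {
                    "author": {"type": "keyword"},
                    "text": {"type": "text"},
                },
            },
        }
    }
}
print(json.dumps(mapping, indent=2))

# A parent document with 500 nested comments occupies 501 Lucene documents,
# and updating any single field rewrites the parent plus all 500 children.
doc = {"title": "example", "comments": [{"author": "a", "text": "t"}] * 500}
lucene_docs = 1 + len(doc["comments"])
print(lucene_docs)  # 501
```

This is why an update-by-query over 100k+ such documents can be far more expensive than the raw document count suggests.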
If the numbers are correct and that is your average document size, I would recommend revising how you index and manage your data.
Must be a typo: 87 MB per doc, updating 100k of them at a time, doesn't fit on a 476 GB disk.
Seems so.
@rutuja Can you outline your use case a little here? This seems an unusual way of maintaining data in Elasticsearch. What (in general terms) is the data, why does it require updating after indexing, why do you need to update so many docs at once, ...? If we understood the use case better, another idea might present itself.