Reduce update_by_query memory use

Hi.

In one of my Elasticsearch use cases I have an index with relatively large documents. Initially, we chose to update them with update_by_query. However, we recently had an OOM problem caused by one of those updates, and when I checked the stack I saw that the full document was being loaded into memory.

I have tried to solve this by adding a _source filter to the update_by_query, but that causes every field not included in the source filter to be dropped when the document is reindexed. Is there any way to reduce the number of fields loaded in the hits without deleting the excluded fields from the original document?

Thanks!

Elasticsearch needs to reindex the whole document even if your processing or criteria only touch a few fields, so I believe the answer is no.
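
That said, one thing that may be worth trying: update_by_query accepts a scroll_size parameter (default 1000) that controls how many documents are pulled per batch. Lowering it will not stop full documents from being loaded, but it should reduce how many large documents are in flight at once. Below is a minimal sketch that only builds the request using the Python standard library. The host, index name, query, and script are placeholders I made up for illustration, not anything from your setup.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_update_by_query(base_url, index, query, script_source, scroll_size=100):
    """Build (but do not send) an _update_by_query request with a smaller batch size.

    scroll_size is a documented query parameter of update_by_query.
    All other values passed in here are hypothetical placeholders.
    """
    params = urlencode({"scroll_size": scroll_size})
    url = f"{base_url}/{index}/_update_by_query?{params}"
    body = {
        "query": query,
        "script": {"source": script_source, "lang": "painless"},
    }
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical example values: adjust to your cluster, index, and update logic.
req = build_update_by_query(
    "http://localhost:9200",
    "my-index",
    {"term": {"status": "pending"}},
    "ctx._source.status = 'done'",
    scroll_size=50,
)
print(req.full_url)
```

To actually send it you would pass req to urllib.request.urlopen. The right scroll_size depends on your document sizes and heap, so treat 50 as a starting point to experiment with rather than a recommendation.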
