My requirement is to retrieve at least 10,000 matching distinct entries from Elasticsearch. Each distinct entry may in turn refer to multiple records grouped by a particular field. Let's say each bucket has a document count of 3 on average, resulting in about 30,000 total hits.
How do I retrieve all 30,000 records without changing max_result_window?
I think the better question is why you don't want to change max_result_window. Note that you also have access to it through the index settings, just in case you think you are limited by a third party like AWS. I believe you can also change it without recreating the index. I'm not answering your direct question, just offering options in case you missed something.
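For reference, here is a rough sketch of what changing the setting on a live index could look like. It only builds the request to the update index settings endpoint (`PUT /<index>/_settings`) and does not send it. The function name `build_window_update`, the node URL, the index name `my-index`, and the value 50000 are all made up for illustration.

```python
import json
import urllib.request


def build_window_update(base_url: str, index: str, window: int) -> urllib.request.Request:
    """Build (but do not send) the request that changes index.max_result_window."""
    body = {"index": {"max_result_window": window}}
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/{index}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical node URL, index name, and window size.
req = build_window_update("http://localhost:9200", "my-index", 50000)

# To actually apply it against a running cluster:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Whether raising the window is a good idea is a separate question; this only shows that the setting is reachable without rebuilding the index.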
Personally, I've had some bad experiences doing that. In one use case I was pressed into increasing max_result_window to a hundred thousand because of export requirements, and ended up with a cluster that would hang from time to time and even crashed at one point. In the end I managed to sell the idea of using a scroll to retrieve the hundred thousand or more results. The cluster has been stable since then.
You most definitely want to be smart about it, especially for exports. Never allow a request for everything. I believe my max is 1,000 for heavy objects and 10,000 for light objects. Let the API (sitting between ES and the API user) enforce the maximum count; at least that is what I did.
So, you can set max_result_window to 1,000,000, but of course there are other issues, like timeouts, if you request all million at once. You don't want to do that.
Setting it to one million lets you "paginate" all the way to 1 million, even if you request only 100 at a time. Otherwise, if you have 1 million records, you won't be able to paginate past the default 10,000 unless you adjust this setting. You won't be able to request 100 starting from 999,990, for example; you will get an error.
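To make the limit concrete: Elasticsearch rejects a from/size request when from + size is greater than index.max_result_window. A tiny sketch of that check, where `page_allowed` is just a helper name invented for this post:

```python
DEFAULT_MAX_RESULT_WINDOW = 10_000  # Elasticsearch default for index.max_result_window


def page_allowed(from_: int, size: int,
                 max_result_window: int = DEFAULT_MAX_RESULT_WINDOW) -> bool:
    """Return True if a from/size page fits inside the result window."""
    return from_ + size <= max_result_window


last_default_page = page_allowed(9_900, 100)               # ends exactly at 10,000
deep_page_default = page_allowed(999_990, 100)             # far beyond the default window
deep_page_raised = page_allowed(999_900, 100, 1_000_000)   # ends exactly at 1,000,000
```

Note that even with the window raised to one million, the last page of 100 has to start at 999,900 or earlier, because the end of the page counts against the window too.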
For things like exports, don't request everything at once. You want to "paginate" here too, but in the back end. I believe it's the scroll feature of Elasticsearch (this was done by another developer on our team, so I won't take credit for her work... lol).
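I don't have our actual code at hand, so take this as a minimal sketch of back-end pagination with the scroll API, not what our team wrote. It uses the REST endpoints directly: an initial `POST /<index>/_search?scroll=1m`, repeated `POST /_search/scroll` calls with the returned `_scroll_id`, and a final `DELETE /_search/scroll` to free the context. The names `http_transport` and `scroll_hits`, the index `my-index`, and the localhost URL are assumptions for illustration.

```python
import json
import urllib.request
from typing import Callable, Iterator

# Hypothetical transport type: (method, path, body) -> parsed JSON response.
Transport = Callable[[str, str, dict], dict]


def http_transport(base_url: str) -> Transport:
    """Build a transport that sends JSON requests to an Elasticsearch node."""
    def send(method: str, path: str, body: dict) -> dict:
        req = urllib.request.Request(
            base_url.rstrip("/") + path,
            data=json.dumps(body).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method=method,
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    return send


def scroll_hits(send: Transport, index: str, query: dict,
                page_size: int = 1000, keep_alive: str = "1m") -> Iterator[dict]:
    """Yield every hit for `query`, fetching `page_size` hits per round trip."""
    page = send("POST", f"/{index}/_search?scroll={keep_alive}",
                {"size": page_size, "query": query})
    scroll_id = page.get("_scroll_id")
    try:
        # An empty page of hits means the scroll is exhausted.
        while page["hits"]["hits"]:
            yield from page["hits"]["hits"]
            page = send("POST", "/_search/scroll",
                        {"scroll": keep_alive, "scroll_id": scroll_id})
            # The scroll id may change between pages, so always keep the latest.
            scroll_id = page.get("_scroll_id", scroll_id)
    finally:
        # Release the search context instead of waiting for it to time out.
        if scroll_id:
            send("DELETE", "/_search/scroll", {"scroll_id": scroll_id})


# Usage against a real node (hypothetical URL and index):
# send = http_transport("http://localhost:9200")
# for hit in scroll_hits(send, "my-index", {"match_all": {}}):
#     print(hit["_id"])
```

Because the hits come back as a generator, the export can write them out page by page instead of holding all 30,000 (or a million) in memory, and max_result_window stays at its default.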
But yes, you are experiencing the same issues we had early this year and have since resolved.
Good luck. Let me know if you have more questions.