Aggregation of more than 10000 records

sanjeebkdeka · August 22, 2018, 2:46pm

Hi,

My requirement is to retrieve at least 10000 matching distinct entries from Elasticsearch. Each distinct entry could in turn refer to multiple records grouped by a particular field. Let's say each bucket has a document count of 3 on average, resulting in 30000 total hits.

How do I retrieve all 30000 records without changing max_result_window?

Thanks for any help.

kigonya · August 22, 2018, 9:32pm

I think the better question is why you don't want to change max_result_window. Note that you also have access to this setting through the index settings, just in case you think you are limited by a third party like AWS. Plus, I believe you can change it without recreating the index. I'm not answering your question directly, just offering options in case you missed something.

{
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0,
        "max_result_window": 50000,   <==========
        "analysis": {...}
    },
    "mappings": {...}
}
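For the "without recreating the index" part, here is a minimal sketch in Python of what that request could look like. It only builds a `PUT <index>/_settings` request and does not send it. The cluster URL, the index name `my-index`, and the helper name `build_window_update` are placeholders I made up for illustration.

```python
import json
import urllib.request


def build_window_update(base_url: str, index: str, window: int) -> urllib.request.Request:
    """Build (but do not send) a PUT <index>/_settings request."""
    # max_result_window is an index-level setting, so it goes under "index".
    body = json.dumps({"index": {"max_result_window": window}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical local cluster and index name.
req = build_window_update("http://localhost:9200", "my-index", 50000)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually apply it: urllib.request.urlopen(req)
```

Nothing is sent until you call `urllib.request.urlopen(req)`, so you can inspect the request first.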

Bernt_Rostad · August 23, 2018, 4:45am

That may work well and it may not.

Personally I've had some bad experiences doing that. In one use case I was pressed into increasing max_result_window to a hundred thousand because of export requirements, and ended up with a cluster that would hang from time to time, even crashing at one point. In the end I managed to sell the idea of using a scroll to retrieve the hundred thousand or more results. The cluster has been stable since then.
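A rough sketch of what such a scroll loop can look like, in Python. This is not the code from the cluster described above. The `post(path, body)` callable is a hypothetical transport hook that sends a JSON POST and returns the parsed response, and `make_fake_post` is only an in-memory stand-in so the loop can be run without a cluster.

```python
from typing import Callable, Iterator, List

# post(path, body) -> parsed JSON response; a hypothetical transport hook.
Post = Callable[[str, dict], dict]


def scroll_hits(post: Post, index: str, query: dict,
                page_size: int = 1000, keep_alive: str = "1m") -> Iterator[dict]:
    """Yield every hit for `query`, one scroll page at a time."""
    # The first request opens the scroll context and returns the first page.
    resp = post(f"/{index}/_search?scroll={keep_alive}",
                {"size": page_size, "query": query})
    while resp["hits"]["hits"]:
        yield from resp["hits"]["hits"]
        # Always pass the most recent _scroll_id to fetch the next page.
        resp = post("/_search/scroll",
                    {"scroll": keep_alive, "scroll_id": resp["_scroll_id"]})


def make_fake_post(docs: List[dict]) -> Post:
    """In-memory stand-in for a cluster, for demonstration only."""
    state = {"pos": 0, "size": 0}

    def post(path: str, body: dict) -> dict:
        if "size" in body:  # initial search request
            state["pos"], state["size"] = 0, body["size"]
        start = state["pos"]
        page = docs[start:start + state["size"]]
        state["pos"] = start + len(page)
        return {"_scroll_id": "fake-scroll-id", "hits": {"hits": page}}

    return post


docs = [{"_id": str(i)} for i in range(2500)]
hits = list(scroll_hits(make_fake_post(docs), "my-index", {"match_all": {}}))
print(len(hits))
```

Each request only asks for `page_size` documents, so no single response has to carry the whole export. The sketch leaves out clearing the scroll context when done.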

kigonya · August 23, 2018, 3:45pm

MOST definitely you want to be smart about it... especially for exports. Never allow a request for everything. I believe my max is 1,000 for heavy objects and 10,000 for light objects. Let the API (sitting between ES and the API user) enforce the maximum count... at least that is what I did.
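Capping the count in the API layer could be as simple as the following Python sketch. The two limits are the ones mentioned above; the "heavy"/"light" keys and the function name are made up for illustration.

```python
# Caps taken from the post above; the object-kind names are hypothetical.
MAX_PAGE_SIZE = {"heavy": 1_000, "light": 10_000}


def clamp_page_size(requested: int, kind: str) -> int:
    """Return a page size between 1 and the cap for this kind of object."""
    return max(1, min(requested, MAX_PAGE_SIZE[kind]))


print(clamp_page_size(50_000, "heavy"))
print(clamp_page_size(500, "light"))
```

The API user can ask for whatever they like, but Elasticsearch never sees a size above the cap.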

So... you can set max_result_window to 1,000,000 but of course there are other issues like timeouts if you request all million at once. You don't want to do that.

Setting it to one million will allow you to "paginate" all the way to 1 million, even if you request only 100 at a time. Otherwise, if you have 1 million records and don't adjust this setting, you won't be able to paginate past the default 10,000. You won't be able to request 100 starting from 999,990... you will get an error.

For things like exports, don't request everything at once. You want to "paginate" this time, IN THE BACK END. I believe it's the scroll feature of Elasticsearch (this was done by another developer on our team, so I won't take credit for her work... lol).
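To make the pagination limit concrete: Elasticsearch rejects a search when from + size is greater than max_result_window. A small Python sketch of that arithmetic follows; the function name `page_allowed` is hypothetical, and the real check happens on the cluster.

```python
DEFAULT_MAX_RESULT_WINDOW = 10_000


def page_allowed(offset: int, size: int,
                 max_result_window: int = DEFAULT_MAX_RESULT_WINDOW) -> bool:
    """True if from + size stays within the result window."""
    return offset + size <= max_result_window


print(page_allowed(9_900, 100))               # last full page under the default
print(page_allowed(9_990, 100))               # 10,090 > 10,000
print(page_allowed(999_900, 100, 1_000_000))  # fits in a 1,000,000 window
print(page_allowed(999_990, 100, 1_000_000))  # 1,000,090 > 1,000,000
```

So the window limits how deep a page can reach, not how many documents a single request returns.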

But YES.... you are experiencing the same issues we had early this year and already resolved.

Good luck. Let me know if you have more questions.

kigonya · August 23, 2018, 3:58pm

I guess you said you already use scroll; we just don't use it for simple front-end pagination. We use it only for exports.

system · September 20, 2018, 3:58pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
