Result window is too large, from + size must be less than or equal to: [10000] but was [11001]

toxaco · October 14, 2016, 4:39pm

Hi!

I'm trying to loop through all documents in a certain index using pagination (size, from). Since ES limits the total number of documents returned to 10k, this is my initial search (I then increase the from parameter by 10k to return the next 10k):

/index_name/_search?pretty=true&q=*&size=9999&from=0

The problem is that it doesn't allow me to start from > 10k, meaning it's not real pagination where I could start from 240000 and get the next 10k. The error message:

"reason": "Result window is too large, from + size must be less than or equal to: [10000] but was [11001]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level parameter."

I also tried to use scan/scroll, where it runs the search and brings back all the results, and I would need to loop through the _scroll_id and gather the sections of "size" (size*shards) to have all the documents. This is not a good option because:

1. It will exhaust the memory limits.
2. Even if I decide to go for it, the program stops (I believe because of the 10k limit again).

Thank you, and I hope somebody knows whether this is an issue or whether this "pagination" is really meant to work only for the first 10k docs.

dadoonet · October 14, 2016, 9:17pm

Scan & scroll is the way to go if you want to extract data from Elasticsearch. I guess you don't want to display the less relevant results to a user, right? So going past 10k results does not make a lot of sense IMO.

Pagination is really meant only for the first results. Displaying up to 10k results is already a lot IMO.

toxaco · October 15, 2016, 7:47am

Hi David,

Thank you for your message. I was thinking about it, but it's still not the best solution for me. Let me explain why.

Using Scan & Scroll I'm still limited to handling 10k per scroll (the calculation here is size*shards = X, which has to be less than 10k, the search limitation that applies even to scan/scroll). Imagine that you want to migrate 650 million documents from one index to another: you are forced to do so 10k by 10k, which means 65k loops. It will: 1. take a long time. 2. run out of memory (depending on settings).
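As a quick check of the arithmetic above, here is a minimal sketch. The function name `batches_needed` is made up for illustration, and the fixed batch of 10k per request is the assumption stated in the post, not something Elasticsearch enforces for scroll.

```python
def batches_needed(total_docs, batch_size):
    """Number of requests needed to fetch total_docs in batches of batch_size."""
    # Integer ceiling division, so a partial last batch still counts as one request.
    return -(-total_docs // batch_size)


# 650 million documents, 10k per request (the figures from the post above).
print(batches_needed(650_000_000, 10_000))  # 65000
```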

Also, using scroll you have to loop through each _scroll_id, so if there is an error in between, let's say on loop number 12230 out of 65k loops, you can't run it again and catch up FROM loop 12230; you need to start FROM 0 again. It's not safe/efficient. When I say I can't start from, let's say, document number 12230, it is because the FROM parameter in Search only allows me to start under 10k because of this limitation. It seems that ES pagination doesn't work using FROM above 10k even if the Size is, let's say, 10 docs (give me 10 docs but starting from 12k... it doesn't work).

Maybe ES is not the best solution for my problem, but I decided to ask here since I couldn't find anything clearer in the documentation.

dadoonet · October 15, 2016, 11:41am

Maybe. Maybe not as long as you imagine.
I think that's wrong. The opposite is true: loading 10k docs on the coordinating node instead of, for example, 1m docs will consume much less memory.

Once you get back 10k docs in your application, just stream them to the other process/destination you want.
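One way to stream a batch onward, if the destination is another index, is to turn the hits into a body for the bulk API. This is only a sketch: `hits_to_bulk_body` and the destination index name are hypothetical, and it assumes each hit carries `_id` and `_source` (and possibly `_type`) as in a normal search response.

```python
import json


def hits_to_bulk_body(hits, dest_index):
    """Build a newline-delimited bulk body that indexes each hit into dest_index."""
    lines = []
    for hit in hits:
        action = {"index": {"_index": dest_index, "_id": hit["_id"]}}
        if "_type" in hit:
            # Keep the mapping type when the source cluster still reports one.
            action["index"]["_type"] = hit["_type"]
        lines.append(json.dumps(action))          # action line
        lines.append(json.dumps(hit["_source"]))  # document line
    # The bulk API expects every line, including the last, to end with a newline.
    return "".join(line + "\n" for line in lines)


batch = [{"_id": "1", "_type": "tweet", "_source": {"title": "elasticsearch"}}]
print(hits_to_bulk_body(batch, "new_index"), end="")
```

The resulting string would be sent to the `_bulk` endpoint of the destination cluster, one batch at a time, so the application never holds more than one batch in memory.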

What kind of error? You mean Elasticsearch does not work anymore? Then you need to fix the root of the problem rather than trying to work around it.
I think you are talking about something going wrong in your own process, right?
If so, you have to deal with that on your side.

Yes, this is not how Scroll works. You never specify the from.

Basically, you run a search request. It matches, say, 10m docs and you start to scroll through the result set from the beginning, x docs at a time.
When you run the next scroll call, it asks for the next x docs you want.
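The loop described above can be sketched roughly as follows. To keep it runnable without a cluster, the HTTP call is passed in as `send(method, path, body)`, which is a hypothetical callable returning the parsed JSON response, and `make_fake_send` is a made-up in-memory stand-in for Elasticsearch. The request shapes (a search with `?scroll=1m`, then `/_search/scroll` with the `scroll_id`) follow the scroll API; everything else is an assumption for illustration.

```python
def scroll_all(send, index, query, size=1000, keep_alive="1m"):
    """Yield every hit of a query by following the scroll cursor. No `from` is used."""
    page = send("POST", "/%s/_search?scroll=%s" % (index, keep_alive),
                {"size": size, "query": query})
    while page["hits"]["hits"]:
        for hit in page["hits"]["hits"]:
            yield hit
        # Ask for the next `size` docs using the cursor from the last response.
        page = send("POST", "/_search/scroll",
                    {"scroll": keep_alive, "scroll_id": page["_scroll_id"]})


def make_fake_send(docs):
    """In-memory stand-in for a cluster (hypothetical, for demonstration only)."""
    state = {"pos": 0, "size": 0}

    def send(method, path, body):
        if "scroll_id" not in body:  # initial search request
            state["pos"], state["size"] = 0, body["size"]
        start = state["pos"]
        state["pos"] = start + state["size"]
        return {"_scroll_id": "fake-cursor",
                "hits": {"hits": docs[start:state["pos"]]}}

    return send


docs = [{"_id": str(i), "_source": {"n": i}} for i in range(25)]
hits = list(scroll_all(make_fake_send(docs), "index_name", {"match_all": {}}, size=10))
print(len(hits))  # 25
```

Because the function is a generator, the caller only ever holds one page of hits at a time.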

That said, maybe this new feature coming in 5.0 could help in your case?

github.com/elastic/elasticsearch

Add the ability to partition a scroll in multiple slices.

elastic:master ← jimczi:scroll_by_slice

opened 10:21AM - 10 May 16 UTC

jimczi

+1462 -63

API:

```
curl -XGET 'localhost:9200/twitter/tweet/_search?scroll=1m' -d '{
  …
  "slice": {
    "field": "_uid", <1>
    "id": 0, <2>
    "max": 10 <3>
  },
  "query": {
    "match" : { "title" : "elasticsearch" }
  }
}'
```

<1> (optional) The field name used to do the slicing (_uid by default)
<2> The id of the slice

By default the splitting is done on the shards first and then locally on each shard using the _uid field with the following formula: `slice(doc) = floorMod(hashCode(doc._uid), max)`. For instance, if the number of shards is equal to 2 and the user requested 4 slices, then slices 0 and 2 are assigned to the first shard and slices 1 and 3 are assigned to the second shard. Each scroll is independent and can be processed in parallel like any scroll request.

Closes #13494
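To make the slicing formula in the PR description concrete, here is a small sketch. It assumes that `hashCode` means Java's `String.hashCode` applied to the `_uid` string, and that slices map to shards as `slice_id % number_of_shards`, which is inferred from the 2-shard, 4-slice example in the description. Neither assumption is confirmed by the text quoted here, and the function names are made up.

```python
def java_string_hash(s):
    """Java String.hashCode for s (assumes characters in the Basic Multilingual Plane)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # wrap to 32 bits like a Java int
    return h - 0x100000000 if h >= 0x80000000 else h  # reinterpret as signed


def slice_of(uid, max_slices):
    """slice(doc) = floorMod(hashCode(doc._uid), max)."""
    # Python's % already behaves like floorMod for a positive modulus.
    return java_string_hash(uid) % max_slices


def slices_on_shard(shard, num_shards, max_slices):
    """Slice ids handled by a shard, assuming slice_id % num_shards picks the shard."""
    return [s for s in range(max_slices) if s % num_shards == shard]


print(slices_on_shard(0, 2, 4))  # [0, 2]
print(slices_on_shard(1, 2, 4))  # [1, 3]
print(slice_of("ab", 10))        # 5
```

Each slice id would then be used in its own scroll request (`"slice": {"id": i, "max": n}`), and the requests can run in parallel.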

Also, this new feature could help to resume a scroll at some point. No date for this feature though: "Introduce a new `_last_modified` metafield" (Issue #20859, elastic/elasticsearch on GitHub).

I hope this helps.

toxaco · October 18, 2016, 1:40pm

Hi David,

Thank you for helping.

So I understand:

A. ES Search pagination only works for documents between 0 and 9999, anything above it can't be paginated/accessed using FROM/Size.

B. ES offers Scan & Scroll (same as Search), with the same limitation as Search, but this time it splits the documents into chunks of smaller size, which you can scroll through and so access all documents in the index. It means that you need to build pagination manually if you need to get documents above 10k.

I'll adjust my business rules, but it's a bit frustrating that I can't retrieve as many docs as I want. I understand that it is possible to change this through settings, but in some cases, such as on Amazon, that's not available (as far as I know).
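For reference, the limit discussed in this thread can be sketched as a simple check, together with the settings body that would raise it. The function names are made up; the 10000 default and the `index.max_result_window` setting come from the error message quoted at the top of the thread, and the body is meant for a `PUT /index_name/_settings` request where the cluster allows it.

```python
import json

DEFAULT_MAX_RESULT_WINDOW = 10000  # default of index.max_result_window


def window_error(from_, size, max_window=DEFAULT_MAX_RESULT_WINDOW):
    """Return the rejection message for a from/size pair, or None if it fits."""
    total = from_ + size
    if total <= max_window:
        return None
    return ("Result window is too large, from + size must be less than "
            "or equal to: [%d] but was [%d]" % (max_window, total))


def max_window_settings_body(new_limit):
    """JSON body for PUT /index_name/_settings that raises the limit."""
    return json.dumps({"index": {"max_result_window": new_limit}})


print(window_error(0, 9999))     # None
print(window_error(12000, 10))   # rejected even though size is only 10
print(max_window_settings_body(50000))
```

Note that the check is on from + size, which is why asking for only 10 docs starting at 12k is still rejected.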

Anyway thank you.
