Result window is too large, from + size must be less than or equal to: [10000] but was [11001]


(Rafael) #1

Hi!

I'm trying to loop through all documents in a certain index using pagination (size, from). Since ES limits the number of documents returned per request to 10k, this is my initial search (I then increase the from parameter by 10k to get the next 10k):

/index_name/_search?pretty=true&q=*&size=9999&from=0

The problem is that it doesn't allow me to start from > 10k, meaning it's not real pagination where I could start from 240000 and get the next 10k. The error message:

"reason": "Result window is too large, from + size must be less than or equal to: [10000] but was [11001]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level parameter."

I also tried scan/scroll, where the search brings back all the results and I need to loop through the _scroll_id and gather the sections of "size" (size*shards) to get all the documents. This is not a good option because:

  1. It will exhaust the memory limits.
  2. Even if I decide to go for it, the program stops (I believe because of the 10k limit again).

Thank you. I hope somebody knows whether this is an issue or whether this "pagination" is really meant to work only for the first 10k docs.
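
For illustration, the rule stated in the error message can be written down as a small check. This is only a sketch, not Elasticsearch's own code: the helper name `fits_result_window` is made up here, and 10000 is simply the limit quoted in the error message.

```python
DEFAULT_MAX_RESULT_WINDOW = 10000  # limit quoted in the error message


def fits_result_window(from_, size, max_result_window=DEFAULT_MAX_RESULT_WINDOW):
    """Return True if a from/size page stays within the result window.

    Hypothetical helper: mirrors the rule "from + size must be less than
    or equal to max_result_window" from the error message.
    """
    return from_ + size <= max_result_window


print(fits_result_window(0, 9999))      # True: the initial request in this post
print(fits_result_window(10000, 1001))  # False: one combination that sums to 11001
print(fits_result_window(12000, 10))    # False: small page, but from is too deep
```

The last call shows why a small size does not help: the limit applies to the sum of from and size, not to size alone.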


(David Pilato) #2

Scan & scroll is the way to go if you want to extract data from elasticsearch. I guess you don't want to display the less relevant results to a user, right? So going past 10k results does not make a lot of sense IMO.

Pagination is really meant only for the first results. Displaying up to 10k results is already a lot IMO.


(Rafael) #3

Hi David,

Thank you for your message. I was thinking about it, but it's still not the best solution for me. Let me explain why.

Using Scan & Scroll I'm still limited to handling 10k per scroll (the calculation here is size*shards = X, which has to be less than 10k, the search limitation that applies even to scan/scroll). Imagine that you want to migrate 650 million documents from one index to another: you are forced to do it 10k by 10k, which means 65k loops. It will: 1. take a long time. 2. run out of memory (depending on settings).
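
The loop count above is plain division. As a quick check, using only the two figures from this post (650 million documents, 10k per request):

```python
import math

total_docs = 650_000_000   # documents to migrate, as stated in the post
docs_per_request = 10_000  # the 10k limit discussed in this thread

requests_needed = math.ceil(total_docs / docs_per_request)
print(requests_needed)  # 65000
```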

Also, using scroll you have to loop through each _scroll_id, so if there is an error in between, let's say on loop number 12230 out of 65k, you can't run it again and catch up FROM loop 12230; you need to start FROM 0 again. It's not safe/efficient. When I say I can't start from, let's say, document number 12230, it's because the FROM parameter in the Search only allows me to start under 10k because of this limitation. It seems that ES pagination doesn't work using a FROM above 10k even if the Size is, let's say, 10 docs (give me 10 docs but starting from 12k... it doesn't work).

Maybe ES is not the best solution for this problem, but I decided to ask here since I couldn't find anything clearer in the documentation.


(David Pilato) #4
  1. Maybe. Maybe not as long as you imagine.
  2. I think that's wrong. The opposite is true: loading 10k docs on the coordinating node instead of, for example, 1m docs will consume much less memory.

Once you get back 10k docs in your application, just stream them to the other process/destination you want.

What kind of error? Do you mean elasticsearch stops working? Then you need to fix the root of the problem rather than trying to work around it.
I think you are talking about something going wrong in your own process, right?
If so, you have to deal with that on your side.

Yes, this is not how Scroll works. You never specify the from.

Basically, you run a search request. It matches, say, 10m docs, and you start to scroll through the result set from the beginning, x docs at a time.
Each subsequent scroll call asks for the next x docs.
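
A minimal sketch of that loop, under some assumptions: `send(method, path, body)` is a hypothetical callable you supply that performs the HTTP request and returns the parsed JSON, and the request shapes (a `scroll` parameter on the initial search, then `scroll` and `scroll_id` in the body of `/_search/scroll`) follow the scroll API as documented for recent versions. Older versions, and the `scan` search type in particular (whose first response carries no hits), behave differently, so treat this as an illustration only. The fake transport below just makes the example runnable without a cluster.

```python
def scroll_all(send, index, page_size=1000, keep_alive="1m"):
    """Yield every hit of a match_all search, one scroll page at a time.

    `send(method, path, body)` is a hypothetical callable supplied by the
    caller; it must perform the HTTP request and return the parsed JSON.
    """
    # Initial search: opens the scroll context and returns the first page.
    page = send(
        "POST",
        f"/{index}/_search?scroll={keep_alive}",
        {"size": page_size, "query": {"match_all": {}}},
    )
    while page["hits"]["hits"]:
        yield from page["hits"]["hits"]
        # Follow-up call: no `from`, only the scroll id of the last page.
        page = send(
            "POST",
            "/_search/scroll",
            {"scroll": keep_alive, "scroll_id": page["_scroll_id"]},
        )


def make_fake_send(pages):
    """Stand-in for a real cluster: serves canned pages in order."""
    remaining = iter(pages)
    calls = []

    def send(method, path, body):
        calls.append((method, path, body))
        return {"_scroll_id": "dummy-id", "hits": {"hits": next(remaining)}}

    return send, calls


send, calls = make_fake_send([[{"_id": "1"}, {"_id": "2"}], [{"_id": "3"}], []])
ids = [hit["_id"] for hit in scroll_all(send, "index_name", page_size=2)]
print(ids)  # ['1', '2', '3']
```

Because `scroll_all` is a generator, the caller only ever holds one page in memory and can stream each hit to its destination as it arrives.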

That said, maybe this new feature coming in 5.0 could help in your case?

Also, this new feature could help to resume a scroll at some point. No date for this feature though. https://github.com/elastic/elasticsearch/issues/20859

I hope this helps.


(Rafael) #5

Hi David,

Thank you for helping.

So I understand:

A. ES Search pagination only works for documents between 0 and 9999; anything above that can't be paginated/accessed using FROM/Size.

B. ES offers Scan & Scroll (same as Search) with the same limitation as Search, but this time it splits the documents into smaller chunks that you can scroll through, and so access all documents in the index. It means that you need to build manual pagination if you need to get documents above 10k.

I'll adjust my business rules, but it's a bit frustrating that I can't retrieve as many docs as I want. I understand that it is possible to change this through settings, but in some cases, such as on Amazon, that's not available (as far as I know).
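
For reference, the setting named in the error message is `index.max_result_window`, and where the cluster allows it, it is changed through the index settings endpoint. The sketch below only builds that request; the helper name and the value 50000 are made up for illustration, and whether a hosted service accepts the change is a separate question. A larger window also means more memory used per deep request.

```python
import json


def max_result_window_request(index, new_limit):
    """Build (method, path, body) for changing index.max_result_window.

    Hypothetical helper: it only builds the request, it does not send it.
    """
    if new_limit <= 0:
        raise ValueError("new_limit must be positive")
    body = {"index": {"max_result_window": new_limit}}
    return "PUT", f"/{index}/_settings", json.dumps(body)


method, path, body = max_result_window_request("index_name", 50000)
print(method, path, body)
```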

Anyway thank you.

