Get documents starting at a high 'from' offset

Hi,

I have an index with ~1.500.000 (1.5 million) documents. I only want to get 1.000 results from it, but I want to start counting backwards. So, I want to retrieve documents 1.499.000 (i.e. 1.500.000 - 1.000) through 1.500.000. I've set 'from' to 1499000 and 'size' to 1000, so from + size equals the total number of documents: 1.500.000. As expected, this causes:

elasticsearch.exceptions.RequestError: RequestError(400, 'search_phase_execution_exception', 'Result window is too large, from + size must be less than or equal to: [10000] but was [1474810]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.')

The Scroll API reference (Elasticsearch Guide [7.17]) says:

We no longer recommend using the scroll API for deep pagination. If you need to preserve the index state while paging through more than 10,000 hits, use the search_after parameter with a point in time (PIT).
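For reference, here is a minimal sketch of the PIT + `search_after` loop that the quoted documentation recommends. To stay independent of any particular client version, it assumes a hypothetical `search(body)` callable that sends the body to the `_search` endpoint and returns the parsed JSON, plus a PIT id already obtained from the open point-in-time API. The function name and the sort field are illustrative assumptions.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

# Hypothetical: sends `body` to `_search` and returns the parsed JSON response.
SearchFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def iterate_with_pit(
    search: SearchFn,
    pit_id: str,
    sort: List[Dict[str, Any]],
    page_size: int = 1000,
    keep_alive: str = "1m",
) -> Iterator[Dict[str, Any]]:
    """Yield every hit, page by page, using a PIT and search_after."""
    search_after: Optional[List[Any]] = None
    while True:
        body: Dict[str, Any] = {
            "size": page_size,
            # With a PIT, the index comes from the PIT, not the request path.
            "pit": {"id": pit_id, "keep_alive": keep_alive},
            "sort": sort,
        }
        if search_after is not None:
            # Resume right after the last hit of the previous page.
            body["search_after"] = search_after
        response = search(body)
        hits = response["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # Each hit carries its sort values; the last hit's values are the cursor.
        search_after = hits[-1]["sort"]
        # The response may carry an updated PIT id; always send the latest one.
        pit_id = response.get("pit_id", pit_id)
```

Note that reaching the last 1.000 of ~1.5 million documents this way still means fetching every page before them. The loop removes the `from + size` limit, but it does not make reading only the tail of the result set any cheaper.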

Should I also be using a PIT when retrieving just a few results that start at a high 'from'?

My gut feeling here is that even though you could switch from a regular query to scroll search/PIT/search_after, maybe the query itself could be improved? If you tell us more about the use case, that might help.

Could you change the sorting strategy or filtering to retrieve the required documents instead of paginating through them?

Thanks for your reply.

The use case is as follows: I have a sort order and a limit. When the sort order is ascending, I want to invert the limit, i.e. apply it from the end of the results rather than the start. So:

  • If I have 10.000 documents, the sort order set to ascending and the limit set to 1, I will get document 9.999
  • If I have 10.000 documents, the sort order set to descending and the limit set to 1, I will get document 1

In other words: when the sort order is ascending, from is set to document count minus limit and size is set to the limit.
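The rule above can be written out as a small sketch to show why every ascending request with a large index trips the error. The function names are hypothetical; 10.000 is the default `index.max_result_window` quoted in the error message.

```python
from typing import Tuple

MAX_RESULT_WINDOW = 10_000  # default index.max_result_window, per the error above


def from_and_size(doc_count: int, limit: int, ascending: bool) -> Tuple[int, int]:
    """Return (from, size): skip to the tail when the sort is ascending."""
    # Clamp at 0 so a limit larger than the document count stays valid.
    from_ = max(doc_count - limit, 0) if ascending else 0
    return from_, limit


def fits_result_window(from_: int, size: int,
                       max_result_window: int = MAX_RESULT_WINDOW) -> bool:
    # Elasticsearch rejects the search when from + size exceeds the window.
    return from_ + size <= max_result_window


f, s = from_and_size(1_500_000, 1_000, ascending=True)
print(f, s, fits_result_window(f, s))  # 1499000 1000 False
```

Because `from + size` always equals the document count in the ascending case, any index with more than 10.000 documents is rejected, no matter how small the limit is.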

FWIW: when I refer to 'limit', I actually mean the Elasticsearch concept of 'size'.

Your question is not very clear. Suppose you have 10000 documents: if you want to get the last 100 docs (9900~10000) sorted ascending, can you sort descending instead and get the top 100 (1~100)?

Yes, I could, but they'd be in the wrong order.

E.g. when I want documents 7 and 8 (in that order):

DESC: 8, 7, 6, 5
ASC: 5, 6, 7, 8

DESC will give me the documents I need, 8 and 7, but in the wrong order.
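In plain Python, using the four sort keys from the example: the top k of a descending sort are exactly the last k of the ascending sort, only in reverse order. This assumes the sort keys are unique; with ties, the relative order of equal keys is not guaranteed to match.

```python
ranks = [5, 6, 7, 8]  # sort keys from the example above
k = 2

asc_tail = sorted(ranks)[-k:]                # what ASC with a high 'from' returns
desc_head = sorted(ranks, reverse=True)[:k]  # what DESC with size=k returns

print(asc_tail)                   # [7, 8]
print(desc_head)                  # [8, 7]  right documents, wrong order
print(list(reversed(desc_head)))  # [7, 8]  reversing restores ASC order
```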

I could of course drop the from, sort DESC and reverse() the results in Python ...

Oh, I see. I would get the docs and reverse them myself, because from + size is expensive, and scroll or PIT isn't appropriate for your needs.

Thanks, that's what I thought (and why I asked the question :) ). I'll stick around for a bit to see if anyone else has any ideas.

I've solved it by dropping the from, sorting DESC and reverse()-ing the results in Python. Thanks for your replies, @casterQ and @spinscale!
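Here is a minimal sketch of that approach (drop `from`, sort DESC with `size` set to the limit, reverse the hits client-side). `search(body)` is a hypothetical callable that sends the body to the index's `_search` endpoint and returns the parsed response. With elasticsearch-py 7.x it could wrap something like `es.search(index="my-index", body=body)`, where the index name is a placeholder. The sort field and the `match_all` query are also placeholders.

```python
from typing import Any, Callable, Dict, List

# Hypothetical: sends `body` to the index's `_search` and returns the JSON.
SearchFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def tail_query(sort_field: str, limit: int) -> Dict[str, Any]:
    """Body for the last `limit` docs of the ASC order, fetched as the
    first `limit` docs of the DESC order. No 'from' is sent."""
    return {
        "size": limit,
        "sort": [{sort_field: {"order": "desc"}}],
        "query": {"match_all": {}},  # placeholder query
    }


def fetch_tail_ascending(search: SearchFn, sort_field: str,
                         limit: int) -> List[Dict[str, Any]]:
    """Run the DESC query, then reverse the hits to get ASC order back."""
    response = search(tail_query(sort_field, limit))
    hits = response["hits"]["hits"]
    return list(reversed(hits))
```

Because `from` is omitted, `from + size` is just the limit, which stays under `index.max_result_window` as long as the limit itself is at most 10.000.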

