I have an index with roughly 1,500,000 (1.5 million) documents. I only want to get 1,000 results from it, but I want to start counting backwards. So I want to retrieve documents 1,499,000 through 1,500,000 (the last 1,000). I've set 'from' to 1499000 and 'size' to 1000; from + size therefore equals the total number of documents, 1,500,000. This, expectedly, causes:
elasticsearch.exceptions.RequestError: RequestError(400, 'search_phase_execution_exception', 'Result window is too large, from + size must be less than or equal to: [10000] but was [1474810]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.')
We no longer recommend using the scroll API for deep pagination. If you need to preserve the index state while paging through more than 10,000 hits, use the search_after parameter with a point in time (PIT).
Should I also be using PIT for retrieving just a few results, but starting at a high from?
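For context, the PIT + search_after pattern I'd be switching to looks roughly like this, if I understand the docs right. This is only a minimal sketch against the 8.x Python client (with 7.x these fields would go into a body dict), and the cluster address, index name, and "timestamp" sort field are all made up:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # hypothetical cluster address

# Open a point in time so the view of the index stays consistent while paging.
pit = es.open_point_in_time(index="my-index", keep_alive="2m")  # "my-index" is made up
pit_id = pit["id"]

search_after = None
last_page = []
while True:
    kwargs = {
        "size": 1000,
        "sort": [{"timestamp": "asc"}],          # placeholder sort field
        "pit": {"id": pit_id, "keep_alive": "2m"},
    }
    if search_after is not None:
        kwargs["search_after"] = search_after
    resp = es.search(**kwargs)
    hits = resp["hits"]["hits"]
    if not hits:
        break
    last_page = hits                             # ends up holding the final page of hits
    search_after = hits[-1]["sort"]              # sort values of the last hit on this page
    pit_id = resp.get("pit_id", pit_id)          # the PIT id can change between pages

es.close_point_in_time(id=pit_id)
```

As far as I can tell, search_after has no notion of an offset, so reaching the tail this way would still mean walking every earlier page, which is what prompts the question.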
My gut feeling here is that, even though you could switch from a regular query to a scroll search / PIT / search_after, maybe the query itself could be improved? If you can tell us more about the use case, that might help.
Could you change the sorting strategy or filtering to retrieve the required documents instead of paginating through them?
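For example, something along these lines (a rough sketch only, assuming the documents carry some monotonically increasing field such as a timestamp; the index name, field name, and cutoff are made up, and it's written against the 8.x Python client):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # hypothetical cluster address

# Rather than paging to offset 1,499,000, narrow the candidate set with a
# filter on a hypothetical "timestamp" field and sort within that window.
resp = es.search(
    index="my-index",                                   # made-up index name
    size=1000,
    query={"range": {"timestamp": {"gte": "now-1h"}}},  # placeholder cutoff
    sort=[{"timestamp": "asc"}],
)
hits = resp["hits"]["hits"]
```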
Your question is not very clear. Suppose you have 10,000 documents:
if you want to get the last 100 docs (9,900 to 10,000) sorted ascending,
can you instead sort descending and take the top 100 (1 to 100)?
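Mapped onto your numbers it would look roughly like this (a sketch against the 8.x Python client; the cluster address, index name, and "timestamp" sort field are placeholders):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # hypothetical cluster address

# Instead of from=1_499_000 with an ascending sort, invert the sort and take
# the first 1,000 hits, then re-reverse locally if ascending order is needed.
resp = es.search(
    index="my-index",               # made-up index name
    size=1000,
    sort=[{"timestamp": "desc"}],   # reverse of the original ascending sort
)
hits = resp["hits"]["hits"]
hits.reverse()                      # back to ascending order for the caller
```

That stays well under the 10,000-hit result window, so no scroll, PIT, or search_after is needed at all.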