Environment
.NET 5
Elasticsearch.Net.Aws 7.1.0 : NuGet Gallery | Elasticsearch.Net.Aws 7.2.2
Low level client
Problem
Even with pagination, Elasticsearch's search API does not return more than 10_000 records by default. I.e. if the sum of from and size is greater than 10_000, the API throws an error.
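To make the limit concrete, here is a minimal Python sketch of the arithmetic, assuming the default index.max_result_window of 10_000. The helper names page_offset and within_window are made up for illustration and are not part of any Elasticsearch client.

```python
DEFAULT_MAX_RESULT_WINDOW = 10_000  # Elasticsearch default for index.max_result_window


def page_offset(page_number: int, page_size: int) -> int:
    """Return the `from` value for a 1-based page number."""
    return (page_number - 1) * page_size


def within_window(from_: int, size: int,
                  max_window: int = DEFAULT_MAX_RESULT_WINDOW) -> bool:
    """True if from + size does not exceed the result window."""
    return from_ + size <= max_window


# With a page size of 10, page 1_000 is the last one reachable via from/size.
print(within_window(page_offset(1_000, 10), 10))  # from=9_990, from+size=10_000
print(within_window(page_offset(1_001, 10), 10))  # from=10_000, from+size=10_010
```

The first call stays inside the window and the second does not, which is the point where the API starts returning an error.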
Potential solutions
Increase size
I can increase the index's max_result_window as described here. However, I am expecting a large dataset in production, probably fewer than 10_000_000 records at any one time, and for obvious reasons I don't believe that simply increasing the window size is a good idea. My use case does not require over-the-top performance, but it has to be reasonable for both the end user and the AWS bill.
What do you think? How much leeway do I have with the max_result_window setting?
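For reference, this is roughly what the settings change looks like, sketched in Python only to show the JSON body. The body would be sent with PUT to /<index>/_settings; the helper name max_result_window_settings and the value 50_000 are arbitrary choices for illustration, not a recommendation.

```python
import json


def max_result_window_settings(window: int) -> str:
    """Build the JSON body for PUT /<index>/_settings."""
    if window < 1:
        raise ValueError("window must be positive")
    return json.dumps({"index": {"max_result_window": window}})


# 50_000 is an illustrative value only.
body = max_result_window_settings(50_000)
print(body)
```

With the low level client, the same JSON string could be passed as the request body unchanged.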
Track total hits
I've read about the track_total_hits parameter. It only makes each request return the accurate total number of hits; it still does not allow records after the 10_000th to be fetched.
Scroll API
I've read about the Scroll API. The docs no longer recommend it for deep pagination, so I'd like to avoid it.
Search after
I've read about the search_after parameter. The concept is to define a consistent sort order and issue the exact same query for each page; the only difference is the value of search_after, which for every subsequent search should be the sort value of the last hit returned by the previous search.
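As I understand it, the request bodies for consecutive pages would look something like the following Python sketch. The field names created_at and id, the helper names, and the fake response are all assumptions for illustration; only the size, query, sort, and search_after keys and the hits.hits[].sort response path come from the search API itself.

```python
from typing import Any, Dict, List, Optional


def build_page_body(query: Dict[str, Any], sort: List[Dict[str, str]], size: int,
                    search_after: Optional[List[Any]] = None) -> Dict[str, Any]:
    """Build a search body; only search_after changes between pages."""
    body: Dict[str, Any] = {"size": size, "query": query, "sort": sort}
    if search_after is not None:
        body["search_after"] = search_after  # `from` is deliberately left out
    return body


def last_sort_values(response: Dict[str, Any]) -> Optional[List[Any]]:
    """Return the sort values of the last hit, or None if there are no hits."""
    hits = response["hits"]["hits"]
    return hits[-1]["sort"] if hits else None


# Hypothetical fields: a timestamp plus a unique id as a tiebreaker.
sort = [{"created_at": "asc"}, {"id": "asc"}]
query = {"match_all": {}}

first_body = build_page_body(query, sort, size=10)

# Stand-in for the response to first_body (truncated to two hits).
fake_response = {"hits": {"hits": [
    {"_id": "a", "sort": [1609459200000, "a"]},
    {"_id": "b", "sort": [1609459260000, "b"]},
]}}

second_body = build_page_body(query, sort, size=10,
                              search_after=last_sort_values(fake_response))
print(second_body["search_after"])
```

Each page therefore depends on the response to the page before it, which is what makes jumping straight to an arbitrary page awkward.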
As far as I can tell this is the recommended solution, but while it may work for large page sizes, I'm having difficulty understanding how it solves the basic paging case:
Let's say we have 20_000 records in total and the page size is 10, hence 2_000 pages. How can I return the last page, containing records 19_990-19_999? Unless I misunderstand, search_after does not help, because I've skipped pages and I don't have the sort value of record number 19_989.
Furthermore, per the docs:
If provided, the from argument must be 0 (default) or -1
This means that I cannot use a combination of both:
- Perform one search with "from": "990"
- Use the last record's sort value to perform a second search, again using "from": "990"
- Return the results of the second search.
Beyond that I cannot figure out another way to use it. Could you tell me where I'm getting it wrong?