Search_after but with paging

We know that from + size in a search query is limited to 10000. For a search that returns 20k hits at 1k results per page, it's impossible to retrieve any page beyond #10 (for example page 11, with from set to 10000 and size set to 1000), because we then get the error:

"Result window is too large, from + size must be less than or equal to: [10000] but was [xxxx]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."
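To make the limit concrete, here is a small sketch of the arithmetic, assuming 1-based page numbers and the default index.max_result_window of 10000. The function name is made up for illustration.

```python
MAX_RESULT_WINDOW = 10_000  # default value of index.max_result_window


def page_is_reachable(page: int, size: int, window: int = MAX_RESULT_WINDOW) -> bool:
    """Return True if a 1-based page can be fetched with from + size."""
    from_ = (page - 1) * size  # offset of the first hit on that page
    return from_ + size <= window


print(page_is_reachable(10, 1000))  # from=9000,  from + size = 10000
print(page_is_reachable(11, 1000))  # from=10000, from + size = 11000
```

Page 10 sits exactly on the limit; page 11 is the first one that exceeds it.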

The only alternative according to the documentation is to use the search_after parameter.
Now say I want to jump directly to page 11 using search_after, without browsing through the first 10 pages. Is that even possible?

I don't want to call the repo 10 times, each time with search_after set to the last hit of the previous page, just to get to page 11. Imagine I have 100 pages and want to see a random one, say page 67. That's 66 calls, always setting the right search_after. It's absurd.
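A rough sketch of what that sequential walk looks like. Here `fake_search` is a hypothetical stand-in for a real search call, running against an in-memory list sorted by a unique key; only the search_after mechanics are meant to resemble Elasticsearch.

```python
from typing import List, Optional, Tuple

# Hypothetical stand-in for an index: 20,000 documents sorted by a unique id.
DOCS = list(range(20_000))


def fake_search(size: int, search_after: Optional[int] = None) -> List[int]:
    """Return the next `size` docs whose sort value is greater than `search_after`."""
    start = 0 if search_after is None else search_after + 1
    return DOCS[start:start + size]


def fetch_page(page: int, size: int) -> Tuple[List[int], int]:
    """Walk forward page by page; return (hits of the target page, calls made)."""
    cursor = None
    hits: List[int] = []
    calls = 0
    for _ in range(page):
        hits = fake_search(size, cursor)
        calls += 1
        if not hits:
            break  # ran past the end of the result set
        cursor = hits[-1]  # sort value of the last hit becomes the next cursor
    return hits, calls


hits, calls = fetch_page(11, 1000)
print(hits[0], hits[-1], calls)
```

Reaching page 11 takes 10 calls that are thrown away plus the one that actually returns the page.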

Is there really no other way?

Welcome!

It's often a question of user needs.
Does a user really need to go to page 67 to find the relevant information?

How often do you go to page 67 on Google?

Instead, I'd propose a better approach with faceted navigation for example.

Could you describe your use case?
That might help us give you better advice.

How can one guarantee that a user will not want to browse to page 67, or any other? Not that it's really relevant: if I had said I had 10.000 hits per page, the issue would still arise on page 11. Would it be fair now to ask if a user really needs to go to page 11 to find the relevant information?

But as for a use case, imagine an article browsing webapp developed in Angular. To display the items, a dataview is used, with a paginator, which means something like this will be available to the user:

[image: screenshot of the paginator]

So if the user selects any one page n at random from the dropdown, the elasticsearch repo will be queried n-1 times, with n-1 different search_after values, before retrieving the actual hits from page n.
And yes, the app provides several filters to help narrow the search, but that again does not solve the issue of "I wanna see ALL results".

If you want to see all the results, it means that you have a lot of time to waste in my opinion.

Or you want to export all the results to another application which is a different story.

But seriously, how often do you click on next page on Google?

The use case you described is not an end-user story but rather a technical approach led by technical constraints, IMO.

Could you tell us more about:

  • what kind of data are you searching for?
  • what does the search bar look like?

The kind of data is documents (files) with associated metadata input by users (titles, categories, associated users, and other things), if it matters at all. It could have been movies, or works of art. It's pretty irrelevant.

The search bar, again something that doesn't matter, is just a regular text input. The terms entered there are searched across a variety of metadata fields in the Elasticsearch repo, configured through a multi_match query with associated weights/boost values. Again, it doesn't matter, because the user may not want to search for anything in particular and just browse the provided pages.
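For illustration, the request body could be built roughly like this. The field names and boost values are placeholders, not the real mapping; an empty search box falls back to match_all so the user can simply browse.

```python
from typing import Any, Dict


def build_query(terms: str) -> Dict[str, Any]:
    """Build a search body: multi_match when terms are given, match_all otherwise."""
    if not terms.strip():
        # Nothing typed in the search bar: browse everything.
        return {"query": {"match_all": {}}}
    return {
        "query": {
            "multi_match": {
                "query": terms,
                # Hypothetical fields; "^N" is the per-field boost.
                "fields": ["title^3", "category^2", "associated_users"],
            }
        }
    }


print(build_query("annual report"))
print(build_query(""))
```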

For the record, the repo currently has more than 1 million records, so yes, browsing only the first 10k, regardless of whether the user searched for a term or not, makes no sense.

What you're telling me is that, since elasticsearch FORCES me to limit the results to 10k hits, I MUST not provide a way for users to browse more than 10k results. And what's more dumb is that I was using v1.7 just fine, and that was not even an issue.

As a developer, I can't guarantee that a random user will not click on a random page outside the range, but now I'm forced to, because again the Elasticsearch stack forces me to.

I just deployed a version of the application, knowing that only the first 10 pages (1000 hits each) will work, and voilà, someone clicks on that double-arrow-right (go to last page), and an error occurs, which is expected.
[image: screenshot of the error]

Could you please provide an alternative solution rather than finding loopholes in applications only to fit the features elasticsearch provides?
Again, what I look for is a way to browse directly to the n-th page, when the search returns more than 10k results. Thank you.

You can use:

  • the size and from parameters to display by default up to 10000 records to your users. If you want to change this limit, you can change index.max_result_window setting but be aware of the consequences (ie memory).
  • the search after feature to do deep pagination.
  • the Scroll API if you want to extract a resultset to be consumed by another tool later. (Not recommended anymore)
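As a hedged sketch, the first two options translate into request bodies along these lines. The sort fields `created_at` and `doc_id` are hypothetical; the assumption is that the last sort field is unique per document so it can act as a tiebreaker. The settings body would be sent with PUT to the index's _settings endpoint.

```python
from typing import Any, Dict, List, Optional


def raise_window_settings(window: int) -> Dict[str, Any]:
    """Body for updating index.max_result_window (mind the memory cost)."""
    return {"index": {"max_result_window": window}}


def search_after_body(size: int, last_sort: Optional[List[Any]] = None) -> Dict[str, Any]:
    """Body for one page; pass the sort values of the previous page's last hit."""
    body: Dict[str, Any] = {
        "size": size,
        "query": {"match_all": {}},
        # Hypothetical fields; doc_id is assumed unique and breaks ties.
        "sort": [{"created_at": "desc"}, {"doc_id": "asc"}],
    }
    if last_sort is not None:
        body["search_after"] = last_sort
    return body


first_page = search_after_body(1000)
next_page = search_after_body(1000, ["2020-01-01T00:00:00Z", "doc-000999"])
print(raise_window_settings(50_000))
print(next_page["search_after"])
```

Note that search_after only moves forward from a known hit, so it does not by itself give random access to page n.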

And what's more dumb is that I was using v1.7 just fine, and that was not even an issue.

Let me highlight this part:

but be aware of the consequences (ie memory)

So before recent versions, we were not doing enough to protect users from shooting themselves in the foot. That protection is now in place by default.
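A back-of-the-envelope sketch of why deep from + size gets expensive, assuming each shard has to collect its top from + size hits and the coordinating node then merges all of them. The shard count of 5 is only an example, and the function name is made up.

```python
def coordinator_entries(from_: int, size: int, shards: int) -> int:
    """Upper bound on the sorted entries the coordinating node has to merge."""
    per_shard = from_ + size  # every shard returns its own top from + size
    return shards * per_shard


# Page 67 at 1000 hits per page, on an index with 5 shards (assumed).
print(coordinator_entries(from_=66_000, size=1000, shards=5))
```

The user only sees 1000 hits, but far more entries are sorted and then discarded to produce them.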
