Search_after vs deep pagination

The documentation recommends search_after over deep pagination, but doesn't really explain why. Or at least I didn't understand the details.

Could someone explain why we should use search_after vs deep pagination?

So as I understand it, the issue with deep pagination using from + size is that Elasticsearch has to collect and sort every hit up to the page you ask for before it can return the slice you asked for. This is the result window. The memory needed per request is proportional to from + size, not just size, so it's not good for paging through large result sets: the more results you skip, the slower and more expensive each query becomes.
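To make that concrete, here's a minimal sketch of classic from + size paging with the Python client (elasticsearch-py). The index name, sort field, cluster address, and page numbers are all made up for illustration:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumes a local cluster

page, page_size = 500, 100
resp = es.search(
    index="my-index",                        # hypothetical index
    body={
        "query": {"match_all": {}},
        "sort": [{"timestamp": "desc"}],     # hypothetical sort field
        "from": (page - 1) * page_size,      # 49,900 hits must be collected and sorted first
        "size": page_size,                    # only these 100 are returned to you
    },
)
hits = resp["hits"]["hits"]
```

(With default settings this particular request would actually be rejected, since from + size exceeds the 10,000 result window.)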

Scroll and search_after let you do deep scrolling where you only fetch the next result set in an efficient manner, but you can't jump between arbitrary pages. That solves the scaling problem of from + size. Scroll needs to maintain a context per open scroll though, which isn't great when lots of users are doing lots of scrolling.
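For contrast, here's a rough sketch of the scroll API with the same hypothetical index (the 2-minute keep-alive is arbitrary). The first search opens a scroll context that the cluster has to keep alive between requests:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

resp = es.search(
    index="my-index",
    scroll="2m",                      # keep the scroll context alive for 2 minutes
    body={"query": {"match_all": {}}, "size": 1000},
)
scroll_id = resp["_scroll_id"]

while resp["hits"]["hits"]:
    # process resp["hits"]["hits"] here ...
    resp = es.scroll(scroll_id=scroll_id, scroll="2m")
    scroll_id = resp["_scroll_id"]

es.clear_scroll(scroll_id=scroll_id)  # free the server-side context when done
```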

search_after is like scroll, but stateless. There isn't any additional state stored on the Elasticsearch side, so you don't have the scroll-context scaling issue and you don't have the from + size scaling issue. It does share the constraint that you can only page forwards.
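And a sketch of the same loop with search_after. The sort fields (a timestamp plus an id field as tiebreaker) are assumptions about the mapping, but the pattern is the important part: feed the sort values of the last hit from the previous page into the next request, with nothing kept on the cluster in between:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

body = {
    "query": {"match_all": {}},
    "sort": [{"timestamp": "desc"}, {"id": "asc"}],  # needs a deterministic sort with a tiebreaker
    "size": 100,
}

search_after = None
while True:
    if search_after:
        body["search_after"] = search_after
    resp = es.search(index="my-index", body=body)
    hits = resp["hits"]["hits"]
    if not hits:
        break
    # process hits here ...
    search_after = hits[-1]["sort"]  # sort values of the last hit drive the next page
```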

I usually work around the random paging issue through user experience. Most people don't actually want to jump to a random page of results. They search, scroll a bit, filter, scroll a bit more, rinse and repeat.

If scaling isn't a problem for you, though, then from + size is a really simple solution to paging.


Thank you for the response. This basically means search_after and scroll are exactly the same, except that search_after is stateless.

If the query paginates to page 500 with a page size of 100, along with a sort on a couple of fields and a few filters, would ES still sort and load all the matching documents into memory and only return the requested size?

Scaling definitely is a problem. Hitting a page beyond 5.4M leads to an OOM (I tried increasing the default 10k result window to 15M for testing purposes).

There's probably more subtlety to it than that; someone else who knows more might come along and shed some more light on it :slight_smile: As far as the end user is concerned, though, I think it's safe to say they operate in the same way.

That's right: it will run your filters to determine which documents match, collect them, sort them, and then only return the size documents after from. Which, as you pointed out, doesn't scale well. That's why the default limit is 10,000, so you get an error suggesting you look for alternative solutions rather than an OOM, which is bad news for your node.
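As a back-of-the-envelope illustration of why (the shard count is made up; the rest follows from your page-500 example):

```python
page, size = 500, 100
from_ = (page - 1) * size        # 49,900 hits to skip
per_shard = from_ + size         # each shard collects a sorted queue of ~50,000 entries
shards = 5                       # illustrative shard count
merged = shards * per_shard      # ~250,000 entries merged on the coordinating node
print(from_, per_shard, merged)  # 49900 50000 250000
```

With the default index.max_result_window of 10,000, that request is rejected up front instead of being allowed to eat memory.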

So, just to confirm: if ES runs the filters and sorts the documents for a request that asks for a document 5M deep, couldn't that still lead to an OOM? I am just looking for a good solution for deep pagination with filters and sorts included in the query.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.