Sliced scroll with sort

Hello guys.
How does Elasticsearch actually perform sorting under the hood, per shard and per index? I can find this neither in the documentation nor in books.

I am going to use a sliced scroll with sort, but in some cases I have to sort the whole shard. I think it will work, but I cannot confirm this.
Do you have a best practice for reading a large number of documents with sorting?

For newer versions of Elasticsearch, we suggest using PIT (point in time) instead of _scroll. PIT also supports slicing.
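To make the suggestion concrete, here is a minimal sketch of the request body for one slice of a sorted PIT search. It assumes you already opened a PIT (for example with `POST /my-index/_pit?keep_alive=1m`) and that the body is sent to `POST /_search` without an index in the path. The helper name `sliced_pit_search_body`, the field `created_at`, and the `<pit-id>` placeholder are made up for illustration.

```python
import json


def sliced_pit_search_body(pit_id, slice_id, max_slices, sort,
                           size=1000, keep_alive="1m"):
    """Build the JSON body for one slice of a point-in-time search.

    Hypothetical helper: it only assembles the request, it does not send it.
    """
    # Elasticsearch expects at least two slices and an id below max.
    if max_slices < 2:
        raise ValueError("max_slices must be at least 2")
    if not 0 <= slice_id < max_slices:
        raise ValueError("slice_id must be in [0, max_slices)")
    return {
        "size": size,
        "slice": {"id": slice_id, "max": max_slices},
        "pit": {"id": pit_id, "keep_alive": keep_alive},
        "sort": sort,
    }


# One body per slice; each slice can be consumed by a separate worker.
# "created_at" is an assumed field name, "<pit-id>" a placeholder.
bodies = [
    sliced_pit_search_body("<pit-id>", i, 2, [{"created_at": "asc"}])
    for i in range(2)
]
print(json.dumps(bodies[0], indent=2))
```

Each slice is sorted independently, so if you need one global order across slices you would still have to merge the per-slice streams on the client side.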


Thanks for your answer!
Could you please clarify: do you recommend using PIT instead of scroll for reading a whole index, for example when I want to reprocess all of its documents (possibly 1 billion of them)? Is PIT more efficient than scroll in terms of memory or performance?

Yes, we recommend using PIT instead of scroll in all cases, as per these instructions. We no longer recommend using scroll.

PIT has several advantages over scroll:

  • Slightly lower memory usage. Each scroll stores its search request, because a scroll is tied to a single request. A PIT is a point-in-time view of the index, so it doesn't store search requests, and more search requests can run at the same time.
  • More resilient: if a node goes down during a series of PIT requests, an attempt will be made to run the request on another node.
  • PIT slices should be faster, as slicing is based on internal Lucene _doc ids rather than the Elasticsearch document _id field.
  • PIT is a new API that we plan to support for a long time, while we may deprecate _scroll.
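For reading a very large index, a PIT is normally paged with `search_after`: each request passes the sort values of the last hit from the previous page. The sketch below shows that loop for one slice. It is only an illustration: `iter_slice_hits` is a made-up name, and `search` is a caller-supplied function (an assumption, not part of any client library) that sends the body to `POST /_search` and returns the parsed JSON response.

```python
def iter_slice_hits(search, pit_id, slice_id, max_slices, sort,
                    size=1000, keep_alive="1m"):
    """Yield every hit of one PIT slice, page by page, using search_after.

    `search` is a hypothetical callable: it takes a request body (dict)
    and returns the response (dict) of a PIT search.
    """
    search_after = None
    while True:
        body = {
            "size": size,
            "slice": {"id": slice_id, "max": max_slices},
            "pit": {"id": pit_id, "keep_alive": keep_alive},
            "sort": sort,
        }
        if search_after is not None:
            body["search_after"] = search_after

        response = search(body)
        hits = response["hits"]["hits"]
        if not hits:
            return  # slice exhausted

        yield from hits

        # The PIT id may change between requests, so use the latest one.
        pit_id = response.get("pit_id", pit_id)
        # Continue after the sort values of the last hit on this page.
        search_after = hits[-1]["sort"]
```

Running one such generator per slice (for example in separate workers) spreads the read across slices; remember to close the PIT when all slices are done.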

Thanks for the detailed response! :+1:
