Hello guys.
How does Elasticsearch actually perform sorting under the hood, per shard and per index? I can't find this in either the documentation or books.
I am going to use a sliced scroll with sort, but in some cases I have to sort the whole shard. I think it will work, but I cannot confirm this.
Do you have a best practice for reading a large number of documents with sorting?
Thanks for your answer!
Could you please clarify: do you recommend using PIT instead of scroll for reading a whole index, for example when I want to reprocess all documents, which may be 1 billion? Is PIT more efficient in terms of memory or performance?
Yes, we recommend using PIT instead of scroll in all cases, as per these instructions. We no longer recommend using scroll.
As for the advantages of PIT over scroll, there are several:
Slightly lower memory usage. Each scroll stores its search request, because a scroll is tied to a single request. A PIT is a point-in-time view of the index and doesn't store search requests, so more search requests can run at the same time.
More resilient: if a node goes down during a series of PIT requests, an attempt will be made to run the request on another node.
PIT slices should be faster, as slicing is based on internal Lucene _doc ids rather than the Elasticsearch _id field.
PIT is a new API that we plan to support for a long time, while we may deprecate _scroll.
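To make the PIT approach above more concrete, here is a minimal Python sketch of paging through a sorted result set with PIT and search_after. It only builds the request bodies and drives the loop; the `search` callable, the helper names `build_pit_search_body` and `iterate_pit`, and the default page size are all assumptions for illustration, not part of any client library. In real use, `search` would send the body to the `_search` endpoint with whatever client you have, and the PIT id would come from opening a point in time on the index first.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional


def build_pit_search_body(
    pit_id: str,
    sort: List[Dict[str, Any]],
    size: int = 1000,
    search_after: Optional[List[Any]] = None,
    slice_id: Optional[int] = None,
    slice_max: Optional[int] = None,
    keep_alive: str = "1m",
) -> Dict[str, Any]:
    """Build a _search request body for one page of a PIT read (hypothetical helper)."""
    body: Dict[str, Any] = {
        "size": size,
        "pit": {"id": pit_id, "keep_alive": keep_alive},
        "sort": sort,
    }
    if search_after is not None:
        # Resume right after the sort values of the last hit on the previous page.
        body["search_after"] = search_after
    if slice_id is not None and slice_max is not None:
        # Optional slicing, so several workers can read disjoint parts in parallel.
        body["slice"] = {"id": slice_id, "max": slice_max}
    return body


def iterate_pit(
    search: Callable[[Dict[str, Any]], Dict[str, Any]],
    pit_id: str,
    sort: List[Dict[str, Any]],
    size: int = 1000,
    **kwargs: Any,
) -> Iterator[Dict[str, Any]]:
    """Yield every hit in sort order; `search` is an assumed callable that runs one request."""
    search_after: Optional[List[Any]] = None
    while True:
        body = build_pit_search_body(pit_id, sort, size, search_after, **kwargs)
        response = search(body)
        hits = response["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # Each hit carries its sort values; the last one is the cursor for the next page.
        search_after = hits[-1]["sort"]
        # The response may return an updated PIT id; use it for the following request.
        pit_id = response.get("pit_id", pit_id)
```

For a sliced read, each worker would call `iterate_pit` with the same PIT id and its own `slice_id`, for example `iterate_pit(search, pit_id, sort, slice_id=0, slice_max=4)`. Remember to close the PIT once all workers are done.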