On Thu, Apr 10, 2014 at 11:13 PM, Nikolas Everett firstname.lastname@example.org wrote:
This one is easy. Elasticsearch/Lucene has to keep a min heap of from + size
entries, holding the best documents found so far and their scores. Technically
the size is min(from + size, max(rescore_window_size)). Anyway, that means some
part of the query has O(n) space and O(n * log(n)) time complexity, where n
is from + size. That part might be dwarfed by some other action, but it is
there. And technically, in the worst case, the time complexity is more like
O(hits * log(n)), but that's not likely.
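To make the heap argument concrete, here is a minimal Python sketch of collecting one page with a bounded min heap. It is only an illustration, not how Lucene is implemented: the function name top_page, the (score, doc_id) tuples, and the tie-breaking by tuple order are all assumptions made for the example.

```python
import heapq

def top_page(scored_docs, from_, size):
    """Return one page of hits, best score first.

    scored_docs is a hypothetical stand-in for the hits on one shard:
    an iterable of (score, doc_id) pairs.
    """
    n = from_ + size
    heap = []  # min heap: heap[0] is the weakest hit currently kept
    for hit in scored_docs:
        if len(heap) < n:
            heapq.heappush(heap, hit)      # O(log n)
        elif hit > heap[0]:
            heapq.heapreplace(heap, hit)   # evict the weakest hit, O(log n)
        # otherwise the hit cannot make it into the top n and is dropped
    ranked = sorted(heap, reverse=True)    # O(n * log(n))
    return ranked[from_:from_ + size]

docs = [(0.5, "a"), (0.9, "b"), (0.1, "c"), (0.7, "d"), (0.3, "e")]
page = top_page(docs, from_=1, size=2)
```

The heap never holds more than from + size entries, which is where the O(n) space comes from, and every hit that displaces the current weakest entry costs one O(log(n)) heap operation.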
Everything that Nikolas said is correct. I'd like to add that starting with
Elasticsearch 1.2.0, paging with scroll is going to be more efficient
since the worst case will be O(hits * log(size)) instead of O(hits *
log(from + size)). If you are interested in why this is possible, the reason
is that on each shard, scroll keeps track of the least competitive document
among the hits of the previous page, so that it can simply ignore
documents that compare greater than this document instead of adding them to
the priority queue.
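As a rough illustration of that idea, the sketch below pages through hits with a heap of only size entries by remembering the last hit of the previous page. This is a simplified model and not the actual Elasticsearch code: next_page and its last argument are hypothetical names, hits are (score, doc_id) tuples, and every tuple is assumed to be unique so that comparing against the previous page's last hit is unambiguous.

```python
import heapq

def next_page(scored_docs, size, last=None):
    """Return the next page of hits, best score first.

    last is the final (score, doc_id) hit of the previous page,
    or None when fetching the first page.
    """
    heap = []  # min heap bounded by size, not from + size
    for hit in scored_docs:
        if last is not None and hit >= last:
            continue  # already returned on an earlier page, skip it
        if len(heap) < size:
            heapq.heappush(heap, hit)
        elif hit > heap[0]:
            heapq.heapreplace(heap, hit)
    return sorted(heap, reverse=True)

docs = [(0.5, "a"), (0.9, "b"), (0.1, "c"), (0.7, "d"), (0.3, "e")]
page1 = next_page(docs, size=2)
page2 = next_page(docs, size=2, last=page1[-1])
page3 = next_page(docs, size=2, last=page2[-1])
```

Each page still visits every hit, but the heap operations cost O(log(size)) no matter how deep the page is, which matches the O(hits * log(size)) worst case mentioned above.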
The issue with realtime is that it creates lots of segments that usually
get merged very quickly. On the other hand, scroll works by asking the
shard to keep open the view over the index that was used for the first
page, until the scroll is closed. This can delay space reclamation and
force Elasticsearch to keep a significant number of files open (beware of
running out of file descriptors).
If you have heavy search traffic, I would recommend against using scroll
for every user because of its cost. It is usually a better idea to just
increase the from parameter and prevent your users from performing deep
paging, since it might kill your cluster. (If you go to any web search
engine, you'll see that even if they tell you that your query matched
millions of documents, they only allow you to get hits for a few tens of