How does "size" work under the hood?


(x0ne-2) #1

I am trying to figure out the best way to issue my queries so as not to flood
the heap with data I may not care about. Before each query, I run a search
with the count search type to see how many results I am potentially dealing
with. When I specify a "size" in my search query, how exactly does that affect
the results and the heap? If I run a query that matches 50K documents and I am
only interested in 25 (specified by size), are all 50K still loaded into
memory? Is there a way to get just the top 25 results for the query without
loading all hits into memory, or is that how size already works?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/9837a8e5-ddf6-4684-8fe0-dd6909bcee48%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Nik Everett) #2

The first count query might not be needed because a normal query comes back
with a count anyway.
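
As a rough illustration of that point, the sketch below reads the total match count and the returned hits from a single response. The request body and the response are hand-written, hypothetical samples (not real output), shaped like a search response in which hits.total is a plain integer; other versions may report the total differently.

```python
# Hypothetical request body: match everything, but return only the top 25 hits.
request_body = {"query": {"match_all": {}}, "size": 25}

# Hand-written sample response (an assumption for illustration, not real output).
sample_response = {
    "hits": {
        "total": 50000,  # number of documents that matched the query
        "hits": [{"_id": str(i)} for i in range(25)],  # only `size` hits come back
    }
}


def summarize(response):
    """Return (total matches, number of hits actually returned)."""
    hits = response["hits"]
    return hits["total"], len(hits["hits"])


total, returned = summarize(sample_response)
print(f"{total} documents matched, {returned} returned")
```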

The size parameter ultimately translates into the size of a min-heap on each
shard. Worst-case space is just the size parameter times the number of
shards. Worst-case time is log(size) * number of documents matched. If size is
much smaller than the number of documents matched, then the average case tends
to depend more on the number of documents matched than on size, because most
matches aren't better than whatever is already collected in the heap and so
never get inserted. At some point each heap has to be sorted and sent back to
the one node, where the results are merged and then fetched. The fetch may end
up being slower than all of the rest of it, and it may use more memory if you
are loading source.
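
The collection step described above can be sketched as a bounded min-heap. This is only a minimal illustration in plain Python, not Elasticsearch's or Lucene's actual code; the function names top_k_on_shard and merge_shards and the (score, doc_id) tuples are made up for the example.

```python
import heapq


def top_k_on_shard(scored_docs, size):
    """Keep the `size` best (score, doc_id) pairs seen on one shard."""
    heap = []  # min-heap: heap[0] is always the weakest hit kept so far
    for score, doc_id in scored_docs:
        if len(heap) < size:
            heapq.heappush(heap, (score, doc_id))      # O(log size)
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc_id))   # O(log size)
        # Otherwise the match is no better than the weakest kept hit:
        # one comparison and the heap is left untouched.
    return sorted(heap, reverse=True)  # best hit first


def merge_shards(per_shard_hits, size):
    """Merge the per-shard lists on one node and keep the overall top `size`."""
    all_hits = (hit for hits in per_shard_hits for hit in hits)
    return heapq.nlargest(size, all_hits)


shard_a = [(0.1, "a1"), (0.9, "a2"), (0.5, "a3")]
shard_b = [(0.7, "b1"), (0.2, "b2")]
per_shard = [top_k_on_shard(shard_a, 2), top_k_on_shard(shard_b, 2)]
print(merge_shards(per_shard, 2))
```

Memory held during collection is at most size entries per shard, however many documents match. Only the merged winners would then need to be fetched.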

Nik

On Fri, Jul 11, 2014 at 10:32 AM, x0ne brandon.s.dixon@gmail.com wrote:

I am trying to figure out the best way to issue my queries as to not flood
the heap with data I may not care about. Before each query, I do a count
search type to identify how many results I am potentially dealing with.
When I specify a "size" in my search query, how exactly does that impact
results and the heap? If I run a query that matches 50k documents and I am
only interested in 25 (specified by size), are all 50K still loaded into
memory? Is there a way to get just the top 25 results off the query match
without loading all hits into memory or is that how size actually works?



