I am trying to figure out the best way to issue my queries so as not to flood
the heap with data I may not care about. Before each query, I run a search
with the count search type to find out how many results I am potentially
dealing with. When I specify a "size" in my search query, how exactly does
that affect the results and the heap? If I run a query that matches 50k
documents and I am only interested in 25 (specified by size), are all 50k
still loaded into memory? Is there a way to get just the top 25 results for
the query without loading all hits into memory, or is that how size already
works?
The first count query might not be needed because a normal query comes back
with a count anyway.
The size parameter ultimately translates into the size of a min heap.
Worst case space is just the size parameter times the number of shards.
Worst case time is log(size) * number of documents matched. If size is
much smaller than the number of documents matched, then the average case
tends to end up being more related to the number of documents matched than
to the size, because most matches aren't better than whatever is already
collected in the heap. At some point each shard's heap has to be sorted and
sent back to the one node that received the request, where the results are
merged and then fetched. The fetch may end up being slower than all of the
rest of it, and may end up using more memory if you are loading source.
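
To make the above concrete, here is a rough Python sketch of the idea, not
the actual Lucene/Elasticsearch code. The names shard_top_k and search are
made up for illustration, each "shard" is just a list of scores, and the
fetch phase is left out entirely. It only shows that each shard keeps at
most size hits in a min heap while scanning every match, and that the merge
step sees at most size * number of shards candidates.

```python
import heapq
import random


def shard_top_k(scores, size):
    """Keep the `size` best (score, doc_id) pairs seen on one shard."""
    heap = []  # min heap: heap[0] is the weakest hit kept so far
    for doc_id, score in enumerate(scores):
        if len(heap) < size:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            # Only matches that beat the current weakest hit touch the heap.
            heapq.heapreplace(heap, (score, doc_id))
    return sorted(heap, reverse=True)


def search(shards, size):
    """Merge the per-shard results into one global top `size`."""
    candidates = [
        (score, shard_id, doc_id)
        for shard_id, scores in enumerate(shards)
        for score, doc_id in shard_top_k(scores, size)
    ]
    # At most size * len(shards) candidates ever reach this point.
    return heapq.nlargest(size, candidates)


# Hypothetical data: 5 shards with 10,000 matching documents each.
rng = random.Random(0)
shards = [[rng.random() for _ in range(10_000)] for _ in range(5)]
top = search(shards, 25)
print(len(top))
```

So with 50k matches and size 25, every match is still scored and compared
against the heap, but only 25 entries per shard are held at any one time.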