Suppose a query matches a large volume of records (say, a few million). What is the best way to handle the results? I want to store them on local disk. Is there a way to stream a big result set? I am concerned that the local box's memory may not be able to hold the whole result set.
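To make the question concrete, here is a minimal sketch of the pattern being asked about: pull the results one page at a time and append each hit to a file, so only a single page is ever held in memory. `fetch_page` is a hypothetical callable, not an Elasticsearch API; it is assumed to take a cursor (`None` for the first request) and return a list of hits plus the next cursor, with an empty list marking the end, which is roughly how a scroll-style request behaves.

```python
import json
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Tuple

# Hypothetical page type: (hits on this page, cursor for the next page).
Page = Tuple[List[Dict], Optional[str]]


def iter_hits(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[Dict]:
    """Yield hits one by one, requesting a new page only when needed."""
    cursor = None  # None means "first request" in this sketch
    while True:
        hits, cursor = fetch_page(cursor)
        if not hits:  # an empty page marks the end of the result set
            return
        yield from hits


def dump_hits(hits: Iterable[Dict], path: str) -> int:
    """Write each hit as one JSON line; return how many were written."""
    count = 0
    with open(path, "w", encoding="utf-8") as out:
        for hit in hits:
            out.write(json.dumps(hit) + "\n")
            count += 1
    return count
```

With the Python client, I believe the generator returned by `elasticsearch.helpers.scan` could be passed straight to `dump_hits` in place of `iter_hits(...)`, but treat that as an assumption to check against the client version in use.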
Thanks for sharing. I looked through the document you referred to and found that sorting may have a cost. In my use case, I just need to find the top N results, and I do not care about the order of the results within the top N. In that case, what is the most efficient way to write the query?
What do you mean by this? To define the top N of something, there has to be some kind of sorting. How are you defining the top N? Do you want the top N scoring documents?
Yes, Colin, you are correct: I need the top N scored documents. What I mean is that I do not need a strict ascending/descending sort inside the top N documents, as long as the top N documents are returned. Is there an efficient way to implement this? Thanks.
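To illustrate the distinction between "top N" and "fully sorted", here is a small sketch of selecting the N highest-scoring documents with a bounded min-heap. It never sorts the whole input, and the returned IDs are in no particular order. This is only an illustration of the idea in plain Python; it is not a claim about how Elasticsearch implements scoring internally, and `top_n_unordered` and the `(doc_id, score)` input shape are made up for the example.

```python
import heapq
from typing import Iterable, List, Tuple


def top_n_unordered(scored_docs: Iterable[Tuple[str, float]], n: int) -> List[str]:
    """Return the IDs of the n highest-scoring docs, in arbitrary order."""
    if n <= 0:
        return []
    heap: List[Tuple[float, str]] = []  # min-heap; heap[0] is the weakest kept doc
    for doc_id, score in scored_docs:
        if len(heap) < n:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            # New doc beats the weakest kept doc, so swap it in.
            heapq.heapreplace(heap, (score, doc_id))
    return [doc_id for _, doc_id in heap]
```

For M candidate documents this does roughly M log N work instead of the M log M of a full sort, so some ordering work is still involved even when the final order inside the top N does not matter.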
I read the document (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html#scroll-scan). Regarding the statement "Deep pagination with from and size — e.g. ?size=10&from=10000 — is very inefficient as (in this example) 100,000 sorted results have to be retrieved from each shard and resorted in order to return just 10 results.", I am confused: should it be 10,000 sorted results rather than 100,000, which would map to the from=10000 parameter?
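For reference, here is the arithmetic as I understand it, written out as a tiny sketch. The assumption is that each shard has to produce its own top `from + size` hits, and that the coordinating node then merges the candidates from every shard. The function name and the shard count of 5 are illustrative only.

```python
from typing import Tuple


def deep_page_cost(from_: int, size: int, shards: int) -> Tuple[int, int]:
    """Return (hits each shard must produce, hits merged by the coordinator)."""
    per_shard = from_ + size      # every shard needs its own top from+size hits
    merged = per_shard * shards   # the coordinating node sees all of them
    return per_shard, merged


# ?size=10&from=10000 on an index assumed to have 5 shards
per_shard, merged = deep_page_cost(from_=10000, size=10, shards=5)
print(per_shard, merged)  # 10010 50050
```

Under that reading, the per-shard figure for this example would be 10,010 rather than 100,000.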