Handle big result set?

linlma · July 7, 2015, 6:16am

Hello Elastic experts,

Suppose a query matches a large volume of records (say, a few million). What is the best way to handle the results? I want to store them on local disk. Is there a way to stream a big result set? I am concerned that the local machine's memory may not be able to hold the whole result set.

Thanks in advance,
Lin

colings86 · July 7, 2015, 8:18am

Deep pagination can be achieved using the Scan-Scroll feature: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html#scroll-scan

linlma · July 7, 2015, 8:44am

Hi Colin,

Thanks for sharing. I looked through the document you referred to and found that sorting may have a cost. In my use case, I just need the top N results and do not care about the order of results within the top N. What is the most efficient way to write the query in that case?

regards,
Lin

colings86 · July 7, 2015, 8:54am

What do you mean by this? To define the top N of something, there has to be some kind of sorting. How are you defining the top N? Do you want the top N scoring documents?

linlma · July 7, 2015, 5:21pm

Yes, Colin, you are correct: I need the top N scoring documents. I mean that I do not need a strict ascending/descending sort inside the top N documents, as long as the top N documents are returned. Is there an efficient way to implement this? Thanks.

regards,
Lin

colings86 · July 7, 2015, 9:06pm

Then the scan-scroll feature is what you want. Just don't set any explicit sorting in the scan request.
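
For the original goal of writing a large result set to local disk without holding it all in memory, a minimal sketch in Python might look like the following. It assumes a client object shaped like the official Python client, with `search`, `scroll`, and `clear_scroll` methods; exact parameter names vary between client versions. The function name `dump_hits`, the page size, and the keep-alive value are hypothetical choices for illustration. It sorts on `_doc`, which as far as I know is the later replacement for the scan search type; on a version that still has scan, the equivalent would be a scan request with no explicit sort.

```python
import json


def dump_hits(client, index, query, path, page_size=1000, keep_alive="1m"):
    """Stream every hit matching `query` to `path`, one JSON document per line.

    Only one page of hits is held in memory at a time.
    """
    # Sorting on _doc requests index order, so no score-based sort is done.
    body = {"query": query, "sort": ["_doc"]}
    page = client.search(index=index, body=body, scroll=keep_alive, size=page_size)
    written = 0
    try:
        with open(path, "w") as out:
            # An empty page means the scroll is exhausted.
            while page["hits"]["hits"]:
                for hit in page["hits"]["hits"]:
                    out.write(json.dumps(hit["_source"]) + "\n")
                    written += 1
                page = client.scroll(scroll_id=page["_scroll_id"], scroll=keep_alive)
    finally:
        # Release the server-side scroll context as soon as we are done.
        client.clear_scroll(scroll_id=page["_scroll_id"])
    return written
```

With a real client this would be called as something like `dump_hits(es, "my-index", {"match_all": {}}, "hits.jsonl")`, where `es` and the index name are placeholders.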

linlma · July 7, 2015, 9:13pm

Thanks Colin,

I read the document (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html#scroll-scan). Regarding the statement "Deep pagination with from and size — e.g. ?size=10&from=10000 — is very inefficient as (in this example) 100,000 sorted results have to be retrieved from each shard and resorted in order to return just 10 results.", I am confused: should it be 10,000 sorted results rather than 100,000, which would map to the from=10000 parameter?

Please feel free to correct me if I am wrong.

BTW, another quick question: if I want to use scroll only, without scan (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html), can I combine sorting with scroll? And why is using scroll more efficient than ordinary queries?
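
On the from/size arithmetic, a rough sketch may help frame the question. It assumes (this is not stated in the thread) that each shard has to build its own sorted list of up to from + size hits and that the coordinating node then merges those lists. The function name `deep_page_cost` and the shard count of 5 are hypothetical.

```python
def deep_page_cost(from_, size, shards):
    """Estimate how many sorted hits are handled to serve one from/size page.

    Assumes every shard must return its own top (from_ + size) hits.
    """
    per_shard = from_ + size
    return {
        "per_shard": per_shard,               # sorted hits each shard produces
        "coordinator": per_shard * shards,    # upper bound on hits merged centrally
        "returned": size,                     # hits actually sent to the caller
    }


cost = deep_page_cost(from_=10000, size=10, shards=5)
print(cost)
```

Under that assumption, `?size=10&from=10000` works out to 10,010 hits per shard, and the work grows with every deeper page, whereas a scroll hands back the next page without redoing the earlier ones.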
