Performance impact of returning large result sets

nikouni · October 1, 2015, 10:38pm

Hello.

I would like to use Elasticsearch to retrieve the top 1000-10000 documents for a query and then apply some post-processing to those documents.

The problem is that performance drops considerably when I increase the size of the result set from 10 to 1000 and 10000.

To measure performance I am using simple queries like this:

GET /index/type/_search
{
   "fields": [],
   "query": {
      "filtered": {
         "filter": {
            "bool": {
               "must": [
                  { "term": { "title": "hello" } },
                  { "term": { "title": "elasticsearch" } }
               ]
            }
         }
      }
   },
   "size": 10
}

I am only interested in the document ids.

After running 1000 queries from the Java API, I got the following average response times:

size = 10 -> 22 milliseconds
size = 100 -> 179 milliseconds
size = 1000 -> 288 milliseconds
size = 10000 -> 483 milliseconds

I do not understand why there is such a large difference. Since I am only asking for the ids of the documents, there should not be much overhead from fetching documents from disk.

Using scroll does not seem to provide much better results.

SOME ADDITIONAL INFORMATION

I am running ES and the tests on the same computer.
I have 16 GB of memory.
I am running ES as: ./elasticsearch -Xmx8g -Xms8g
Documents in index: 19 million
Size of the index: 3 GB
Shards configuration: default

Thanks a lot for the help!

softwaredoug · October 2, 2015, 1:01am

You might want to read up on deep paging. Under the hood, Elasticsearch needs to build a ranked set of results on each shard. So for an ordinary scored search, asking for 10K results means each shard returns its own top 10K results to the node handling the search. That node must then sort through all of them (10K per shard), throwing out most to produce the final result set for you.
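
To make that gather-and-merge step concrete, here is a minimal Python sketch. It is a toy simulation, not Elasticsearch's actual code: the function names `shard_top_k` and `coordinator_merge`, the random scores, and the choice of 5 shards with 20000 documents each are all assumptions made for illustration.

```python
import heapq
import random


def shard_top_k(scored_docs, k):
    """Each shard keeps only its own k best (score, doc_id) pairs."""
    return heapq.nlargest(k, scored_docs)


def coordinator_merge(per_shard_results, k):
    """The node handling the search gathers every shard's candidates,
    keeps the k best overall, and discards the rest."""
    candidates = [hit for shard in per_shard_results for hit in shard]
    return heapq.nlargest(k, candidates), len(candidates)


# Assumed setup: 5 shards, 20000 randomly scored documents per shard.
random.seed(0)
shards = [
    [(random.random(), f"s{s}-d{d}") for d in range(20000)]
    for s in range(5)
]

for size in (10, 1000, 10000):
    top, examined = coordinator_merge(
        [shard_top_k(shard, size) for shard in shards], size
    )
    # requested size, candidates the coordinator examined, hits returned
    print(size, examined, len(top))
```

With these assumed numbers the loop prints `10 50 10`, `1000 5000 1000`, and `10000 50000 10000`: the work on the coordinating node grows with size times the number of shards, even though only `size` hits are returned.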

That being said, you're using filters, and I'm not sure whether that matters. But you can take advantage of another feature to pull back large result sets: the scan and scroll API. This is probably a better fit for you. I'd experiment with it to see if it's more appropriate for your problem.
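
A scan-and-scroll loop could look roughly like the Python sketch below. It assumes the 1.x-era REST endpoints (`search_type=scan` on the opening request, then `_search/scroll` with the scroll id sent as the raw request body). The names `scan_ids` and `http_json` and the `send` parameter are hypothetical helpers for this sketch, not part of any client library. Note that in scan mode `size` applies per shard, so each batch can hold up to `size` times the number of shards hits.

```python
import json
from urllib.request import Request, urlopen


def http_json(method, url, body):
    """Send a request with a raw string body and decode the JSON reply."""
    req = Request(url, data=body.encode("utf-8"), method=method)
    with urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def scan_ids(base_url, index, query, per_shard_size=1000,
             keep_alive="1m", send=http_json):
    """Yield document ids for `query` using scan and scroll.

    `send` is injectable so the loop can be exercised without a cluster.
    """
    # Opening request: scan skips scoring and sorting. It returns a
    # scroll id but no hits yet.
    first = send(
        "POST",
        f"{base_url}/{index}/_search?search_type=scan&scroll={keep_alive}",
        json.dumps({"query": query, "fields": [], "size": per_shard_size}),
    )
    scroll_id = first["_scroll_id"]

    while True:
        # Each follow-up call passes the most recent scroll id.
        page = send(
            "POST",
            f"{base_url}/_search/scroll?scroll={keep_alive}",
            scroll_id,
        )
        hits = page["hits"]["hits"]
        if not hits:
            break  # an empty batch means the scroll is exhausted
        for hit in hits:
            yield hit["_id"]
        scroll_id = page["_scroll_id"]
```

Against a local node this would be used as `for doc_id in scan_ids("http://localhost:9200", "index", your_query): ...`, stopping early once you have the 1000-10000 ids you need.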

nikouni · October 2, 2015, 4:04am

Hi softwaredoug, and thank you very much for your answer.

I understand the problem with deep paging, but as you said, I am just using filters and no sorting is required. I have also tried using scan and scroll. It is a little faster but still too slow for what I am trying to achieve.
