Simple Search Query Performance


(sarathrs) #1

I have an Elasticsearch index with a single shard that contains 1.6 million documents. The host machine has 8 GB of memory, of which 3.5 GB is given to Elasticsearch.

  • When I run a simple match query on fieldA, which has the same value in every document (all 1.6 million), it takes 60 ms on average.

  • When I run a simple match query on fieldB, matching a value that appears in only 20 documents, it takes 2 ms on average.

I have set size to 0 and _source to false for both queries. fieldA and fieldB are both arrays of strings, mapped with an analyzer that uses the keyword tokenizer and a lowercase filter. Can someone explain why there is such a difference in execution time? My understanding is that in both cases Elasticsearch first consults the inverted index to find the matching documents and then returns the top documents. The only difference I can see is the total hit count. Is calculating the total hits what causes the difference?
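For reference, the two request bodies described above would look roughly like the sketch below. The field values are placeholders (the real values are not given in this post), and the helper name `match_query` is only for illustration.

```python
import json


def match_query(field, value):
    """Build a search body for a simple match query that returns no hits, only totals."""
    return {
        "size": 0,          # do not return any hits
        "_source": False,   # do not load the stored document source
        "query": {"match": {field: value}},
    }


# Placeholder values: one term present in every document, one present in 20.
common = match_query("fieldA", "value-in-every-doc")
rare = match_query("fieldB", "value-in-20-docs")

print(json.dumps(common, indent=2))
print(json.dumps(rare, indent=2))
```

Either body can be sent to the index's `_search` endpoint; the only difference between the two requests is how many documents the term matches.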

Elasticsearch version: 1.3.2

Thanks in advance for the help.

Sarath


(Zachary Tong) #2

The slowness is indeed due to matching so many documents. Elasticsearch has to run through all the matches and calculate a score for each one. Only after every matching document has been scored can it accurately return the top N results.

So in the case where 1.6 million documents match, ES has to run through all 1.6 million, calculate a score for each, and then rank them. That is why it is considerably slower than the 20-document case.
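A toy model makes the cost difference concrete. This is only a sketch of the idea, not Lucene's actual collector: `top_n` and `toy_score` are made-up names, and the scoring function is an arbitrary stand-in. The point is that every matching document must be visited and scored, even though only a handful are kept.

```python
import heapq


def top_n(matching_doc_ids, score, n=10):
    """Score every matching document, keeping only the n best in a bounded min-heap."""
    heap = []    # holds (score, doc_id); the smallest kept score sits at heap[0]
    scored = 0   # number of documents we had to score
    for doc_id in matching_doc_ids:
        s = score(doc_id)
        scored += 1
        if len(heap) < n:
            heapq.heappush(heap, (s, doc_id))
        elif s > heap[0][0]:
            heapq.heapreplace(heap, (s, doc_id))
    return sorted(heap, reverse=True), scored


def toy_score(doc_id):
    """Arbitrary stand-in for a relevance score."""
    return (doc_id * 7919) % 1000 / 1000.0


top_common, work_common = top_n(range(1_600_000), toy_score)
top_rare, work_rare = top_n(range(20), toy_score)

print(work_common, work_rare)  # 1600000 20
```

Both calls return at most 10 results, but the first one does 80,000 times as much scoring work to get there.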

You'll see this a lot with ES queries: the more selective the query is (i.e. the fewer documents it matches), the faster it tends to be, because sparse matches allow for more optimizations and less work in general.


(sarathrs) #3

Thanks for the explanation

