Boolean must_not query has slow performance

(lubp123) #1

Elasticsearch version:
"version" : {
  "number" : "2.4.1",
  "build_hash" : "c67dc32e24162035d18d6fe1e952c4cbcbe79d16",
  "build_timestamp" : "2016-09-27T18:57:55Z",
  "build_snapshot" : false,
  "lucene_version" : "5.5.2"
}

Plugins installed: []

JVM version: "1.8.0_102"

OS version: OSX El Capitan 10.11.6

Description of the problem including expected versus actual behavior:

When we run a NOT-based (must_not) query on a large dataset of 300 million rows, it takes a very long time to return data.
Time taken: 12 minutes
Number of rows: 300 million
The query returns roughly 80% of the data.
All index fields are analyzed.

curl -XGET 'http://localhost:9200/cars/item/_search?size=200' -d '
{ "profile": true, "query": { "bool": { "must_not": [ { "match": { "color": "red" } }, { "match": { "description": "car" } } ] } }, "aggs": { "description": { "terms": { "field": "description", "size": 100 } } }} ' 

Steps to reproduce:

  1. Run the curl command above against Elasticsearch
  2. Observe the time taken, as reported in the profile information

(Adrien Grand) #2

Prohibited clauses (MUST_NOT) are indeed more costly than required clauses (MUST, FILTER) since the inverted index can barely help.
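To illustrate the point, here is a minimal sketch of a toy inverted index in Python. It is not how Lucene is implemented; the function names (`build_inverted_index`, `must`, `must_not`) and the sample documents are made up for illustration. It only shows the general idea: a required term can be answered from its postings alone, while a prohibited term has to consider every document and can use the postings only to decide which ones to skip.

```python
from typing import Dict, List, Set


def build_inverted_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each lowercased term to the set of document ids containing it."""
    index: Dict[str, Set[int]] = {}
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(doc_id)
    return index


def must(index: Dict[str, Set[int]], term: str) -> List[int]:
    # Required clause: only the postings of the term are visited.
    return sorted(index.get(term, set()))


def must_not(index: Dict[str, Set[int]], term: str, all_doc_ids: List[int]) -> List[int]:
    # Prohibited clause: every document is a candidate; the postings
    # only tell us which candidates to drop.
    excluded = index.get(term, set())
    return [doc_id for doc_id in all_doc_ids if doc_id not in excluded]


# Hypothetical sample data, not taken from the original index.
docs = {1: "red car", 2: "blue truck", 3: "green car", 4: "blue bike", 5: "red bike"}
index = build_inverted_index(docs)

required = must(index, "red")
prohibited = must_not(index, "red", sorted(docs))
print(required)    # [1, 5]
print(prohibited)  # [2, 3, 4]
```

In this toy model the work done by `must` grows with the number of matching documents, while the work done by `must_not` grows with the total number of documents.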

(Adrien Grand) #3

In your case, I suspect the time it takes to compute the response is mainly due to the amount of data: how long does it take to run a simple match_all query?
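One possible way to set up that comparison is sketched below in Python. The index name, type, field names, and URL are taken from the original post; the helper names (`match_all_body`, `must_not_body`, `curl_command`) are hypothetical. The script only builds and prints the two curl commands; it does not contact a cluster. The `took` field of each search response (in milliseconds) can then be compared.

```python
import json


def match_all_body() -> dict:
    # Baseline: matches every document, with no prohibited clauses.
    return {"query": {"match_all": {}}}


def must_not_body() -> dict:
    # Same prohibited clauses as in the original post.
    return {
        "query": {
            "bool": {
                "must_not": [
                    {"match": {"color": "red"}},
                    {"match": {"description": "car"}},
                ]
            }
        }
    }


def curl_command(body: dict,
                 url: str = "http://localhost:9200/cars/item/_search?size=200") -> str:
    # Render a curl invocation in the same style as the original post.
    return "curl -XGET '%s' -d '%s'" % (url, json.dumps(body))


baseline = curl_command(match_all_body())
prohibited = curl_command(must_not_body())
print(baseline)
print(prohibited)
```

If the two timings are close, the cost is probably dominated by the amount of data rather than by the must_not clauses themselves.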

(lubp123) #4

Hello Adrien,

Thank you for your insight.
I ran a match_all query on the data and it returned in 2.5 minutes, which is still significantly less than the 12 minutes the must_not query takes.
