How does ES perform if the query does a boolean operation on nested types?


(Deepak) #1

I would like to know whether there is any difference in performance between these two queries. Both return the same results.

  1. OR operation in a nested type between two fields.
    {"nested": {
      "path": "blah",
      "query": {"bool": {"should": [
        {"match": {"blah.k1": "v1"}},
        {"match": {"blah.k2": "v2"}}
      ]}}
    }}

and

  2. OR operation between two nested type fields.
    {"bool": {"should": [
      {"nested": {
        "path": "blah",
        "query": {"match": {"blah.k1": "v1"}}
      }},
      {"nested": {
        "path": "blah",
        "query": {"match": {"blah.k2": "v2"}}
      }}
    ]}}

Let's say the queries run against millions of records.


(Mayya Sharipova) #2

You can use the _validate API with the explain option to see what your queries translate into.

GET my-index/my-type/_validate/query?explain=true
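
As a rough sketch, the two request bodies for that call could be built like this. The helper names (`single_nested_or`, `two_nested_or`, `validate_path`) are made up for illustration, the nested path "blah" and the k1/k2 fields come from the question, and the type segment of the URL is left out on the assumption that the index is typeless. Nothing here talks to a cluster; it only prints what would be sent.

```python
import json

NESTED_PATH = "blah"  # nested field name taken from the question


def single_nested_or(path=NESTED_PATH):
    """Query 1: one nested query whose inner bool ORs both matches."""
    return {"query": {"nested": {
        "path": path,
        "query": {"bool": {"should": [
            {"match": {f"{path}.k1": "v1"}},
            {"match": {f"{path}.k2": "v2"}},
        ]}},
    }}}


def two_nested_or(path=NESTED_PATH):
    """Query 2: an outer bool that ORs two separate nested queries."""
    def nested(field, value):
        return {"nested": {"path": path,
                           "query": {"match": {f"{path}.{field}": value}}}}
    return {"query": {"bool": {"should": [nested("k1", "v1"),
                                          nested("k2", "v2")]}}}


def validate_path(index="my-index"):
    """Endpoint of the _validate API with explain enabled (hypothetical index name)."""
    return f"/{index}/_validate/query?explain=true"


if __name__ == "__main__":
    for body in (single_nested_or(), two_nested_or()):
        print("GET", validate_path())
        print(json.dumps(body, indent=2))
```

Each printed body can be pasted under the GET line in a console, or sent with whatever HTTP client you already use.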

For the 1st query, I got:

"explanation": "+ToParentBlockJoinQuery (my-path.k1:v1 my-path.k2:v2) #DocValuesFieldExistsQuery [field=_primary_term]"

For the 2nd query, I got:

"explanation": "+(ToParentBlockJoinQuery (my-path.k1:v1) ToParentBlockJoinQuery (my-path.k2:v2)) #DocValuesFieldExistsQuery [field=_primary_term]"

From this we can reason, in theory, that if there is a big overlap between the documents matched by match : {"k1" : "v1"} and match : {"k2" : "v2"}, the 1st query should execute faster.

For example, if:
match : {"k1" : "v1"} produces 10k docs
match : {"k2" : "v2"} produces 10k docs
should : { match : {"k1" : "v1"}, match : {"k2" : "v2"}} produces 15k docs

Then for the 1st query, a single ToParentBlockJoinQuery executes on 15k docs.
For the 2nd query, one ToParentBlockJoinQuery executes on 10k docs for k1, then another ToParentBlockJoinQuery executes on 10k docs for k2 (20k docs in total), which overall should be slower.
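
The arithmetic behind this example can be written out as a small sketch. The function name `join_input_sizes` is made up, and it only counts how many matching docs each query shape would hand to the block join under the reasoning above; it says nothing about actual Lucene execution cost.

```python
def join_input_sizes(k1_hits, k2_hits, union_hits):
    """Return (overlap, docs fed to one join, docs fed to two joins)."""
    overlap = k1_hits + k2_hits - union_hits   # inclusion-exclusion
    single_join = union_hits                   # query 1: one join over the union
    two_joins = k1_hits + k2_hits              # query 2: overlap is processed twice
    return overlap, single_join, two_joins


overlap, single_join, two_joins = join_input_sizes(10_000, 10_000, 15_000)
print(overlap, single_join, two_joins)  # 5000 15000 20000
```

With no overlap at all (a union of 20k docs) both shapes would handle the same number of docs, so the gap in this model comes entirely from the overlap.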

But this is all theoretical, and the best way to find the difference in performance is to run experiments and measure it on real data.
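
One possible shape for such an experiment is sketched below. It assumes you supply two zero-argument callables, here called `run_first` and `run_second` (hypothetical names), each of which sends one of the queries to your cluster with the client of your choice. The harness only measures wall-clock time around those calls, with a few warm-up runs so that caches are populated before timing starts.

```python
import statistics
import time


def median_latency_ms(run_query, repeats=20, warmup=3):
    """Median wall-clock latency of run_query() in milliseconds."""
    for _ in range(warmup):
        run_query()  # warm-up runs are not timed
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def compare(run_first, run_second, repeats=20):
    """Time both query shapes the same way and return their medians."""
    return {
        "first_ms": median_latency_ms(run_first, repeats),
        "second_ms": median_latency_ms(run_second, repeats),
    }
```

The median is used instead of the mean so that a single slow outlier (a GC pause, a segment merge) does not dominate the comparison. Wall-clock time also includes network overhead, so the "took" value in each search response may be a cleaner number to compare.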


(Mikhail Khludnev) #3

The complexity is the same; the difference should be insignificant.