Performance - filter order and has_child filter


#1

Hi, I have some questions about improving the performance of a query that has a has_child condition.
To try to improve performance I'm adding other conditions before it, but I'm getting weird results.

For example, when I execute this query:

{
  "query" : {
    "filtered" : {
      "query" : {
        "match_all" : { }
      },
      "filter" : {
        "bool" : {
          "must" : [ {
            "range" : {
              "createTime" : {
                "from" : "now-1h",
                "to" : null,
                "include_lower" : false,
                "include_upper" : true
              }
            }
          }, {
            "has_child" : {
              "filter" : {
                "term" : {
                  "track.userId" : "1"
                }
              },
              "child_type" : "track"
            }
          } ]
        }
      }
    }
  }
}

It takes around 2000 ms.
When I remove the has_child condition (keeping only the range condition), the query returns 1950 documents in 2 ms. So my understanding (though I'm not sure) is that running the has_child over 1950 documents takes almost 2 seconds.

But if I change the "from" in the range condition to now-7d, the query takes around 2100 ms, even though the range condition alone returns ~350000 documents (in ~10 ms).
So it seems that filtering documents before the has_child doesn't affect the performance that much. Is this correct, or am I doing something wrong?
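
To make the comparison reproducible, here is a minimal sketch in Python that builds the four query bodies being compared (range of now-1h or now-7d, with and without has_child). The helper name build_query and the variants dictionary are made up for illustration; the field names and values are the ones from the query above. It only constructs the request bodies and does not contact a cluster.

```python
import json


def build_query(range_from, with_has_child=True):
    """Build the filtered query from this post (Elasticsearch 1.x query DSL).

    build_query is a hypothetical helper, not part of any client library.
    """
    # The range condition is always present.
    must = [{
        "range": {
            "createTime": {
                "from": range_from,
                "to": None,  # serialized as JSON null
                "include_lower": False,
                "include_upper": True,
            }
        }
    }]
    # Optionally add the has_child condition after the range condition.
    if with_has_child:
        must.append({
            "has_child": {
                "filter": {"term": {"track.userId": "1"}},
                "child_type": "track",
            }
        })
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"bool": {"must": must}},
            }
        }
    }


# The four variants discussed above, keyed by (range start, has_child?).
variants = {
    (range_from, with_child): build_query(range_from, with_child)
    for range_from in ("now-1h", "now-7d")
    for with_child in (False, True)
}

if __name__ == "__main__":
    for (range_from, with_child), body in variants.items():
        print(range_from, "with has_child" if with_child else "range only")
        print(json.dumps(body, indent=2))
```

Each printed body can be sent to the _search endpoint of the index; the "took" field in the response reports the server-side time in milliseconds, which is probably the fairest number to compare across variants.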

Is there a way to debug the performance of a query (I mean something that shows where the time is spent)?

BTW, I'm using Elasticsearch version 1.7.2.

Thanks, Claudio.


(Adrien Grand) #2

Indeed, for such queries the bottleneck is very often the time it takes to join parents with children. There is nothing that can really be improved besides modeling the data in such a way that parent/child relations are not necessary.

