Is there a way to enforce the execution order of filters?

I finally got around to profiling some slow-running parts of our application and found a query that takes far too long and also generates a good chunk of our cluster load. Profiling the query in Kibana revealed that most shards finish well below 10 ms, but there are almost always a few random shards that take 10+ seconds (most of the time spent in GlobalOrdinalsQuery.build_scorer).

I suspect that the has_parent query is run before the far more restrictive filters, effectively joining billions of documents. Is there a way to enforce the filter order, e.g. rewrite the query to make sure the term filters are executed before the parent-child join madness?

The index refreshes every 120 seconds.

The query:

GET /user-v5/like/_search
{
   "query": {
      "bool": {
         "filter": [
            {
               "terms": {
                  "post_id": [
                     1489831183823275924,
                     1489206393580157727
                  ]
               }
            },
            {
               "term": {
                  "user_id": 587771206
               }
            },
            {
               "has_parent": {
                  "query": {
                     "bool": {
                        "filter": [
                           {
                              "term": {
                                 "calculated": true
                              }
                           }
                        ]
                     }
                  },
                  "score_mode": "none",
                  "parent_type": "user"
               }
            }
         ]
      }
   },
   "from": 0,
   "aggs": {
      "per_post": {
         "terms": {
            "field": "post_id",
            "size": 5
         }
      }
   },
   "size": 0
}

has_parent needs the global ordinals to run at all, iirc. So it doesn't matter how selective the term filters are unless they eliminate all the documents.

Hm, so, the has_parent query is always joining all children with all parents, regardless of reducing the children beforehand from a few billion to just 85 thousand documents?

The _parent global ordinals are built eagerly on each refresh, so there should be no rebuild at query time. Besides, why are just a few random shards 400 to 3000 times slower?

99% of the time is spent in GlobalOrdinalsQuery.build_scorer. Is this just misleading naming, or is it really scoring there? The query is used in a filter context with score_mode set to none, so there shouldn't be any scoring happening, should there?

We have just 4 threads, each sending around 0.5 of those queries per second to our cluster, and they consume one third of all available CPU resources ...

Okay, I found a workaround that at least mitigates the slow queries, although the load on the cluster has increased thanks to the increased throughput :cold_sweat:

I added a has_child query, carrying all the bool conditions from the outer query, inside the has_parent query. That way, if a shard decides to run the has_parent query first, the nested has_child query seems to limit the join carnage.
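
For anyone landing here later, this is roughly what the workaround looks like. It is a sketch reconstructed from the description above, not a verbatim copy of the production query. It assumes the child type is like (taken from the request path) and that has_child accepts the same "type" and "query" parameters as in the version that produced the original query; adjust for your Elasticsearch version.

```
GET /user-v5/like/_search
{
   "query": {
      "bool": {
         "filter": [
            { "terms": { "post_id": [1489831183823275924, 1489206393580157727] } },
            { "term": { "user_id": 587771206 } },
            {
               "has_parent": {
                  "parent_type": "user",
                  "score_mode": "none",
                  "query": {
                     "bool": {
                        "filter": [
                           { "term": { "calculated": true } },
                           {
                              "has_child": {
                                 "type": "like",
                                 "query": {
                                    "bool": {
                                       "filter": [
                                          { "terms": { "post_id": [1489831183823275924, 1489206393580157727] } },
                                          { "term": { "user_id": 587771206 } }
                                       ]
                                    }
                                 }
                              }
                           }
                        ]
                     }
                  }
               }
            }
         ]
      }
   },
   "size": 0,
   "aggs": {
      "per_post": {
         "terms": { "field": "post_id", "size": 5 }
      }
   }
}
```

The outer terms and term filters are repeated inside has_child on purpose: the inner copy only restricts which parents are considered for the join, while the outer copy still decides which like documents are actually returned and aggregated.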
