FiltersAggregation does not use query filter and/or parent aggregation filter

Hi,

I'm not sure if the issue was already reported but I couldn't find anything.

It appears that the FiltersAggregation does not take advantage of the parent aggregation's filter and/or the query filter.

Using:

PUT /logs/_bulk?refresh
{ "index" : { "_id" : 1 } }
{ "body" : "warning" }
{ "index" : { "_id" : 2 } }
{ "body" : "error" }
{ "index" : { "_id" : 3 } }
{ "body" : "warning" }

With the following query (notice that both the query and top_filter match nothing):

{
  "profile": true,
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "bool": {
            "filter": [
              {"term": {"body": {"value": "info"}}}
            ]
          }
        }
      ]
    }
  },
  "aggs": {
    "top_filter": {
      "filter": {
        "bool": {
          "filter": [
            {"term": {"body": {"value": "info"}}}
          ]
        }
      },
      "aggregations": {
        "messages": {
          "filters": {
            "filters": {
              "errors": {
                "bool": {
                  "filter": [
                    {"term": {"body": "error"}}
                  ]
                }
              },
              "warnings": {
                "bool": {
                  "filter": [
                    {"term": {"body": "warning"}}
                  ]
                }
              }
            }
          }
        }
      }
    }
  }
}

I noticed that the filters aggregation runs each of its filters against the whole index instead of using top_filter and/or the query to narrow down the documents to aggregate.

With profiling enabled, we can see this in the searches sent to each shard:

      "shards": [
         {
            "id": "[njy_8R0HRWmOcQBnqu4iFw][logs][0]",
            "searches": [
               {
                  "query": [
                     {
                        "type": "BoostQuery",
                        "description": "(ConstantScore(body:info))^0.0",
                        "time_in_nanos": 1282500,
                        "breakdown": {
                        },
                        "children": [
                        ]
                     },
                     {
                        "type": "BoostQuery",
                        "description": "(ConstantScore(body:error))^0.0",
                        "time_in_nanos": 1118800,
                        "breakdown": {
                        },
                        "children": [
                        ]
                     },

This is a huge performance issue for me because in my use case the filters are really costly and they are performed on the whole index instead of on the small subset returned by the query.

To give you a concrete example:

  • I have one index per month and use an alias in front of them.
  • I perform a range query based on the date.
  • The query may match nothing on some of the indices, but it can still take 3-4 min because the filters are executed on the whole index anyway.

Am I doing something wrong?

Is there anything I can do besides copying the query part into each clause of the filters aggregation?
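
To make the workaround concrete, here is a rough sketch of what I mean by copying the query into each clause. It builds the request body as plain Python dicts and wraps every clause of the filters aggregation in a bool filter together with the top-level query. The helper name `push_query_into_filters` and the `agg_path` parameter are my own hypothetical names, not part of any Elasticsearch API, and the bodies are simplified from the request above. I have not verified that this actually makes the filters cheaper; it only shows the shape of the rewritten request.

```python
import copy
import json


def push_query_into_filters(body, agg_path):
    """Return a copy of `body` in which every clause of the `filters`
    aggregation found at `agg_path` is ANDed with the top-level query.

    Hypothetical helper for illustration only.
    """
    result = copy.deepcopy(body)
    query = result["query"]

    # Walk down the nested aggregations to the `filters` aggregation.
    # Both the "aggs" and "aggregations" keys are accepted.
    node = result
    for name in agg_path:
        aggs = node.get("aggs") or node.get("aggregations")
        node = aggs[name]

    clauses = node["filters"]["filters"]
    for key, clause in list(clauses.items()):
        # bool/filter combines the original clause with the query.
        clauses[key] = {"bool": {"filter": [copy.deepcopy(query), clause]}}
    return result


# Simplified version of the request from this post.
body = {
    "size": 0,
    "query": {"bool": {"filter": [{"term": {"body": {"value": "info"}}}]}},
    "aggs": {
        "top_filter": {
            "filter": {"term": {"body": {"value": "info"}}},
            "aggregations": {
                "messages": {
                    "filters": {
                        "filters": {
                            "errors": {"term": {"body": "error"}},
                            "warnings": {"term": {"body": "warning"}},
                        }
                    }
                }
            },
        }
    },
}

rewritten = push_query_into_filters(body, ["top_filter", "messages"])
messages = rewritten["aggs"]["top_filter"]["aggregations"]["messages"]
print(json.dumps(messages, indent=2))
```

This gets unwieldy quickly when the query is large or there are many clauses, which is why I would prefer not to do it by hand.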
