Aggregation returns data that does not match the query

I am trying to run an aggregation on documents that contain a categories field. categories is an array of strings. Sample document:

{
  "_index": "test-v11",
  "_type": "_doc",
  "_id": "954961",
  "_version": 4,
  "_score": 1,
  "_source": {
    "id": 954961,
    "categories": [
      "Patent",
      "Trademark"
    ]
  },
  "fields": {
    "id": [
      "954961"
    ],
    "categories": [
      "Patent",
      "Trademark"
    ],
    "categories.keyword": [
      "Patent",
      "Trademark"
    ]
  }
}

When I aggregate documents by category, the result contains categories that do not match the query. My request:

POST _search
{
   "aggs":{
      "termBucketAgg":{
         "terms":{
            "field":"categories.keyword",
            "shard_size":65,
            "size":10
         }
      }
   },
   "query":{
      "bool":{
         "must":[
            {
               "query_string":{
                  "fields":[
                     "categories"
                  ],
                  "query":"*trade*"
               }
            }
         ]
      }
   },
   "size":0
}

Response:

"aggregations" : {
    "termBucketAgg" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 806,
      "buckets" : [
        {
          "key" : "Patent",
          "doc_count" : 5436
        },
        {
          "key" : "Trademark",
          "doc_count" : 535
        }
		(...)
      ]
    }
  }

"Patent" does not match the query, but I probably get it in the response because some documents have both categories. Any idea how to get only the categories that match the query?

Welcome!

It works like this:

  • The query selects the documents that match.
  • Then the aggregation aggregates ALL the values of ALL the documents that matched.

The aggregation does not filter the terms based on the query. A document that matches *trade* and also has Patent contributes to the Patent bucket as well.
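To make those two steps concrete, here is a toy in-memory model in Python. It is only an illustration of the semantics, not how Elasticsearch is implemented: the sample documents are hypothetical, fnmatch on lowercased values stands in for the query_string wildcard query on the analyzed categories field, and the optional include callback stands in for the terms aggregation's include filter.

```python
from collections import Counter
import fnmatch

# Hypothetical sample documents, shaped like the one in the question.
DOCS = [
    {"id": 1, "categories": ["Patent", "Trademark"]},
    {"id": 2, "categories": ["Trademark"]},
    {"id": 3, "categories": ["Patent"]},
    {"id": 4, "categories": ["Copyright"]},
]


def matches(doc, pattern):
    """Toy stand-in for the wildcard query: a document matches if ANY of its
    categories matches the pattern (compared in lowercase, like analyzed text)."""
    return any(
        fnmatch.fnmatchcase(c.lower(), pattern.lower()) for c in doc["categories"]
    )


def terms_agg(docs, include=None):
    """Count EVERY category value of EVERY matched document, like a terms
    aggregation; optionally keep only values accepted by `include`."""
    counts = Counter()
    for doc in docs:
        for c in doc["categories"]:
            if include is None or include(c):
                counts[c] += 1
    return counts


# Step 1: the query selects documents (ids 1 and 2 here).
hits = [d for d in DOCS if matches(d, "*trade*")]

# Step 2: the aggregation sees all values of those documents, so "Patent"
# gets a bucket because document 1 also has "Trademark".
print(terms_agg(hits))

# Filtering the terms themselves is a separate step (the include idea).
print(terms_agg(hits, include=lambda c: "trade" in c.lower()))
```

The point of the model is that the query and the term filtering are independent: narrowing the documents never removes the other values those documents carry.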

You may want to look at this: Terms aggregation | Elasticsearch Guide [8.8] | Elastic

This solved my issue (the include part):

POST _search
{
   "aggs":{
      "termBucketAgg":{
         "terms":{
            "field":"categories.keyword",
            "shard_size":65,
            "size":10,
            "include": ".*[cC][oO][nN].*"
         }
      }
   },
   "query":{
      "bool":{
         "must":[
            {
               "query_string":{
                  "fields":[
                     "categories"
                  ],
                  "query":"*con*"
               }
            }
         ]
      }
   },
   "size":0
}
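If the search term comes from user input, the include pattern can be generated instead of written by hand. The sketch below is a hypothetical Python helper (case_insensitive_include and build_search_body are illustrative names, not part of any Elasticsearch client) that turns a term like "con" into ".*[cC][oO][nN].*" and builds the same request body as above. It assumes the term contains no query_string special characters (only the include regex is escaped), and it only case-folds ASCII letters.

```python
import json

# Characters with special meaning in Lucene regular expressions, which the
# terms aggregation "include" parameter uses; they are escaped with "\".
LUCENE_REGEX_SPECIAL = set('.?+*|{}[]()"\\#@&<>~')


def case_insensitive_include(term):
    """Build an include regex matching any value that contains `term`,
    ignoring case for ASCII letters, e.g. "con" -> ".*[cC][oO][nN].*"."""
    parts = []
    for ch in term:
        if ch.isascii() and ch.isalpha():
            parts.append(f"[{ch.lower()}{ch.upper()}]")
        elif ch in LUCENE_REGEX_SPECIAL:
            parts.append("\\" + ch)
        else:
            parts.append(ch)
    # Lucene regexes must match the whole term, hence ".*" on both sides.
    return ".*" + "".join(parts) + ".*"


def build_search_body(term):
    """Hypothetical helper: same request as in the thread, parameterized by term.
    Assumes `term` needs no query_string escaping."""
    return {
        "size": 0,
        "query": {"bool": {"must": [{"query_string": {
            "fields": ["categories"],
            "query": f"*{term}*",
        }}]}},
        "aggs": {"termBucketAgg": {"terms": {
            "field": "categories.keyword",
            "shard_size": 65,
            "size": 10,
            "include": case_insensitive_include(term),
        }}},
    }


print(json.dumps(build_search_body("con"), indent=2))
```

The character-class approach mirrors the pattern used in the request above; the leading and trailing ".*" are needed because a Lucene regex has to match the entire keyword value, not just a substring.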

Thank you @dadoonet, that helped me find a solution.
