Filtering collection in aggregation based on document property value

Hello all,

I am kind of stuck in a special case.
My document looks like below.
{
"_source" : {
"name" : "SPIM",
"spDetails" : {
"isActive" : true
},
"id" : "251561",
"categories" : [
"FM",
"SP"
]
}
},
{
"_source" : {
"name" : "WPC",
"spDetails" : {
"isActive" : false
},
"id" : "112283",
"categories" : [
"SP",
"FM"
]
}
},
{
"_source" : {
"name" : "AVG",
"spDetails" : {
"isActive" : false
},
"id" : "184612",
"categories" : [
"SP",
"FM"
]
}
},
{
"_source" : {
"name" : "ACP",
"spDetails" : {
"isActive" : false
},
"id" : "339335",
"categories" : [
"FM",
"SP"
]
}
},
{
"_source" : {
"name" : "GIC",
"spDetails" : {
"isActive" : false
},
"id" : "390040",
"categories" : [
"FM",
"SP"
]
}
}

I want to aggregate categories only if "spDetails.isActive" is true.
Thanks in advance.

Hi Mahendra,

What kind of aggregation are you looking to perform? Regardless, you can include a query and an aggregation in a single request to the Search API, which sounds like what you might be after. The resulting aggregation will only compute across search results that match the query criteria.
For example,

GET my_index/_search
{
  "query": {
    "term": {
      "spDetails.isActive": true
    }
  },
  "aggs": {
    "top_categories": {
      "terms": {
        "field": "categories"
      }
    }
  }
}

Hello @Scrilling ,

Thanks for your reply.
Let's assume my data is below.

{
    "_source" : {
        "name" : "WPC",
        "spDetails" : {
            "isActive" : false
         },
       "id" : "112283",
       "categories" : [
           "SP",
           "FM"
        ]
}
},

I want aggregation on categories but SP should not considered from categories if spDetails.isActive is false.

From above data categories collection should contain only "FM".
This is just example we have several categories on each record.

Unfortunately I can't put query on spDetails.isActive field. Query is performed on other fields, which is not included in this simple data.

Hi Mahendra,

If SP is the only category with a need for a filter, you could split your aggregation into two buckets for spDetails.isActive: true and spDetails.isActive: false using a filter or terms agg. From there, you could aggregate categories in each bucket and use something like a bucket selector agg for the false bucket to filter out SP results.

This obviously gets unwieldy pretty quickly if you have multiple isActive filters for each category. In that case, I would consider if it would make sense to model your categories as a nested field with category and isActive properties. With this structure you could aggregate across all of your categories within the context of a nested aggregation that filters on categories.isActive. Nested queries can be computationally expensive, so it ultimately comes down to a cost/benefit analysis.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.